rainwoodman / vast

vala and scientific numerical computation

Array interface and more! #39

Open arteymix opened 7 years ago

arteymix commented 7 years ago

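First, we keep focusing on having an efficient in-memory implementation. Once the API is stable and final, I think we should have the following abstraction:

  • Array as an interface defining primitives (e.g. get_pointer, set_from_pointer, ...), with T standing for a ref-counted object that holds the actual data
  • Array.Iterator as an interface so that concrete implementations can do specific optimizations
  • Array.Builder as-is, it should work on any kind of array

Then for implementations:

  • BytesArray which implements Array
  • FileArray, an abstract class for implementing views over HDF5 and other data formats (implements Array)
  • NetworkArray or anything that would represent a remote array in a distributed context
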
For FileArray and NetworkArray, it would be really convenient to have _async helpers and delay execution in batches:

yield network_array.fill_from_value (1).store_async ();

One typical use case would be to have a BytesArray locally, partition it and sync it remotely.

for (var i = 0; i < arr.shape[0]; i++) {
    yield arr_network_partition[i].fill_from_array (arr.index (0)).store_async ();
}
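
To make the proposed split a bit more concrete, here is a minimal sketch of an interface with pointer primitives and one in-memory implementation. Only `get_pointer`, `set_from_pointer` and `BytesArray` are names from this issue; `ArrayLike` (used to avoid clashing with `GLib.Array`), the flat indexing, the constructor signature and the other methods are assumptions made for illustration, not vast's actual API.

```vala
// Hypothetical sketch: signatures are assumptions, not vast's real API.
public interface ArrayLike : Object {
    public abstract size_t get_length ();
    public abstract size_t get_scalar_size ();
    // Pointer to the element stored at a flat index.
    public abstract void* get_pointer (size_t index);
    // Copy one element from `src` into the slot at a flat index.
    public abstract void set_from_pointer (size_t index, void* src);
}

// In-memory backend: elements are packed into a byte buffer.
public class BytesArray : Object, ArrayLike {
    private uint8[] data;
    private size_t length;
    private size_t scalar_size;

    public BytesArray (size_t length, size_t scalar_size) {
        this.length = length;
        this.scalar_size = scalar_size;
        this.data = new uint8[length * scalar_size];
    }

    public size_t get_length () { return length; }
    public size_t get_scalar_size () { return scalar_size; }

    public void* get_pointer (size_t index) {
        return (uint8*) data + index * scalar_size;
    }

    public void set_from_pointer (size_t index, void* src) {
        Memory.copy (get_pointer (index), src, scalar_size);
    }
}

int main () {
    // Callers only see the interface, whatever the backend is.
    ArrayLike arr = new BytesArray (4, sizeof (double));
    for (size_t i = 0; i < arr.get_length (); i++) {
        double v = (double) i * 1.5;
        arr.set_from_pointer (i, &v);
    }
    double third = *((double*) arr.get_pointer (2));
    print ("%g\n", third);
    return 0;
}
```

A file-backed or network-backed class would implement the same four methods, with its own buffering behind `get_pointer`.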

rainwoodman commented 7 years ago

I only understand this partially. But I think

On Thu, Dec 8, 2016 at 1:40 PM, Guillaume Poirier-Morency <notifications@github.com> wrote:

First, we keep focusing on having an efficient in-memory implementation. Once the API is stable and final, I think we should have the following abstraction:

  • Array as an interface defining primitives (e.g. get_pointer, set_from_pointer, ...), with T standing for a ref-counted object that holds the actual data
  • Array.Iterator as an interface so that concrete implementations can do specific optimizations
  • Array.Builder as-is, it should work on any kind of array

Then for implementations:

  • BytesArray which implements Array
  • FileArray, an abstract class for implementing views over HDF5 and other data formats (implements Array)
  • NetworkArray or anything that would represent a remote array in a distributed context

For FileArray and NetworkArray, it would be really convenient to have _async helpers and delay execution in batches:

yield network_array.fill_from_value (1).store_async ();

One typical use case would be to have a BytesArray locally, partition it and sync it remotely.

for (var i = 0; i < arr.shape[0]; i++) {
    yield arr_network_partition[i].fill_from_array (arr.index (0)).store_async ();
}


arteymix commented 7 years ago

Honoring the Array interface does not prevent FileArray from having additional routines better suited to its particular situation (e.g. the ability to seek and buffer the file). I think that whatever the backend is, it must provide everything we have defined already.

I think that universal functions should only consider pointers, even if that implies that we have to unpack data from the array into a temporary buffer (which would be needed anyway to apply the actual routine). This way we would only have one set of routines, but optimized iterators for each array backend.
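
A rough sketch of that idea, assuming `double` elements only: the routine sees nothing but plain pointers, and each backend supplies the unpack/pack step. `UnaryRoutine`, `ChunkSource`, `MemorySource` and `apply` are all hypothetical names invented for this example.

```vala
// Hypothetical sketch; none of these names come from vast.
// The routine only ever sees plain pointers.
public delegate void UnaryRoutine (double* input, double* output, size_t n);

// Backend-specific part: moving elements in and out of a plain buffer.
public interface ChunkSource : Object {
    public abstract size_t get_length ();
    public abstract void unpack (size_t offset, double* buffer, size_t n);
    public abstract void pack (size_t offset, double* buffer, size_t n);
}

public class MemorySource : Object, ChunkSource {
    private double[] values;

    public MemorySource (owned double[] values) {
        this.values = (owned) values;
    }

    public size_t get_length () { return values.length; }
    public double get_value (size_t i) { return values[i]; }

    public void unpack (size_t offset, double* buffer, size_t n) {
        for (size_t i = 0; i < n; i++) buffer[i] = values[offset + i];
    }

    public void pack (size_t offset, double* buffer, size_t n) {
        for (size_t i = 0; i < n; i++) values[offset + i] = buffer[i];
    }
}

// One generic driver: unpack a chunk, run the routine, pack the result.
void apply (UnaryRoutine routine, ChunkSource src, ChunkSource dst, size_t chunk = 2) {
    var in_buf = new double[chunk];
    var out_buf = new double[chunk];
    size_t total = src.get_length ();
    for (size_t off = 0; off < total; off += chunk) {
        size_t remaining = total - off;
        size_t n = remaining < chunk ? remaining : chunk;
        src.unpack (off, (double*) in_buf, n);
        routine ((double*) in_buf, (double*) out_buf, n);
        dst.pack (off, (double*) out_buf, n);
    }
}

int main () {
    double[] init = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    var src = new MemorySource ((owned) init);
    var dst = new MemorySource (new double[5]);
    // Square every element; the routine knows nothing about the backend.
    apply ((input, output, n) => {
        for (size_t i = 0; i < n; i++) output[i] = input[i] * input[i];
    }, src, dst);
    for (size_t i = 0; i < dst.get_length (); i++) {
        print ("%g\n", dst.get_value (i));
    }
    return 0;
}
```

The chunk size is deliberately tiny here so that the last, partial chunk is exercised; a file or network backend would pick it to match its own buffering.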

We'll see in the process.

This is different from parsers/formatters though, as they are designed to fill and render any given array.

Also I think that interfaces are zero-cost since they only expect a set of symbols to be defined, but I'll verify that. It would not be nice to have virtual call overhead.

arteymix commented 7 years ago

For async, I'll try to write some examples. It does not make sense for memory-based arrays.
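
As a starting point, here is a minimal sketch of the chained `fill_from_value (...).store_async ()` pattern using Vala's async methods. `RemoteArray`, the local staging buffer and the `stored` field are hypothetical stand-ins: no real network I/O happens, the "remote" side is just a second array. Async methods need GIO, so this assumes compiling with `valac --pkg gio-2.0`.

```vala
// Hypothetical sketch; RemoteArray is not part of vast.
public class RemoteArray : Object {
    private double[] pending;   // local staging buffer (the batch)
    public double[] stored;     // stands in for the remote side

    public RemoteArray (int length) {
        pending = new double[length];
        stored = new double[length];
    }

    // Cheap and synchronous: only touches the staging buffer.
    // Returns this so that calls can be chained.
    public RemoteArray fill_from_value (double value) {
        for (int i = 0; i < pending.length; i++) pending[i] = value;
        return this;
    }

    // Flush the staged batch in one go.
    public async void store_async () {
        // Return to the main loop once, the way real I/O would.
        Idle.add (store_async.callback);
        yield;
        for (int i = 0; i < pending.length; i++) stored[i] = pending[i];
    }
}

async void run (MainLoop loop) {
    var arr = new RemoteArray (3);
    yield arr.fill_from_value (1).store_async ();
    print ("%g\n", arr.stored[0]);
    loop.quit ();
}

int main () {
    var loop = new MainLoop ();
    run.begin (loop);
    loop.run ();
    return 0;
}
```

The point of the split is that any number of cheap local operations can be staged before a single `store_async ()` pays the round trip.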

arteymix commented 7 years ago

There's a lot of stride/shape logic in Array construction and I doubt this will fit an interface model. I think it's okay to either use an abstract class or a concrete class with virtual methods.
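
For illustration, a minimal sketch of the abstract-class variant: the stride/shape arithmetic lives once in the base class and only the storage access is left abstract. `StridedArray`, `MemoryArray` and `offset_of` are hypothetical names, and row-major (C-order) byte strides are an assumption of this example.

```vala
// Hypothetical sketch; not vast's actual class layout.
public abstract class StridedArray : Object {
    protected size_t[] shape;
    protected size_t[] strides;   // in bytes, row-major

    protected StridedArray (size_t scalar_size, size_t[] shape) {
        this.shape = new size_t[shape.length];
        this.strides = new size_t[shape.length];
        size_t step = scalar_size;
        // The last dimension varies fastest.
        for (int i = shape.length - 1; i >= 0; i--) {
            this.shape[i] = shape[i];
            this.strides[i] = step;
            step *= shape[i];
        }
    }

    // Shared logic: total number of elements.
    public size_t get_size () {
        size_t size = 1;
        for (int i = 0; i < shape.length; i++) size *= shape[i];
        return size;
    }

    // Shared logic: byte offset of a multi-dimensional index.
    public size_t offset_of (size_t[] index) {
        size_t offset = 0;
        for (int i = 0; i < index.length; i++) offset += index[i] * strides[i];
        return offset;
    }

    // Backend-specific: where the bytes actually live.
    public abstract void* get_pointer (size_t[] index);
}

public class MemoryArray : StridedArray {
    private uint8[] data;

    public MemoryArray (size_t[] shape) {
        base (sizeof (double), shape);
        data = new uint8[get_size () * sizeof (double)];
    }

    public override void* get_pointer (size_t[] index) {
        return (uint8*) data + offset_of (index);
    }
}

int main () {
    size_t[] shape = { 2, 3 };
    size_t[] idx = { 1, 2 };
    var arr = new MemoryArray (shape);
    *((double*) arr.get_pointer (idx)) = 42.0;
    print ("%lu\n", (ulong) arr.offset_of (idx));
    print ("%g\n", *((double*) arr.get_pointer (idx)));
    return 0;
}
```

A file or network backend would then override only `get_pointer` (or whatever access primitive ends up being used) and inherit the indexing logic unchanged.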