Open: arteymix opened this issue 7 years ago
I only partially understand this, but I think it is hard to define get_pointer on a FileArray. My preference for dealing with files is to have another interface, RecordFile.get_structured_array(start, end), which returns a new copy of Array.
It would help if you could write down some async examples with Array.
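To make the copy-based alternative concrete, here is a minimal Vala sketch under stated assumptions: RecordFileSketch, its fixed record_size, and the uint8[] return value (standing in for "a new copy of Array") are hypothetical, not Vast API. The point is that the caller receives an owned copy of records [start, end) rather than a pointer into the file.

```vala
// Hypothetical sketch: RecordFileSketch and record_size are illustrative, not Vast API.
// The returned uint8[] stands in for "a new copy of Array".
public class RecordFileSketch : Object {
    private string path;
    private size_t record_size;

    public RecordFileSketch (string path, size_t record_size) {
        this.path = path;
        this.record_size = record_size;
    }

    // Read records [start, end) into a freshly allocated buffer owned by the caller,
    // so no pointer into the file is ever handed out.
    public uint8[] get_structured_array (size_t start, size_t end) throws FileError
        requires (start <= end)
    {
        var stream = FileStream.open (path, "rb");
        if (stream == null) {
            throw new FileError.FAILED ("cannot open %s", path);
        }
        var buf = new uint8[(int) ((end - start) * record_size)];
        // Seek to the first requested record, then read the whole range at once.
        stream.seek ((long) (start * record_size), FileSeek.SET);
        if (stream.read (buf) != (size_t) buf.length) {
            throw new FileError.FAILED ("short read in %s", path);
        }
        return buf;
    }
}
```

Because the result is a copy, a file-backed array never has to answer get_pointer for data that is not in memory; the cost is an explicit read per request.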
On Thu, Dec 8, 2016 at 1:40 PM, Guillaume Poirier-Morency <notifications@github.com> wrote:
First, we keep focusing on having an efficient in-memory implementation. Once the API is stable and final, I think we should have the following abstractions:
- Array<T> as an interface defining primitives (e.g. get_pointer, set_from_pointer, ...), where T stands for a ref-counted object that holds the actual data
- Array.Iterator as an interface, so that concrete implementations can do specific optimizations
- Array.Builder as-is; it should work on any kind of array
Then for implementations:
- BytesArray, which implements Array<Bytes>
- FileArray, an abstract class to implement views of HDF5 and other data formats (implements Array<File>)
- NetworkArray, or anything that would represent a remote array in a distributed context
For FileArray and NetworkArray, it would be really convenient to have _async helpers and to delay execution in batches:
yield network_array.fill_from_value (1).store_async ();
One typical use case would be to have a BytesArray locally, partition it and sync it remotely.
for (var i = 0; i < arr.shape[0]; i++) {
    yield arr_network_partition[i].fill_from_array (arr.index (i)).store_async ();
}
View it on GitHub: https://github.com/rainwoodman/vast/issues/39
Honoring the Array interface does not mean that FileArray cannot have additional routines better suited to its particular situation (e.g. the ability to seek in and buffer the file). I think that, whatever the backend is, it must provide everything we have already defined.
I think universal functions should only operate on pointers, even if that implies unpacking data from the array into a temporary buffer (which would be needed anyway to apply the actual routine). This way we would have only one set of routines, with optimized iterators for each array backend.
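A minimal Vala sketch of the "one routine, many iterators" idea, assuming a hypothetical ChunkSource interface that each backend implements to unpack elements into a caller-provided buffer. sum_buffer is the single pointer-based routine; reduce_sum copies chunks into a temporary buffer and calls it. None of these names come from Vast.

```vala
// Hypothetical sketch: ChunkSource, StridedSource, sum_buffer and reduce_sum
// are illustrative names, not part of Vast.

// The single routine: it only ever sees a contiguous buffer of doubles.
public double sum_buffer (double* data, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

// What each backend provides: a way to unpack elements into a caller buffer.
public interface ChunkSource : Object {
    public abstract size_t get_length ();
    // Copy elements [start, start + n) into dest.
    public abstract void read_chunk (size_t start, double* dest, size_t n);
}

// Example backend: an in-memory array viewed with a stride (not contiguous).
public class StridedSource : Object, ChunkSource {
    private double[] backing;
    private size_t offset;
    private size_t stride;
    private size_t count;

    public StridedSource (double[] backing, size_t offset, size_t stride, size_t count) {
        this.backing = backing;
        this.offset = offset;
        this.stride = stride;
        this.count = count;
    }

    public size_t get_length () {
        return count;
    }

    public void read_chunk (size_t start, double* dest, size_t n) {
        for (size_t i = 0; i < n; i++) {
            dest[i] = backing[offset + (start + i) * stride];
        }
    }
}

// Generic driver: unpack into a temporary buffer, then call the one routine.
public double reduce_sum (ChunkSource src, size_t chunk = 64) {
    var tmp = new double[(int) chunk];
    double total = 0.0;
    for (size_t start = 0; start < src.get_length (); start += chunk) {
        size_t n = src.get_length () - start;
        if (n > chunk) {
            n = chunk;
        }
        src.read_chunk (start, &tmp[0], n);
        total += sum_buffer (&tmp[0], n);
    }
    return total;
}
```

A file- or network-backed source would only implement read_chunk (seeking, buffering, or fetching as it sees fit); sum_buffer and reduce_sum stay the same.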
We'll see as we go.
This is different from the parser/formatter, though, since those are designed to fill and render any given array.
Also, I think interfaces are zero-cost since they only require a set of symbols to be defined, but I'll verify that. Virtual call overhead would not be nice.
For async, I'll try to write some examples. Async does not make sense for a memory-based array.
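As a starting point for such examples, here is a minimal sketch of the batched _async pattern. RemoteArraySketch is hypothetical: it simulates the remote side with a local buffer and a single idle callback. fill_from_value only queues work, and store_async flushes the whole queue in one round trip.

```vala
// Hypothetical sketch: RemoteArraySketch, fill_from_value and store_async are
// illustrative; the "remote" side is simulated by a local buffer.
public class RemoteArraySketch : Object {
    private double[] remote;                     // stands in for data on another host
    private double[] pending = new double[0];    // fill values queued but not yet sent
    private int _round_trips = 0;

    public int round_trips { get { return _round_trips; } }

    public RemoteArraySketch (int n) {
        remote = new double[n];
    }

    // Record the operation and return this so calls can be chained; nothing is sent yet.
    public RemoteArraySketch fill_from_value (double v) {
        pending += v;
        return this;
    }

    // Flush every queued operation in a single (simulated) round trip.
    public async void store_async () {
        if (pending.length == 0) {
            return;
        }
        // Yield to the main loop once, standing in for network latency.
        Idle.add (store_async.callback);
        yield;
        foreach (var v in pending) {
            for (int i = 0; i < remote.length; i++) {
                remote[i] = v;
            }
        }
        pending = new double[0];
        _round_trips++;
    }

    public double get_remote (int i) {
        return remote[i];
    }
}
```

Inside another async method, the chained form from the proposal, yield arr.fill_from_value (1).store_async ();, works against this sketch unchanged; from synchronous code, start it with store_async.begin and run a MainLoop.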
There's a lot of stride/shape logic in Array construction, and I doubt it will fit an interface model. I think it's fine to use either an abstract class or a concrete class with virtual methods.
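To illustrate why the stride/shape logic favors an abstract class, here is a hypothetical sketch: ArraySketch computes strides and index offsets once, and each backend only overrides get_pointer_at_offset. The names are illustrative, and the layout (C order, strides in bytes) is an assumption.

```vala
// Hypothetical sketch: ArraySketch and MemoryArraySketch are illustrative, not Vast API.
// Assumes C-order layout with strides expressed in bytes.
public abstract class ArraySketch : Object {
    public size_t elsize;
    public size_t[] shape;
    public ssize_t[] strides;

    protected ArraySketch (size_t elsize, size_t[] shape) {
        this.elsize = elsize;
        this.shape = shape;
        // Shared logic: compute C-contiguous strides once for every backend.
        strides = new ssize_t[shape.length];
        ssize_t s = (ssize_t) elsize;
        for (int d = shape.length - 1; d >= 0; d--) {
            strides[d] = s;
            s *= (ssize_t) shape[d];
        }
    }

    // Shared logic: turn an index tuple into a byte offset.
    public size_t offset_of (size_t[] index) requires (index.length == shape.length) {
        ssize_t off = 0;
        for (int d = 0; d < index.length; d++) {
            off += (ssize_t) index[d] * strides[d];
        }
        return (size_t) off;
    }

    // Backend-specific part: how the bytes at a given offset are reached.
    public abstract void* get_pointer_at_offset (size_t offset);

    public void* get_pointer (size_t[] index) {
        return get_pointer_at_offset (offset_of (index));
    }
}

// In-memory backend: a single contiguous byte buffer.
public class MemoryArraySketch : ArraySketch {
    private uint8[] buffer;

    public MemoryArraySketch (size_t elsize, size_t[] shape) {
        base (elsize, shape);
        size_t total = elsize;
        foreach (var n in shape) {
            total *= n;
        }
        buffer = new uint8[(int) total];
    }

    public override void* get_pointer_at_offset (size_t offset) {
        return &buffer[offset];
    }
}
```

A file or network subclass would inherit offset_of unchanged and implement only get_pointer_at_offset, which an interface alone could not share.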