vidartf / ipydatawidgets

A set of widgets to help facilitate reuse of large datasets across widgets

Binary serialization: specifying strides #13

Open SylvainCorlay opened 6 years ago

SylvainCorlay commented 6 years ago

Would it be possible to drop the ascontiguousarray call in to_json, and specify strides in the wire format instead?

In doing so, we should be able to prevent an extra copy in the case of non-contiguous arrays / strided views.
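To make the extra copy concrete, here is a minimal sketch using only numpy. The array names are illustrative and not taken from the ipydatawidgets code base. It shows that np.ascontiguousarray allocates a new buffer for a strided view but returns the same memory for an array that is already C-contiguous.

```python
import numpy as np

# Illustrative data: a 3x4 array and a strided view of every other column.
base = np.arange(12, dtype=np.float64).reshape(3, 4)
view = base[:, ::2]  # no copy, shares memory with `base`

assert np.shares_memory(view, base)
assert not view.flags["C_CONTIGUOUS"]

# For a non-contiguous view, ascontiguousarray must allocate and copy.
packed = np.ascontiguousarray(view)
assert packed.flags["C_CONTIGUOUS"]
assert not np.shares_memory(packed, base)

# For an array that is already C-contiguous, no copy is made.
assert np.shares_memory(np.ascontiguousarray(base), base)
```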

cc @maartenbreddels @wolfv @gouarin.

SylvainCorlay commented 6 years ago

I understand this would be a backward-incompatible change.

vidartf commented 6 years ago

I'm a little unsure what behavior you intend:

  1. Optimize serialization so that no copy is made, but the data is still streamed contiguously when it is put on the wire.
  2. Put the data on the wire non-contiguously, and have the JS side deal with non-contiguous data.

SylvainCorlay commented 6 years ago

I meant version 2.

vidartf commented 6 years ago

Then, as long as it is compatible with the API of the JS ndarray library I would be for it. I would consider making a flag for the wanted behavior though (I'll have to look at it in detail).

SylvainCorlay commented 6 years ago

The JS ndarray library appears to support strides in its constructor.

Note that strides in numpy are given in bytes rather than in number of elements (unlike JS ndarray and xtensor, which count elements).
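As a rough illustration of that unit difference, the sketch below converts numpy's byte strides into element strides by dividing by the item size. The helper name element_strides is hypothetical and is not part of numpy or ipydatawidgets.

```python
import numpy as np

def element_strides(arr):
    """Convert numpy byte strides to element strides (hypothetical helper)."""
    if any(s % arr.itemsize for s in arr.strides):
        raise ValueError("strides are not a multiple of the item size")
    return tuple(s // arr.itemsize for s in arr.strides)

base = np.arange(12, dtype=np.float64).reshape(3, 4)
view = base[:, ::2]  # every other column, no copy

print(view.strides)           # byte strides: (32, 16)
print(element_strides(view))  # element strides: (4, 2)
```

The element strides are the form an element-based consumer would expect, so a wire format carrying strides would presumably need to state which unit it uses.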

SylvainCorlay commented 6 years ago

@vidartf instead of using flags, you can use a coercing custom validator.

http://traittypes.readthedocs.io/en/latest/usage.html#example-validating-the-shape-of-a-numpy-array

vidartf commented 6 years ago

PS: Any code on the JS side that uses the underlying typed array will likely need the contiguous array, so we should probably also implement a utility function on the JS side for getting a typed array of a certain order.

SylvainCorlay commented 6 years ago

Is this a feature of the JS ndarray library?

vidartf commented 6 years ago

No, but it should be a simple for loop for C-contiguous, I think. Things like threejs require the raw data, so I don't want to upset that too much.
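The "simple for loop" could look roughly like the following. It is written in Python for consistency with the rest of the thread, although the actual utility would live on the JS side. The function name to_c_contiguous and its signature are assumptions made for illustration, with strides counted in elements.

```python
from itertools import product

def to_c_contiguous(data, shape, strides, offset=0):
    """Gather a strided flat buffer into a list in C (row-major) order.

    `strides` and `offset` are in elements, not bytes (hypothetical helper).
    """
    out = []
    # product() varies the last index fastest, which is C order.
    for index in product(*(range(n) for n in shape)):
        pos = offset + sum(i * s for i, s in zip(index, strides))
        out.append(data[pos])
    return out

# A 3x4 row-major buffer, viewed as every other column:
# shape (3, 2), element strides (4, 2).
data = list(range(12))
print(to_c_contiguous(data, (3, 2), (4, 2)))  # [0, 2, 4, 6, 8, 10]
```

A consumer that needs raw data, such as the three.js case mentioned above, would call a utility like this once and then work with the packed result.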