xnd-project / libxnd

Subsumed into xnd
https://xnd.io/
BSD 3-Clause "New" or "Revised" License

[Feature] XND as a "Serialization" and "Deserialization" library #15

Closed: costrouc closed this issue 5 years ago

costrouc commented 6 years ago

I see a lot of use cases for XND as a "serialization" and "deserialization" library, both for networking and for files, since it only requires a memory copy. It would be worth thinking about how XND fits into this picture.

File Representation

For example, Feather provides a way of sharing data between Python and R. Apache Arrow supports Feather along with a few other formats, and NumPy supports writing arrays to a .npy file.
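For comparison, a .npy file is a short header (magic string, dtype, shape, memory order) followed by the raw array bytes. A minimal round trip with NumPy's np.save and np.load, using io.BytesIO in place of a real file as an assumption for brevity:

```python
import io

import numpy as np

arr = np.array([[1, 2], [3, 4]], dtype=np.int64)

buf = io.BytesIO()
np.save(buf, arr)      # header describing dtype/shape/order, then the raw data
raw = buf.getvalue()

print(raw[:6])         # b'\x93NUMPY' (the .npy magic string)

buf.seek(0)
restored = np.load(buf)  # parses the header, then reads the data back
```

The raw C-ordered data sits at the end of the file, so `raw[-32:]` equals `arr.tobytes()` here.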

I believe XND is special in that it provides a superset of these features.

Networking for Distributed Applications

For this, direct access to the buffers, along with documentation on how to use them, would be beneficial.
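In Python, one common way to expose a buffer without copying it is the buffer protocol (memoryview). The sketch below uses the standard-library `array` module as a stand-in exporter; whether and how xnd exposes its own buffers this way is an assumption not covered by this thread.

```python
import array

values = array.array("q", [1, 2, 3, 4])  # contiguous buffer of signed 64-bit ints
view = memoryview(values)                 # zero-copy view of that buffer

print(view.format, view.itemsize, view.nbytes)  # q 8 32

raw_bytes = view.cast("B")  # same memory viewed as unsigned bytes, still no copy
view[0] = 10                # writes through the view modify the original buffer
```

A byte view like `raw_bytes` can be handed directly to file or socket writes, which is the kind of access the networking use case needs.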

In scientific computing, MPI is still a dominant networking technology. Since XND data does not require any special transformations, RDMA should work well and we should get great performance with MPI. I believe XND would fit extremely well in this space because it provides a generalized container for transferring data. ADIOS tries to address this issue with XML descriptions of its data: https://www.olcf.ornl.gov/center-projects/adios/

In the general networking space there are many competing technologies, but none of them seem well positioned for large data. Protocol Buffers explicitly state that they are not designed for large messages (the documentation suggests considering another strategy above roughly a megabyte per message), as do many others in this space.
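The thread does not define a wire format. One common approach for sending large serialized blobs over a stream socket is a fixed-size length prefix followed by the payload. The sketch below assumes an 8-byte prefix so the message size is not limited to small messages; `send_message` and `recv_message` are hypothetical helpers, not part of xnd.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!Q")  # 8-byte unsigned length prefix, network byte order


def send_message(sock, payload):
    """Send one length-prefixed message."""
    sock.sendall(HEADER.pack(len(payload)))
    sock.sendall(payload)


def _recv_exactly(sock, n):
    """Read exactly n bytes, looping because recv may return partial data."""
    buf = bytearray(n)
    view = memoryview(buf)
    received = 0
    while received < n:
        count = sock.recv_into(view[received:], n - received)
        if count == 0:
            raise ConnectionError("socket closed before message was complete")
        received += count
    return bytes(buf)


def recv_message(sock):
    """Receive one length-prefixed message."""
    (length,) = HEADER.unpack(_recv_exactly(sock, HEADER.size))
    return _recv_exactly(sock, length)


# Usage: send a 1 MiB payload across a local socket pair.
a, b = socket.socketpair()
payload = bytes(range(256)) * 4096
sender = threading.Thread(target=send_message, args=(a, payload))
sender.start()  # separate thread so sendall cannot block on a full socket buffer
received = recv_message(b)
sender.join()
a.close()
b.close()
```

In practice the payload would be the bytes returned by a serializer; the framing itself is independent of what the bytes contain.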

skrah commented 6 years ago

Thank you for opening this topic. Yes, I think XND would be highly useful for this. Since the types can already be serialized, it is a matter of exposing the data pointer, dumping the data it points to, and serializing the bitmaps.

Bitmap support can also be left out at first, i.e. it is OK to raise NotImplementedError if the type contains bitmaps.
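To make the "dump the data, defer the bitmaps" plan concrete, here is a toy serializer for a rectangular 2-D list of integers: a shape header followed by the raw int64 values. The format and the names `toy_serialize`/`toy_deserialize` are hypothetical and are not xnd's actual format; missing values (`None`), which would need a validity bitmap, simply raise NotImplementedError.

```python
import struct


def toy_serialize(rows):
    """Serialize a rectangular 2-D list of ints as a shape header + int64 data.

    Hypothetical format for illustration only; not xnd's wire format.
    """
    if any(v is None for row in rows for v in row):
        # Missing values would require a validity bitmap; not supported yet.
        raise NotImplementedError("optional (missing) values are not supported")
    nrows = len(rows)
    ncols = len(rows[0]) if rows else 0
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same length")
    flat = [v for row in rows for v in row]
    header = struct.pack("<2q", nrows, ncols)
    return header + struct.pack(f"<{len(flat)}q", *flat)


def toy_deserialize(blob):
    """Inverse of toy_serialize: read the shape, then the flat data."""
    nrows, ncols = struct.unpack_from("<2q", blob, 0)
    flat = struct.unpack_from(f"<{nrows * ncols}q", blob, 16)
    return [list(flat[i * ncols:(i + 1) * ncols]) for i in range(nrows)]


blob = toy_serialize([[1, 2], [3, 4]])
print(toy_deserialize(blob))  # [[1, 2], [3, 4]]
```

Calling `toy_serialize([[1, None], [3, 4]])` raises NotImplementedError, mirroring the suggestion to reject types with bitmaps until they are supported.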

costrouc commented 6 years ago

I agree that the first topic, writing to a file, is much more approachable, and implementing it would make the second part easier. I am still learning the inner workings of XND (currently playing around with a Python script and inspecting the data structures with gdb). I will look into this and the approaches that could be taken.

skrah commented 5 years ago

Value and type are now serialized together into a single bytes object:

>>> x = xnd([[1, 2], [3, 4]])
>>> s = x.serialize()
>>> s
b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x03\x01\x00\x00\x00\x00\x02\x00\x00\x00 \x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x03\x01\x00\x00\x00\x00\x01\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x1b\x01\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00\x08\x00 \x00\x00\x00\x00\x00\x00\x00'
>>> x.deserialize(s)
xnd([[1, 2], [3, 4]], type='2 * 2 * int64')
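Because serialize() returns an ordinary bytes object, persisting it to a file (the first use case in this issue) is plain file I/O. In the output above, the first 32 bytes appear to be the four int64 values in little-endian order (the byte order of the machine used for the example), with the type description following. The sketch below copies only that 32-byte prefix, writes it to a temporary file, reads it back, and decodes it; `save_bytes`, `load_bytes`, and `leading_int64s` are hypothetical helpers, not xnd functions.

```python
import os
import struct
import tempfile

# First 32 bytes of the x.serialize() output shown above; the remainder,
# which encodes the type, is omitted here.
DATA_PREFIX = (
    b'\x01\x00\x00\x00\x00\x00\x00\x00'
    b'\x02\x00\x00\x00\x00\x00\x00\x00'
    b'\x03\x00\x00\x00\x00\x00\x00\x00'
    b'\x04\x00\x00\x00\x00\x00\x00\x00'
)


def save_bytes(path, payload):
    """Write serialized bytes to a file."""
    with open(path, "wb") as f:
        f.write(payload)


def load_bytes(path):
    """Read serialized bytes back from a file."""
    with open(path, "rb") as f:
        return f.read()


def leading_int64s(payload, count):
    """Decode the first `count` little-endian int64 values of `payload`."""
    return list(struct.unpack_from(f"<{count}q", payload, 0))


path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
save_bytes(path, DATA_PREFIX)
restored = load_bytes(path)
print(leading_int64s(restored, 4))  # [1, 2, 3, 4]
```

With xnd installed, the same helpers should let `save_bytes("matrix.xnd", x.serialize())` be followed by `x.deserialize(load_bytes("matrix.xnd"))`, assuming deserialize behaves as in the session above.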