Open BoPeng opened 6 years ago
Motivations:
Efficient data exchange is critical for multi-language data analysis. "Data" here generally means "variables" in scripting languages, not "datasets" in a particular format.
"Normal" scripting can benefit from a universal interface.
Workflows and other multi-language environments can benefit more from it.
Existing common-format mechanisms are limited by heterogeneous interfaces, the types of data they support, and their efficiency in handling special or large data types.
The goal is to provide
The non-goal of this project
I think our life will be much easier if we do not aim to provide a persistent file format. That is to say, the generated file is meant to be temporary and is not guaranteed to be readable by future versions of the data exchanger.
But there will always be an interface to load it for all languages, right? I guess when the project matures and compatibility is no longer an issue, it will be good enough to provide such interfaces.
I think our life will be much easier if we do not aim to provide a persistent file format.
Sounds like you've got concrete ideas about what the first version of the storage format is going to be?
No, but I believe even the base format will change very frequently, at least before the project matures, and even then the addition of one-to-one data exchange will be incompatible. Keeping backward compatibility will be very costly, and it might be easier to state from the beginning that we are aiming at more or less interactive use.
Separated from vatlab/sos#952, this ticket keeps the discussion of some thoughts on the SoS Data Exchanger. The general idea is to use a neutral interface for all languages, with implementation details broken down into various functions that can be added incrementally over time. This is a different design from using an intermediate common file format, say JSON or HDF5, where there will be loss of information and performance issues.
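To make the two points above a bit more concrete, here is a rough Python sketch. The first part shows the kind of information loss a JSON round trip causes, using only the standard `json` module. The second part illustrates what "a neutral interface with functions added incrementally" could look like. The names `register`, `exchange`, and the `("R", "integer_vector")` key are hypothetical and made up for illustration; nothing here is the actual SoS Data Exchanger API.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Part 1: a JSON round trip silently changes types.
original = {"shape": (2, 3), 1: "one"}
restored = json.loads(json.dumps(original))
# The tuple comes back as a list and the integer key as a string.
print(restored)

# Part 2: hypothetical registry mapping (source language, type name)
# to a converter function. Support for new types is added one function
# at a time, without touching the neutral entry point `exchange`.
_converters: Dict[Tuple[str, str], Callable[[Any], Any]] = {}


def register(lang: str, type_name: str):
    """Decorator that registers a converter for one (language, type) pair."""
    def decorator(func: Callable[[Any], Any]) -> Callable[[Any], Any]:
        _converters[(lang, type_name)] = func
        return func
    return decorator


def exchange(lang: str, type_name: str, value: Any) -> Any:
    """Neutral entry point: look up the converter and apply it."""
    try:
        converter = _converters[(lang, type_name)]
    except KeyError:
        raise NotImplementedError(
            f"no converter registered for {type_name!r} from {lang!r}"
        ) from None
    return converter(value)


# Assumed example type: an integer vector coming from R becomes a list of int.
@register("R", "integer_vector")
def r_integer_vector(value: Any) -> list:
    return [int(v) for v in value]
```

With this layout, a type that has no converter yet fails loudly with `NotImplementedError` instead of being squeezed into a lossy common format, and adding support later only means registering one more function.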