Open theJasonFan opened 2 years ago
I'm not sure about this.
My own needs for serialization are roughly these:
As far as I understand, Serde follows a different philosophy. I've had bad experiences with adding features I don't use myself. In the long term, such features tend to stop working properly, especially if they involve conditional compilation. At the same time, they make the code more difficult to change and maintain.
Can Serde support be added without too much maintenance burden?
Thank you for the quick response. A couple thoughts:
Can Serde support be added without too much maintenance burden?
AFAIK, serde compatibility can be added to `RawVector`, `BitVector`, and `IntVector` with simple `#[derive(Serialize, Deserialize)]` annotations where the structs are defined. The code added would just be annotations, and the actual serialization formats / protocols are offloaded to the `serde` ecosystem. I can take a closer look to see if there may be pain points with rank/select support.
Fast and space-efficient serialization for multi-gigabyte structures
This would have to be benchmarked, but we have been using `serde` + `bincode` to serialize and deserialize multi-gigabyte sds vectors without issue; the serialized vectors are the same size as they are in memory.
Interoperability... file formats do not change... simple memory-mapped files...
Off the top of my head, I do not think serde compatibility via `#[derive(Serialize, Deserialize)]` annotations would address these needs. However, my thought here is to add `serde` compatibility alongside the serialization formats you have implemented.
My overall thought is that adding `serde` compatibility alongside your serialization APIs makes development easy for downstream users like myself. If made compatible with `serde`, adding serialization/deserialization functionality to any struct that contains a `simple_sds::{RawVector, BitVector, IntVector}` would just be a one-line `#[derive(Serialize, Deserialize)]` annotation on struct definitions.
I was thinking about the maintenance burden if the in-memory data structures change. That would not be a breaking change from my point of view if the interfaces and the serialization formats remain the same. If you derive Serde serialization, any changes in in-memory structures would break compatibility with old files. I guess the question is how much effort it would take to implement serialization manually and always serialize the same data as when using my interface.
Ah yes, that would be quite non-trivial --- I understand your concern now.
One possible, but inelegant, solution would be to implement "stable" structs that mirror the "in-memory" structs (which could change) and that implement `serde::Serialize`. The "stable" structs are guaranteed (by convention) not to change, or to change extremely infrequently. Maintenance then involves maintaining `From` trait implementations to map between the "stable" and "in-memory" structs.
Then again, `serde::Serialize` only exposes a data model and an interface that other serialization libraries build on. Those libraries in the serde ecosystem could very well introduce backwards-incompatible changes w.r.t. how the compatible structs are serialized.
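To make the "stable" mirror idea concrete, here is a minimal sketch using only the standard library. Every name in it (`InMemoryBits`, `StableBits`, `to_bytes`, `from_bytes`) is hypothetical and none of it reflects the actual simple-sds layout. With serde, the derive would go on the stable struct; a hand-written little-endian encoding stands in for it here so the example compiles without external crates.

```rust
/// Hypothetical in-memory layout; free to change between releases.
#[derive(Debug, Clone, PartialEq)]
pub struct InMemoryBits {
    len: usize,
    words: Vec<u64>,
    ones: usize, // cached popcount, derived from `words`
}

impl InMemoryBits {
    pub fn from_bools(bits: &[bool]) -> Self {
        let mut words = vec![0u64; bits.len() / 64 + (bits.len() % 64 != 0) as usize];
        for (i, &b) in bits.iter().enumerate() {
            if b {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        let ones = words.iter().map(|w| w.count_ones() as usize).sum();
        InMemoryBits { len: bits.len(), words, ones }
    }

    pub fn count_ones(&self) -> usize {
        self.ones
    }
}

/// Hypothetical "stable" mirror: its fields are fixed by convention.
#[derive(Debug, Clone, PartialEq)]
pub struct StableBits {
    pub len: u64,
    pub words: Vec<u64>,
}

// The two `From` impls are the only code that must track layout changes.
impl From<&InMemoryBits> for StableBits {
    fn from(v: &InMemoryBits) -> Self {
        StableBits { len: v.len as u64, words: v.words.clone() }
    }
}

impl From<StableBits> for InMemoryBits {
    fn from(s: StableBits) -> Self {
        // Derived fields are recomputed rather than read from the file.
        let ones = s.words.iter().map(|w| w.count_ones() as usize).sum();
        InMemoryBits { len: s.len as usize, words: s.words, ones }
    }
}

impl StableBits {
    /// Stand-in encoding: `len`, then each word, all as little-endian u64.
    pub fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(8 * (1 + self.words.len()));
        out.extend_from_slice(&self.len.to_le_bytes());
        for w in &self.words {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }

    /// Returns `None` if the input is truncated or inconsistent.
    pub fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() < 8 || bytes.len() % 8 != 0 {
            return None;
        }
        let mut values = bytes.chunks_exact(8).map(|c| {
            let mut buf = [0u8; 8];
            buf.copy_from_slice(c);
            u64::from_le_bytes(buf)
        });
        let len = values.next()?;
        let words: Vec<u64> = values.collect();
        let expected = len / 64 + u64::from(len % 64 != 0);
        if words.len() as u64 != expected {
            return None;
        }
        Some(StableBits { len, words })
    }
}
```

In this sketch, adding or removing a cached field such as `ones` changes only `InMemoryBits` and the two `From` impls; the bytes produced by `StableBits` stay the same.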
Hi @jltsiren,

First, thank you for implementing `simple-sds`!

Would you be willing to accept a PR to add compatibility with `serde`? I understand that compatibility with data formats such as `json` provided by `serde_json` would not make much sense, but `bincode` offers quite compact serialized representations. My thought here would be to use the `#[serde(with = ... )]` variant attributes and implement the appropriate interfaces to make `simple-sds` data structures work with `derive` for any structs that contain them.

I'm working on a project that uses `simple-sds` bit/int/raw vectors and have been using the `with` annotations + `bincode` to serialize data structures. The thought would also be, for rank/select supported bit vectors, to serialize the bits only and build rank/select support at deserialization time.

Thank you again for your work.
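As a rough illustration of "serialize the bits only and rebuild rank support on load", here is a minimal std-only sketch. `RankedBits`, `raw_parts`, and the one-counter-per-word prefix table are assumptions made for the example; they are not the simple-sds types or its actual rank structure.

```rust
/// Hypothetical bit vector with rank support.
pub struct RankedBits {
    len: usize,
    words: Vec<u64>,
    /// prefix[i] = number of ones in words[..i]. Derived data, never stored.
    prefix: Vec<usize>,
}

impl RankedBits {
    /// Builds the structure from raw bits. This is also the "deserialize"
    /// path: the rank table is always recomputed, never read from disk.
    /// Assumes any padding bits past `len` in the last word are zero.
    pub fn new(len: usize, words: Vec<u64>) -> Self {
        assert_eq!(words.len(), len / 64 + (len % 64 != 0) as usize);
        let mut prefix = Vec::with_capacity(words.len() + 1);
        let mut total = 0usize;
        prefix.push(total);
        for w in &words {
            total += w.count_ones() as usize;
            prefix.push(total);
        }
        RankedBits { len, words, prefix }
    }

    /// The only data that needs to be written out.
    pub fn raw_parts(&self) -> (usize, &[u64]) {
        (self.len, &self.words)
    }

    /// Number of ones in positions [0, i).
    pub fn rank(&self, i: usize) -> usize {
        assert!(i <= self.len);
        let (word, offset) = (i / 64, i % 64);
        let mut r = self.prefix[word];
        if offset > 0 {
            let mask = (1u64 << offset) - 1;
            r += (self.words[word] & mask).count_ones() as usize;
        }
        r
    }
}
```

The trade-off is a smaller file and a format that does not depend on the rank structure, at the cost of a linear pass over the bits every time the vector is loaded.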