time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.13k stars 281 forks source link

add support for bincode v2 #561

Closed irevoire closed 1 year ago

irevoire commented 1 year ago

Hey! I don’t know if this is the kind of PR you would be willing to merge, but I would really like to use bincode in our workflow, and the only blocking crate is time, so I was wondering if it would be possible to add the support of bincode under a feature flag.

codecov[bot] commented 1 year ago

Codecov Report

Merging #561 (0d0b7e5) into main (aab760f) will decrease coverage by 0.0%. The diff coverage is n/a.

@@           Coverage Diff           @@
##            main    #561     +/-   ##
=======================================
- Coverage   95.7%   95.7%   -0.0%     
=======================================
  Files         78      78             
  Lines       8703    8702      -1     
=======================================
- Hits        8330    8329      -1     
  Misses       373     373             
Impacted Files Coverage Δ
time/src/date.rs 100.0% <ø> (ø)
time/src/date_time.rs 94.6% <ø> (ø)
time/src/offset_date_time.rs 100.0% <ø> (ø)
time/src/time.rs 100.0% <ø> (ø)

... and 1 file with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

jhpratt commented 1 year ago

What advantage does this give over using the native support for any type implementing Serialize/Deserialize? Also, does the derive not implicitly guarantee a stable ABI on my end?

irevoire commented 1 year ago

Hey,

What advantage does this give over using the native support for any type implementing Serialize/Deserialize?

So here are the two points of why I want to use bincode;

Also, does the derive not implicitly guarantee a stable ABI on my end?

That’s something else I wanted to talk about with you before merging anything. I think what I did is the worst way to implement Encode/Decode. tbh I did it because I needed it fast, but yes, it forces you to never change your structures (at least in a patch version), + it’s not optimal at all, I think.

I don’t know much about the crate and dates in general, so let me know if I’m wrong, but I think we would be better off implementing our own Encode/Decode. Do you think storing a single unix_timestamp() of the OffsetDateTime would be enough? Should we also encode the offset?

jhpratt commented 1 year ago

bincode v2 is generally faster than bincode+serde (with lto enabled)

Given that the linked issue is labelled as a bug, it seems reasonable to discount this as temporary.

A lot of attributes you can use in serde breaks bincode+serde, while it always works with Encode/Decode

Is this a technical restriction or merely unimplemented? If the latter, it's something that should be fixed upstream.

With regard to deriving versus a manual implementation, a manual implementation would be preferred for the same reason it's preferred for serde.

Upon a second glance at the diff, I noticed two things that I didn't initially:

Due to the uncertainties involved in if/when a full release will be created, in addition to the reasonable belief that the different serde attributes could be handled upstream, I'm going to go ahead and close this PR.

irevoire commented 1 year ago

Is this a technical restriction or merely unimplemented? If the latter, it's something that should be fixed upstream.

That's a technical restriction, it can’t be implemented, and that’s the reason for all this bincodev2 thingy

jhpratt commented 1 year ago

I'd likely need more of an explanation of the actual restriction if this were to be merged. However, the release candidate for ~18 months is more than reason to not merge this, at least for the time being.

swwu commented 1 year ago

Hey @jhpratt ! I wanted to follow up with some specifics about the limitations of bincode+serde, just for clarification's sake (although I agree that it doesn't make sense to merge this until bincode2 is out of RC).

The main restrictions are around how bincode isn't a self-describing format, i.e. bincode can't serialize information about its own format. This means that it can't properly implement the Deserializer::deserialize_any method in serde. There's a list here of all the things in serde that, for this reason, can't be implemented in bincode, but the short version is that anything that changes the structure of the produced serialization (skipping, flattening, enum tagging, etc) can't be done via bincode.

This is obviously annoying when working with bincode alongside, say, serde_json, since serde's default internal tagging for enums is fairly annoying to handle in a typescript/javascript frontend; this often means that you have to maintain different concrete structs (or manually implement a custom serializer/deserializer) if you want both ergonomic JSON output and bincode on a particular struct. Bincode2 gets around this by implementing its own, non-serde-based Encode and Decode traits, that lets you use tagging/flattening/skipping to your heart's content for JSON, while still preserving bincode serializability.

Hope that helps!

jhpratt commented 1 year ago

The only time deserialize_any is used in time is when implementing the visitor for Option<T>. It would be preferable if serde could handle container types (like Option and Vec natively, but that's not currently the case. Aside from that one instance, I don't see any reason it shouldn't work with bincode. None of the attributes mentioned on the linked page are used either.

swwu commented 1 year ago

I think it would be less using time directly, but having a data structure that included one. For instance, you might have some type that looks like this:

#[derive(Serialize, Deserialize)]
#[serde(untagged)]
pub enum TimeOrU64 {
  Time(Time),
  U64(u64)
}

which would serialize+deserialize fine in serde_json, but not in bincode+serde, even if time itself doesn't run afoul of deserialize_any, because the untagged forces use of deserialize_any for the whole enum, which causes bincode to fail on deserialization.

In this case, it would be nice to do something like this instead:

#[derive(Serialize, Deserialize, bincode2::Encode, bincode2::Decode)]
#[serde(untagged)]
pub enum TimeOrU64 {
  Time(Time),
  U64(u64)
}

Which would allow us to serialize/deserialize this struct using bincode as well as serde_json via separate implementations. But, of course, this only works if Time: bincode2::Encode + bincode2::Decode.

jhpratt commented 1 year ago

That doesn't strike me as a particularly strong use case. As time works as expected on its own, I don't see the need to provide dedicated trait implementations. bincode could provide an implementation on the types directly or any crate can do the same via a wrapper struct.