jonasbb / serde_with

This crate provides custom de/serialization helpers to use in combination with serde's `with`-annotation and with the improved `serde_as`-annotation.
https://docs.rs/serde_with
Apache License 2.0
636 stars 67 forks source link

Incorrect nanosecond serialization for very large or small values #771

Open matt-phylum opened 1 month ago

matt-phylum commented 1 month ago

TimestampNanoSeconds has the following serialization behavior:

Not being able to serialize values outside the range 1677-09-21T00:12:43.145224193Z..2262-04-11T23:47:16.854775808Z is expected since it's serializing to an i64, but if the value is outside that range it should return an error, not panic or return incorrect values.

use chrono::{DateTime, Utc};
use serde::Serialize;
use serde_with::serde_as;

fn main() {
    #[serde_as]
    #[derive(Serialize)]
    struct S(#[serde_as(as = "serde_with::TimestampNanoSeconds")] DateTime<Utc>);

    fn try_serialize(v: DateTime<Utc>) {
        let s = std::panic::catch_unwind(|| {
            serde_json::to_string(&S(v)).unwrap_or_else(|e| format!("Serialization error: {e}"))
        })
        .unwrap_or_else(|p| {
            format!(
                "Panic: {}",
                p.downcast_ref::<&str>()
                    .map(|s| &**s)
                    .unwrap_or_else(|| p.downcast_ref::<String>().map(|s| &**s).unwrap())
            )
        });
        println!("{v}: {s}");
    }

    try_serialize(DateTime::<Utc>::MIN_UTC);
    try_serialize("1385-06-12T00:25:26.290448384Z".parse().unwrap());
    try_serialize("1385-06-12T00:25:26.290448385Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224191Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224192Z".parse().unwrap());
    try_serialize("1677-09-21T00:12:43.145224193Z".parse().unwrap());
    try_serialize("2262-04-11T23:47:16.854775807Z".parse().unwrap());
    try_serialize("2262-04-11T23:47:16.854775808Z".parse().unwrap());
    try_serialize("2554-07-21T23:34:33.709551615Z".parse().unwrap());
    try_serialize("2554-07-21T23:34:33.709551616Z".parse().unwrap());
    try_serialize(DateTime::<Utc>::MAX_UTC);
}
-262143-01-01 00:00:00 UTC: Panic: overflow when multiplying duration by scalar
1385-06-12 00:25:26.290448384 UTC: Panic: overflow when multiplying duration by scalar
1385-06-12 00:25:26.290448385 UTC: 1
1677-09-21 00:12:43.145224191 UTC: 9223372036854775807
1677-09-21 00:12:43.145224192 UTC: Panic: attempt to negate with overflow
1677-09-21 00:12:43.145224193 UTC: -9223372036854775807
2262-04-11 23:47:16.854775807 UTC: 9223372036854775807
2262-04-11 23:47:16.854775808 UTC: -9223372036854775808
2554-07-21 23:34:33.709551615 UTC: -1
2554-07-21 23:34:33.709551616 UTC: Panic: overflow when multiplying duration by scalar
+262142-12-31 23:59:59.999999999 UTC: Panic: overflow when multiplying duration by scalar

A similar problem exists with TimestampNanoSecondsWithFrac:

TimeStampNanoSecondsWithFrac should never fail because it is serializing to f64 and f64 can represent much larger values, at the expense of precision.

The other TimeStamp* serializers are not affected because the range of values that can be represented by DateTime is smaller than the range of values that can be represented by an i64.

DurationNanoSeconds and DurationNanoSecondsFrac also have a similar problem:

DurationNanoSeconds should return an error instead of panicking, and DurationNanoSecondsFrac should never fail.

DurationMicroSeconds and DurationMicroSecondsFrac fail the same way for 18446744073709.551616s... 18446744073709551.616s.. for DurationMilliSeconds and DurationMilliSecondsWithFrac. DurationSeconds and DurationSecondsWithFrac are unaffected.

matt-phylum commented 1 month ago

I was planning to open another issue for having a DefaultOnError that works for serialization, but SerializeAs doesn't have a way to distinguish between errors because the value is unrepresentable and errors because the underlying serializer reported an error. Maybe the fallible (without fraction) versions could to take a type parameter that defines whether an out of range value is skipped or None or 0.

jonasbb commented 1 month ago

Thank you for the report. Yes, there is indeed a problem here. The code uses the normal math operations and some as casts. Fixing all of the instances will be a bit difficult, since the clippy lints for that will lead to many warnings, but not all of them actually being problematic. Many of the values will not be supported since they cannot be represented internally.

How did you come up with those time values to test? I wonder if you have a list of such strings/timestamps or if you used automated tools.

I do want to fix the issue, but it might take a while. It seems to require some steps

matt-phylum commented 1 month ago

I used binary search to find these edges between behaviors. I don't know if there's a list of such timestamps, but 9,223,372,036,854,775,807 is the limit of an i64 so those are probably good values to check for any implementation.