time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0
1.12k stars 280 forks source link

Deserializing Weekday from u8 returns an error #505

Closed Aegonek closed 2 years ago

Aegonek commented 2 years ago

Trying to deserialize a Weekday from a number (without enabling serde-human-readable feature) returns an error Error("invalid type: integer '1', expected a 'Weekday'", ...)

I've created a minimal binary crate for testing, using stable channel. Dependency section:

[dependencies]
serde = "1.0.144"
time = { version = "0.3.14", features = [ "serde" ] }
serde_json = "1.0.85"

Code to reproduce the issue:

use std::error::Error;
use time::Weekday;

fn main() -> Result<(), Box<dyn Error>> {
    let day = Weekday::Monday;
    let str = serde_json::to_string(&day)?;
    let day: Weekday = serde_json::from_str(&str)?;
    println!("Day is {day}");
    Ok(())
}

// Result: Error: Error("invalid type: integer '1', expected a 'Weekday'", line: 1, column: 1)

I did some debugging and it seems that this code invokes deserializer.deserialize_u8(Visitor::<Self>(PhantomData)) as expected, but deserialization still fails. I don't know Serde internals too well, so I don't know why. I could look for a solution later this week. You can assign me to this issue, please.

legas7 commented 2 years ago

Serde deserializer by default forwards visit_u8/16/32 to visit_u64. So weekday's (and month's) visitors visit_u8 never gets called. One workaround could be changing visitor type to u64, but I'm wondering if there is more elegant way to override default forward, since we still serialized u8 value.

Here you can lookup what Deserialize and Visitor traits are about: https://serde.rs/impl-deserialize.html

Aegonek commented 2 years ago

Hey, I had the issue figured out last week but no time to write. Using the example of Weekday to explain, but the implementation is faulty for other types too. Example code at the end. Marked interesting places with ^1 and ^2.

Issues:

  1. ^1 - deserialize_u8 is only a hint about a type for a deserializer. Deserializer is free to ignore the hint and deserialize value as different type. JSON deserializer specifically decodes all numbers as i64.

Proposed solution: none, none-issue, just don't rely on this behavior.

  1. ^2 - visit_u8 is only invoked if deserializer handles u8 specifically, which is kinda error-prone. Default behaviour for visit_num methods is:
    1. [x8, x16, x32] -> fallthrough to x64 if not implemented
    2. [x64] -> fail with type error

Proposed solution: remove visit_u8 implementation, add visit_i64 and visit_u64 implementation instead for all types that implement Deserialize.

  1. This issue passed the test because serde_test only checks if we can get, for example, Monday for Token::U8(1). This fails because we may not necessarily have Token::U8 after deserializing for reasons above.

Proposed solution: add a test for every type that serializes and deserializes value from JSON string. Something like:

#[test]
fn weekday_json() -> Result<(), Box<dyn error::Error>> {
    use Weekday::*;

    let mut buf: Vec<u8> = Vec::new();
    let mut ser = serde_json::Serializer::new(&mut buf);
    let ser = (&mut ser).compact();
    Monday.serialize(ser)?;
    assert_eq!(String::from_utf8(buf)?, "1");

    let mut de = serde_json::Deserializer::from_str("1");
    let de = (&mut de).compact();
    let monday = Weekday::deserialize(de)?;
    assert_eq!(monday, Monday);

    Ok(())
}

I could probably generate tests like this by macro for less typing.

  1. Currently regardless of chosen serde format we can deserialize value from string and not from number. After fixing the issue with deserialization we will be able to deserialize value from both string and a number regardless of chosen serde format. I think the expected behaviour is that we can, for example, deserialize Monday only from 1 for features = ["serde"] and only from "Monday" for features = ["serde-well-known"]? This will, however break user code if someone was relying on deserializing Weekday from string when using features = ["serde"]. I think that it's quite easy error to make.

Proposed solution: allow deserializing from both formats on both configurations. I don't think it hurts anyone.

@jhpratt - let me know what you think about it, I think I'll do some other stuff when waiting for answer.

impl<'a> Deserialize<'a> for Weekday {
  fn deserialize<D: Deserializer<'a>>(deserializer: D) -> Result<Self, D::Error> {
      if cfg!(feature = "serde-human-readable") && deserializer.is_human_readable() {
          deserializer.deserialize_any(Visitor::<Self>(PhantomData))
      } else {
          // ^1
          deserializer.deserialize_u8(Visitor::<Self>(PhantomData))
      }
  }

impl<'a> de::Visitor<'a> for Visitor<Weekday> {
    type Value = Weekday;

    fn expecting(&self, formatter: &mut fmt::Formatter<'_>) -> fmt::Result {
        formatter.write_str("a `Weekday`")
    }

    fn visit_str<E: de::Error>(self, value: &str) -> Result<Weekday, E> {
        match value {
            "Monday" => Ok(Weekday::Monday),
            "Tuesday" => Ok(Weekday::Tuesday),
            "Wednesday" => Ok(Weekday::Wednesday),
            "Thursday" => Ok(Weekday::Thursday),
            "Friday" => Ok(Weekday::Friday),
            "Saturday" => Ok(Weekday::Saturday),
            "Sunday" => Ok(Weekday::Sunday),
            _ => Err(E::invalid_value(de::Unexpected::Str(value), &"a `Weekday`")),
        }
    }

  // ^2
    fn visit_u8<E: de::Error>(self, value: u8) -> Result<Weekday, E> {
        match value {
            1 => Ok(Weekday::Monday),
            2 => Ok(Weekday::Tuesday),
            3 => Ok(Weekday::Wednesday),
            4 => Ok(Weekday::Thursday),
            5 => Ok(Weekday::Friday),
            6 => Ok(Weekday::Saturday),
            7 => Ok(Weekday::Sunday),
            _ => Err(E::invalid_value(
                de::Unexpected::Unsigned(value.into()),
                &"a value in the range 1..=7",
            )),
        }
    }
}
jhpratt commented 2 years ago

Proposed solution: remove visit_u8 implementation, add visit_i64 and visit_u64 implementation instead for all types that implement Deserialize.

:+1:

add a test for every type that serializes and deserializes value from JSON string

:+1: — definitely use a macro to make things simpler.

After fixing the issue with deserialization we will be able to deserialize value from both string and a number regardless of chosen serde format

If you can get this to work, that would be wonderful. I've tried this before for various types to no avail.

Aegonek commented 2 years ago

I opened a pull request for my changes: https://github.com/time-rs/time/pull/506/files. I think all we need to be able to deserialize value regardless of chosen serde format is to use deserialize_any method regardless of feature flag - but it would probably have some performance penalty. For now, you can deserialize value from any supported format on serde-well-known and from compact format only on serde - I think that's reasonable.

jhpratt commented 2 years ago

I think all we need to be able to deserialize value regardless of chosen serde format is to use deserialize_any method regardless of feature flag

This only works for self-describing formats.