iddm / serde-aux

An auxiliary serde library providing helpful functions for serialisation and deserialisation for containers, struct fields and others.
MIT License
148 stars 25 forks source link

`StringOrVecToVec` with trailing separator. #26

Closed bjeanes closed 1 year ago

bjeanes commented 1 year ago

I have to parse a string of hex-encoded, space-separated bytes in a string but the source API includes a trailing space: "a1 b2 c3 d4 ".

If using a parser to turn the items into something other than a string type, such as an int, the parsing will fail due to the empty item.

Here is a test which demonstrates the problem:

#[test]
fn test_bytes_from_string() {
    fn bytes_from_string<'de, D>(deserializer: D) -> std::result::Result<Vec<u8>, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        StringOrVecToVec::new(' ', |s| {
            println!("{:?}", &s);
            u8::from_str_radix(s, 16)
        })
        .into_deserializer()(deserializer)
    }

    #[derive(serde::Deserialize, Debug)]
    struct MyStruct {
        #[serde(deserialize_with = "bytes_from_string")]
        list: Vec<u8>,
    }

    // Works
    let s = r#" { "list": "a1 b2 c3 d4" } "#;
    let a: MyStruct = serde_json::from_str(s).unwrap();
    assert_eq!(&a.list, &[0xa1, 0xb2, 0xc3, 0xd4]);

    // Fails:
    //   thread 'test_bytes_from_string' panicked at 'called `Result::unwrap()` on an `Err` value: Error("cannot parse integer from empty string", line: 1, column: 27)'
    let s = r#" { "list": "a1 b2 c3 d4 " } "#;
    let a: MyStruct = serde_json::from_str(s).unwrap();
}

Is there a way to resolve this or is this a bug with trailing separators?

iddm commented 1 year ago

Hi @bjeanes and, first of all, thanks for raising an issue!

This behaviour comes from the default behaviour of str::split:

fn main() {
    let s = "1-2-3---4";
    for ss in s.split('-') {
        println!("Split: {}", ss);
    }
}

that leads to

Split: 1
Split: 2
Split: 3
Split: 
Split: 
Split: 4

To prevent this from happening, additional work must be done on top of that to ensure there are no elements which are, again, the same as the separator as in this case. As this is the default behaviour for a string split in Rust currently, I think that adding such checks to prevent this from happening to StringOrVecToVec would cause a misunderstanding in other cases as this might not be expected, perhaps. However, I also see a point in actually doing so. I think we could add another Pattern variant which could also support removing all the separators from appearing when (before) the converter function is invoked so that it would pass in your case. Would this work for you?

bjeanes commented 1 year ago

I think we could add another Pattern variant which could also support removing all the separators from appearing when (before) the converter function is invoked so that it would pass in your case. Would this work for you?

I think there are a few ways to do it: splitting on a regex, pattern to pre-process the string before splitting, or (prob my favourite) accept a fn to use in filter_map with a default of |x| Some(x).

In the end, I worked around this by splitting as string then post-processing myself: https://github.com/bjeanes/modbus-mqtt/blob/83482329b55b3e8d03575a23f62e87c4e47cc2cc/sungrow-winets/src/lib.rs#L520-L549. Fortunately, I could take advantage of the fact that I know there will always be an even number of items and I need to consume them in pairs anyway, so chunks_exact naturally drops the extra item.

But I do think this is worth resolving for future users, it's just not as urgent.

I do think the filter_map route is pretty elegant and versatile.

bjeanes commented 1 year ago

I just realised using filter_map instead of map in https://github.com/vityafx/serde-aux/blob/f064e78b81fac9e6abdc331604d23a81a8f7c901/src/field_attributes.rs#L1094 would avoid an extra field/parameter. But it would be a breaking change unless some fancy generics jumping could be done to accept either and do the right thing based on whether the fn returns Result<T, E> or Option<Result<T, E>>.

iddm commented 1 year ago

I've pushed some changes to address this issue in a simple and backwards-compatible manner: https://github.com/vityafx/serde-aux/commit/33ddb1e747800b749ee51e75fbe1e3eada90ba75

Would you like to try it out?

bjeanes commented 1 year ago

Hi @vityafx that definitely worked, though I wouldn't call it backwards compatible, as you've addded an argument to new(), which is a public fn:

Screen Shot 2022-09-06 at 8 15 22 am

It seems you've already released it, but it would make sense to have new() still only take 2 arguments, defaulting the skip_empty to false and requiring users like myself to call new(..).skip_empty().

Either way, thank you for the fix 🙏🏻

iddm commented 1 year ago

Hi @vityafx that definitely worked, though I wouldn't call it backwards compatible, as you've addded an argument to new(), which is a public fn:

Screen Shot 2022-09-06 at 8 15 22 am

It seems you've already released it, but it would make sense to have new() still only take 2 arguments, defaulting the skip_empty to false and requiring users like myself to call new(..).skip_empty().

Either way, thank you for the fix 🙏🏻

Yes, that's my fault, I forgot about the change in "new". I have already bumped the major version as well, so yes, it is not backwards compatible indeed. I will think about the interface as you suggested, thanks!

bjeanes commented 1 year ago

@vityafx otherwise, if you're going to make a major version jump, consider just changing the filter to return Option<Result<T, E>>, though I do think there is probably a way to do that in a backwards compatible way. I might have a go at prototyping it actually... 🤔