iddm / serde-aux

An auxiliary serde library providing helpful functions for serialisation and deserialisation for containers, struct fields and others.
MIT License
152 stars 26 forks source link

Add `deserialize_vec_from_string` function #15

Closed vbrandl closed 3 years ago

vbrandl commented 3 years ago

This function allows to deserialize a comma separated String into a Vec<T>.

std::str::pattern::Pattern is unstable but I was thinking about parameterizing over the split pattern as &str or a closure and return a Fn(D) -> Result<Vec<T>, D::Error>> but ran into lifetime and type errors...

This is what I tried:

pub fn deserialize_vec_from_string_param<'de, T, D>(
    f: impl Fn(char) -> bool,
) -> impl Fn(D) -> Result<Vec<T>, D::Error>
where
    D: serde::Deserializer<'de>,
    T: FromStr + serde::Deserialize<'de>,
    <T as FromStr>::Err: std::fmt::Display,
{
    move |deserializer: D| deserialize_vec_from_string2(f, deserializer)
}

fn deserialize_vec_from_string2<'de, T, D>(
    f: impl Fn(char) -> bool,
    deserializer: D,
) -> Result<Vec<T>, D::Error>
where
    D: serde::Deserializer<'de>,
    T: FromStr + serde::Deserialize<'de>,
    <T as FromStr>::Err: std::fmt::Display,
{
    #[derive(Deserialize)]
    #[serde(untagged)]
    enum StringOrVec<T> {
        String(String),
        Vec(Vec<T>),
    }

    match StringOrVec::<T>::deserialize(deserializer)? {
        StringOrVec::String(s) => s
            .split(f)
            .map(T::from_str)
            .collect::<Result<Vec<_>, _>>()
            .map_err(serde::de::Error::custom),
        StringOrVec::Vec(v) => Ok(v),
    }
}

But this does not work...

While this would go in the direction, I intended, it doesn't work as an annotation.

vbrandl commented 3 years ago

One more question about internals of the function: should the elements of the split be trimmed before calling from_str?

Currently 1, 2, 3 cannot be parsed, since the elements would be ["1", " 2", " 3"] (notice the leading spaces for 2 and 3). Calling trim() before from_str would fix this.

On the other hand trimming the string might lose significant whitespace...

~Also what's your opinion on being able to use a user defined separator other than ,?~ Edit: I was able to implement this

iddm commented 3 years ago

Thanks!

I think trim is something a user should decide whether to do or not, we shouldn't decide everything for the user. So I think not, it should be something we may provide a way for, but not do unconditionally. In the case of JSON, yes, the space between array elements doesn't change anything, - be it objects, or string literals or numbers, whitespace doesn't change the way they are parsed, but I am not sure about other data-interchange formats, so I don't think we should do it. If you meant the data inside an array element, then even more so, I think, we should never change the data without the user's awareness.

The problem you are describing with an example of numbers inside a string, yes, this won't work, but still, we let the user decide what to do next. Ideally, numbers should never be inside a string in the first place, secondly, we can't always know what the data there is and so we can't know how to deal with it, it is only the end-user who knows that.

vbrandl commented 3 years ago

You're right. Maybe something like this would help:

pub fn foo<'de, T, D, E>(
    sep: impl Fn(char) -> bool,
    parser: impl Fn(&str) -> Result<T, E>
) -> impl Fn(D) -> Result<Vec<T>, <D as serde::Deserializer<'de>>::Error>
where
    D: serde::Deserializer<'de>,
    T: serde::Deserialize<'de>,
    E: std::fmt::Display
{
  ...
}

This way, if T implements FromStr and that's the desired behavior, this can be called as foo(|c| c == ',', T::froim_str)

And if the caller wants the elements to be stripped first, it can be done like this: foo(|c| c == ',', |s| s.trim().parse::<T>())

Also this way, we can lose the T: FromStr restriction.