jonasbb / serde_with

This crate provides custom de/serialization helpers to use in combination with serde's `with`-annotation and with the improved `serde_as`-annotation.
https://docs.rs/serde_with
Apache License 2.0

Add newline separators #777

Closed. jayvdb closed this 2 months ago.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 67.13%. Comparing base (aaa0a29) to head (b1ee949). Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| serde_with/src/formats.rs | 0.00% | 4 Missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #777      +/-   ##
==========================================
- Coverage   67.32%   67.13%   -0.20%
==========================================
  Files          40       40
  Lines        2464     2468       +4
==========================================
- Hits         1659     1657       -2
- Misses        805      811       +6
```


jayvdb commented 2 months ago

This is sufficient for my needs, as I only want Unix-style multi-line strings parsed, but it would be nice if a cross-platform "new line" separator could be created. I suspect this would need a new interface.

Unfortunately, this doesn't work:


    #[serde_as]
    #[derive(Debug, Deserialize, PartialEq, Serialize)]
    #[serde(untagged)]
    enum NewlineSeparatedStringSet {
        Unix(#[serde_as(as = "StringWithSeparator::<UnixLineSeparator, String>")] BTreeSet<String>),
        Dos(#[serde_as(as = "StringWithSeparator::<DosLineSeparator, String>")] BTreeSet<String>),
    }

It builds, but the tests then fail: serde can always successfully construct the wrong variant of the enum, so it does. For example:

      left: Dos({"bar", "foo"})
     right: Unix({"bar", "bar\r", "foo\r"})

jayvdb commented 2 months ago

ping @jonasbb

jonasbb commented 2 months ago

> Unfortunately, this doesn't work:
>
>     #[serde_as]
>     #[derive(Debug, Deserialize, PartialEq, Serialize)]
>     #[serde(untagged)]
>     enum NewlineSeparatedStringSet {
>         Unix(#[serde_as(as = "StringWithSeparator::<UnixLineSeparator, String>")] BTreeSet<String>),
>         Dos(#[serde_as(as = "StringWithSeparator::<DosLineSeparator, String>")] BTreeSet<String>),
>     }

serde always processes untagged enums top-down and takes the first variant that deserializes successfully. Since the Unix line ending (\n) is part of the DOS line ending (\r\n), this leads to the wrong matches that you reported. Switching the order, i.e., listing the Dos variant first, should work, assuming the individual strings do not contain any \r or \n.
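
To make the mismatch concrete, here is a minimal standard-library sketch of what a fixed-separator split sees. The helper `split_on` is hypothetical and only stands in for the splitting step; it is not part of serde_with, and the real helper may differ in detail.

```rust
/// Hypothetical helper: split `input` on one fixed separator, the way a
/// fixed-separator string helper would conceptually see the input.
fn split_on<'a>(input: &'a str, sep: &str) -> Vec<&'a str> {
    // `str::split` never fails; an input without the separator simply
    // comes back as a single element.
    input.split(sep).collect()
}
```

Applied to DOS-style text, `split_on("foo\r\nbar", "\n")` leaves a trailing `\r` on the first element, while `split_on("foo\nbar", "\r\n")` returns the whole Unix-style string as one element. Neither call can fail, which is why an untagged enum cannot use failure of one variant to fall through to the next.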

> This is sufficient for my needs as I only want Unix style multi-line strings parsed, but it would be nice if a cross-platform "new line" separator could be created. I suspect this would need a new interface.

Do you mean something that is always the native newline separator of the current platform? That should be doable. If you want something that can match any separator while splitting strings (i.e., automatically figuring out which separator is used), then this will not work with the current interface.
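
The two readings could be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the local `Separator` trait is a stand-in for a "one fixed separator" interface, and `NativeLineSeparator` and `split_any_newline` are hypothetical names that do not exist in serde_with.

```rust
use std::collections::BTreeSet;

/// Local stand-in for a "one fixed separator" interface (assumption,
/// defined here so the sketch is self-contained).
trait Separator {
    fn separator() -> &'static str;
}

/// Hypothetical: always the native newline of the compilation target.
struct NativeLineSeparator;

impl Separator for NativeLineSeparator {
    fn separator() -> &'static str {
        // `cfg!` is evaluated at compile time for the target platform.
        if cfg!(windows) { "\r\n" } else { "\n" }
    }
}

/// Hypothetical: accept either line ending while splitting. This cannot be
/// expressed through a single fixed separator string, so it is written as
/// a free function instead of a `Separator` impl.
fn split_any_newline(input: &str) -> BTreeSet<String> {
    // `str::lines` treats both "\n" and "\r\n" as line endings.
    input.lines().map(str::to_owned).collect()
}
```

The first type fits an interface that returns one separator string; the second needs to look at the input itself, which is the part a fixed-separator interface cannot provide.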

jonasbb commented 2 months ago

This looks nice, thank you. Newlines seem to be quite a stable choice of separator.