naomijub / edn-rs

[DEPRECATED]: Crate to parse and emit EDN
https://crates.io/crates/edn-rs
MIT License
81 stars 14 forks source link

Handle escape characters in strings #75

Closed dimfeld closed 3 years ago

dimfeld commented 3 years ago

Parsing failed with any string containing a \"because it ended the string early.

naomijub commented 3 years ago

I will take a look at it. However, I am without a computer until February, so I might take a while. Maybe @otaviopace can be quicker.

Otavio, what do you think about adding itertiools?

dimfeld commented 3 years ago

No hurry, this is just for a personal project and I'm using my fork for now. Thanks!

naomijub commented 3 years ago

But thanks for the issue

dimfeld commented 3 years ago

Thanks for the feedback. Made all the above requested changes and got rid of itertools completely!

Benchmark results:

     Running target/release/deps/parse-fe183261c12e6e1f
Gnuplot not found, using plotters backend
parse                   time:   [4.9507 us 4.9619 us 4.9758 us]
                        change: [+0.0064% +0.2933% +0.5524%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

     Running target/release/deps/serialize-704ae7e803b9c554
Gnuplot not found, using plotters backend
serde                   time:   [679.15 ns 681.93 ns 684.63 ns]
                        change: [+0.7656% +1.2731% +1.7743%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) low mild

edn                     time:   [2.6354 us 2.6375 us 2.6397 us]
                        change: [+1.3466% +1.6104% +1.8172%] (p = 0.00 < 0.05)
                        Performance has regressed.

     Running target/release/deps/tagged_parse-7435533901f55659
Gnuplot not found, using plotters backend
tagged_parse            time:   [1.3659 us 1.3705 us 1.3759 us]
                        change: [+12.912% +13.537% +14.131%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe
naomijub commented 3 years ago

So two issues.

  1. I asked my friend to run this test on the branch and the assert fails: [Update: he misread the test output, but it is a good test to have]

    #[test]
    fn parse_map_with_special_char_str1() {
        let mut edn = "{ :a \"hello\n \r \t \\\"world\\\" with escaped \\ characters\" }".chars().enumerate();
    
        assert_eq!(
            parse(edn.next(), &mut edn).unwrap(),
            Edn::Map(Map::new(
                map! {":a".to_string() => Edn::Str("hello\n \r \t \"world\" with escaped \\ characters".to_string())}
            ))
        );
    }
  2. Itertools solution seems to be way more performatic for parsing, from 33% up to 63%. The previous solution if you want to use/test it:

.fold_while((false, String::new()), |(last_was_escape, mut s), (_, c)| {
        if last_was_escape {
            // Supported escape characters, per https://github.com/edn-format/edn#strings
            match c {
                't' => s.push('\t'),
                'r' => s.push('\r'),
                'n' => s.push('\n'),
                '\\' => s.push('\\'),
                '\"' => s.push('\"'),
                _ => {
                    // On an unsupported escape character, output both the backslash and the character
                    s.push('\\');
                    s.push(c);
                }
            };

            FoldWhile::Continue((false, s))
        } else if c == '\"' {
            // Unescaped quote means we're done
            FoldWhile::Done((false, s))
        } else if c == '\\' {
            FoldWhile::Continue((true, s))
        } else {
            s.push(c);
            FoldWhile::Continue((false, s))
        }
    })
    .into_inner()
    .1;

With this I am DONE

naomijub commented 3 years ago

merged via https://github.com/naomijub/edn-rs/pull/77

dimfeld commented 3 years ago

Hmm, interesting. I actually see slower performance with the itertools solution, especially on tagged_parse which is almost completely string parsing. Anyway, thanks for your help in getting this merged!

naomijub commented 3 years ago

I will take a deeper look at it. tagged_parse didn't show much difference here

naomijub commented 3 years ago

Crate published if you want to use it from crates io