purescript-contrib / purescript-string-parsers

A parsing library specialized to handling strings
MIT License

Use slices instead of cursors #83

Closed. chtenb closed this 2 years ago.

chtenb commented 2 years ago

Proof of concept to fix #77. This change would be breaking because it changes the underlying representation.
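
To illustrate why the representation matters, here is a minimal TypeScript sketch (PureScript strings are JavaScript strings at runtime). It is not the library's code: `CursorState`, `SliceState`, `stepCursor`, `stepSlice`, and `runAll` are hypothetical names, and it assumes the cursor counts code points. Under that assumption, each cursor step has to rescan the already-consumed prefix, while a slice step only looks at the front of the remaining input.

```typescript
// Hypothetical state shapes; the field names are illustrative, not the library's.
type CursorState = { input: string; index: number }; // index counts code points
type SliceState = { rest: string; consumed: number }; // rest is the unparsed suffix

type Step<S> = { codePoint: number; next: S } | null;

// Cursor style: finding the index-th code point means walking the UTF-16
// code units from the start of the input, so each step costs O(index).
function stepCursor(s: CursorState): Step<CursorState> {
  let unit = 0;
  for (let i = 0; i < s.index; i++) {
    const skipped = s.input.codePointAt(unit);
    if (skipped === undefined) return null;
    unit += skipped > 0xffff ? 2 : 1; // astral code points take two units
  }
  const cp = s.input.codePointAt(unit);
  if (cp === undefined) return null;
  return { codePoint: cp, next: { input: s.input, index: s.index + 1 } };
}

// Slice style: the next code point is always at the front of `rest`,
// so no rescan of the consumed prefix is needed.
function stepSlice(s: SliceState): Step<SliceState> {
  const cp = s.rest.codePointAt(0);
  if (cp === undefined) return null;
  const width = cp > 0xffff ? 2 : 1;
  return {
    codePoint: cp,
    next: { rest: s.rest.slice(width), consumed: s.consumed + 1 },
  };
}

// Drive either stepper to the end of the input and collect the code points.
function runAll<S>(step: (s: S) => Step<S>, start: S): number[] {
  const out: number[] = [];
  let state = start;
  for (let r = step(state); r !== null; r = step(state)) {
    out.push(r.codePoint);
    state = r.next;
  }
  return out;
}
```

Both steppers yield the same code points. The difference is only in how much work each step does, which is what would make a full parse quadratic in the cursor style.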

Benchmark (StringParser.runParser)   Statistic   Before       After
parse23AnyCharPoints                 mean        1.01 s       16.14 ms
                                     stddev      51.48 ms     13.64 ms
                                     min         963.28 ms    8.55 ms
                                     max         1.16 s       53.29 ms
parse23AnyCharUnits                  mean        8.53 ms      8.73 ms
                                     stddev      2.77 ms      2.95 ms
                                     min         7.25 ms      7.24 ms
                                     max         38.72 ms     40.49 ms
parse23DigitPoints                   mean        994.93 ms    10.02 ms
                                     stddev      16.97 ms     1.61 ms
                                     min         974.33 ms    8.65 ms
                                     max         1.04 s       23.21 ms
parse23DigitUnits                    mean        10.87 ms     10.28 ms
                                     stddev      1.78 ms      1.56 ms
                                     min         9.51 ms      8.71 ms
                                     max         23.81 ms     22.51 ms
parse23StringPoints                  mean        1.06 s       6.91 ms
                                     stddev      16.93 ms     954.29 μs
                                     min         1.01 s       5.80 ms
                                     max         1.09 s       15.04 ms
parse23StringUnits                   mean        4.52 ms      4.40 ms
                                     stddev      958.15 μs    757.47 μs
                                     min         3.81 ms      3.61 ms
                                     max         13.09 ms     8.46 ms
parse23RegexPoints                   mean        1.03 s       11.77 ms
                                     stddev      44.87 ms     2.11 ms
                                     min         978.55 ms    9.76 ms
                                     max         1.15 s       28.49 ms
parse23RegexUnits                    mean        5.90 ms      5.88 ms
                                     stddev      900.33 μs    1.05 ms
                                     min         4.89 ms      4.93 ms
                                     max         12.99 ms     16.71 ms

chtenb commented 2 years ago

We can close https://github.com/purescript/purescript-strings/issues/155 once this PR is merged.

jamesdbrock commented 2 years ago

Nice, I will look at this.

chtenb commented 2 years ago

> Why is this a breaking change, besides changing the names of the record fields in PosString?

Because the Parser type has changed and it is part of the public API of this package. If people have written a parser in terms of this definition, their code will have to be adjusted. Or does that not qualify as a breaking change?
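
A small TypeScript sketch of the kind of adjustment meant here. The record shapes and field names (`str`/`pos` and `substring`/`position`) are assumptions for illustration and are not confirmed in this thread; `atEndOld` and `atEndNew` are hypothetical user-written helpers, not library functions.

```typescript
// Assumed state shapes before and after the change; field names are illustrative.
type OldPosString = { str: string; pos: number }; // whole input plus a cursor
type NewPosString = { substring: string; position: number }; // remaining input plus a counter

// A user-written "end of input" check against the cursor-based state.
function atEndOld(s: OldPosString): boolean {
  return s.pos >= s.str.length;
}

// The same check against the slice-based state. It has to be rewritten,
// because the fields it read no longer exist under those names or meanings.
function atEndNew(s: NewPosString): boolean {
  return s.substring.length === 0;
}
```

Code that only composes the exported combinators would be unaffected in this sketch; only code that inspects the state record directly needs to change.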

jamesdbrock commented 2 years ago

> Because the Parser type has changed and it is part of the public API of this package.

Sure, that makes sense. But the only thing that has changed about the type of the Parser is the record field names in PosString, right?

If it were up to me, I would do what you've already done, and then also:

  1. Add anyCodePoint :: Parser CodePoint.
  2. Document that anyChar will not always succeed, as the purescript-parsing docs do: https://pursuit.purescript.org/packages/purescript-parsing/8.2.0/docs/Text.Parsing.Parser.String#v:anyChar

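
The two suggestions above can be sketched in TypeScript as follows. This is an illustration, not the library's implementation: the `Result` type and both function bodies are hypothetical, and it assumes the failure case for anyChar is a code point outside the Basic Multilingual Plane, which cannot be represented as a single UTF-16 code unit.

```typescript
// Hypothetical result type: a value plus the remaining input, or a failure.
type Result<A> =
  | { ok: true; value: A; rest: string }
  | { ok: false; error: string };

// Consumes one full Unicode code point, so it succeeds on any non-empty input.
function anyCodePoint(input: string): Result<number> {
  const cp = input.codePointAt(0);
  if (cp === undefined) return { ok: false, error: "Unexpected EOF" };
  const width = cp > 0xffff ? 2 : 1; // astral code points take two code units
  return { ok: true, value: cp, rest: input.slice(width) };
}

// Returns a single UTF-16 code unit (the analogue of a PureScript Char), so it
// has to fail when the next code point does not fit in one unit.
function anyChar(input: string): Result<string> {
  const r = anyCodePoint(input);
  if (r.ok === false) return r;
  if (r.value > 0xffff) {
    return { ok: false, error: "Code point outside the BMP" };
  }
  return { ok: true, value: String.fromCharCode(r.value), rest: r.rest };
}
```

In this sketch anyChar is built on top of anyCodePoint, so the only extra failure it can report is the out-of-range case that the documentation would need to mention.
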
chtenb commented 2 years ago

> But the only thing that has changed about the type of the Parser is the record field names in PosString, right?

Yes

I'll open an issue for anyCodePoint.