Closed poorlyknitsweater closed 1 year ago
I benchmarked this PR against master
to check how big the impact is. The links below lead to the same commit applied to master
and this PR respectively:
stack bench --benchmark-arguments '--regress allocated:iters --time-limit 60'
on https://github.com/poorlyknitsweater/xml/commit/70e6fc95e936315b6c5815ef5390ca05b29fe32b:
benchmarking without newline normalization
time 706.1 ms (667.7 ms .. 744.8 ms)
0.995 R² (0.990 R² .. 0.999 R²)
mean 753.7 ms (738.0 ms .. 771.9 ms)
std dev 31.50 ms (22.36 ms .. 45.97 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.485e9 (7.485e9 .. 7.485e9)
y 3281.455 (2458.453 .. 3944.828)
stack bench --benchmark-arguments '--regress allocated:iters --time-limit 60'
on https://github.com/poorlyknitsweater/xml/commit/8551275f8679fb86150405ee1726e500de8e9314:
benchmarking with newline normalization
time 823.4 ms (807.6 ms .. 832.0 ms)
0.999 R² (0.998 R² .. 1.000 R²)
mean 809.0 ms (791.0 ms .. 819.8 ms)
std dev 24.49 ms (15.30 ms .. 36.55 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.612e9 (7.612e9 .. 7.612e9)
y 3253.236 (2333.607 .. 3973.035)
So the PR does have a big performance impact. I'll try to compare this against replacing takeWhile1
by something else that normalizes the line endings on-the-fly.
EDIT: The test file has random content with a high density of \r
and \n
characters so it's not representative at all. However I think there's a lot of room for improvement.
Second round of benchmarks on top of the new implementation (which replaces the takeWhile1
call by some black magic). My machine was a bit busy before so I reran the master
benchmark:
stack bench --benchmark-arguments '--regress allocated:iters --time-limit 60'
on https://github.com/poorlyknitsweater/xml/commit/70e6fc95e936315b6c5815ef5390ca05b29fe32b:
benchmarking without newline normalization
time 601.0 ms (592.4 ms .. 607.2 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 593.6 ms (586.1 ms .. 598.5 ms)
std dev 11.12 ms (6.234 ms .. 17.75 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.485e9 (7.485e9 .. 7.485e9)
y 3231.385 (2455.370 .. 3880.165)
stack bench --benchmark-arguments '--regress allocated:iters --time-limit 60'
on https://github.com/poorlyknitsweater/xml/commit/b05ba779b6f9b60e788429d1feef2d665e128421:
benchmarking with newline normalization
time 660.7 ms (656.1 ms .. 666.4 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 649.0 ms (637.2 ms .. 654.3 ms)
std dev 14.35 ms (5.778 ms .. 25.00 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.932e9 (7.932e9 .. 7.932e9)
y 3223.385 (2138.726 .. 4186.241)
benchmarking with newline normalization, no carriage returns
time 621.2 ms (618.0 ms .. 624.8 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 618.0 ms (608.4 ms .. 621.0 ms)
std dev 8.746 ms (2.888 ms .. 15.82 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.645e9 (7.645e9 .. 7.645e9)
y 2757.538 (390.949 .. 5775.262)
Which is better but still a lot worse. The second benchmark here runs on the same file, without any \r
characters. In this case there is still noticeable overhead, but it's not quite as bad.
The last change seems to reduce the overhead to basically nothing when the file contains no carriage returns:
stack bench --benchmark-arguments '--regress allocated:iters --time-limit 60'
on https://github.com/poorlyknitsweater/xml/commit/79103ca80d6f9280138db01a98cc6164115b5b78:
benchmarking with newline normalization
time 647.0 ms (636.1 ms .. 654.1 ms)
1.000 R² (0.999 R² .. 1.000 R²)
mean 635.0 ms (625.5 ms .. 639.5 ms)
std dev 11.90 ms (5.691 ms .. 19.16 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.995e9 (7.995e9 .. 7.995e9)
y 3114.462 (1932.848 .. 4241.436)
benchmarking with newline normalization, no carriage returns
time 600.1 ms (597.7 ms .. 603.4 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 594.8 ms (587.1 ms .. 597.8 ms)
std dev 8.306 ms (3.766 ms .. 14.28 ms)
allocated: 1.000 R² (1.000 R² .. 1.000 R²)
iters 7.702e9 (7.702e9 .. 7.702e9)
y 2757.538 (363.194 .. 5618.688)
Would you be willing to merge at this point? If so I'd be happy to clean up the code a little and add some documentation, especially around valid '\r' == False
:)
Took me some time to wrap my head around the logic, but this seems correct to me. I'd be happy to merge this after the clean-up and documentation you mentioned, now that this complies with the "don't pay for what you don't need" principle :) .
I added two more unit tests and some comments. The structure isn't much different but I guess it's a little clearer. I also made this little diagram:
┌────────────────────────────────┐
│consume until invalid, \r or EOF│
└────────────────┬───────────────┘
│
┌────▼────┐ ┌────┐
┌────┬──────────►peek next├─────────────►done│
│ │ └────┬────┘ invalid/EOF └────┘
│ │ │
│ │\n \r│
│ │ │
│ ┌┴───────────────▼───────────────┐
│ │discard \r │
│ │consume until invalid, \r or EOF│
│ │inspect first consumed char │
│ └────────────────┬───────────────┘
│ │otherwise
│ ┌────▼──┐
└───────────────┤emit \n│
└───────┘
It's not in the code yet though.
Thanks for the additional comments and tests. I'm merging this as is, and will apply the suggested simplifications afterwards.
XML demands that
\r
and\r\n
are replaced by\n
by the XML processor Section 2.11. This PR adds this feature.This PR might break compatitbility for many users.