stchang / parsack

A basic Parsec-like monadic parser combinator library implementation in Racket.
MIT License
50 stars 10 forks source link

Use thread cells for global state #53

Closed zyrolasting closed 4 years ago

zyrolasting commented 4 years ago

Re: https://github.com/stchang/parsack/issues/50#issuecomment-590435790

Adding thread cells do have an impact on performance. These timings below are derived by running parse-markdown as bytecode on a single 564K Markdown file on my machine.

master:

real    (mean 1.485s) 0m1.473s 0m1.480s 0m1.498s 0m1.468s 0m1.507s
user    (mean 1.412s) 0m1.387s 0m1.416s 0m1.434s 0m1.392s 0m1.430s
sys     (mean 0.068s) 0m0.081s 0m0.059s 0m0.059s 0m0.070s 0m0.072s

thread-cells:

real    (mean 1.589s) 0m1.568s 0m1.596s 0m1.571s 0m1.604s 0m1.604s
user    (mean 1.509s) 0m1.490s 0m1.513s 0m1.499s 0m1.514s 0m1.531s
sys     (mean 0.074s) 0m0.073s 0m0.078s 0m0.067s 0m0.084s 0m0.068s

I'm currently unsure about how to write a reliable test that verifies that no two threads interfere with each other. I think what I might do is create a user state with a counter, and confirm that 2 threads started at different times each parse 1000 characters and end up on position 1001 or something.

Obviously do not approve until I get a test in, but I want to know if you'd want a global switch to let the user specify shared state. That might save the performance, but it makes the interface a little more awkward ("Oh, are you multithreaded? Configure the library first")

stchang commented 4 years ago

This performance hit seems reasonable. You might also want to run and time the markdown test suite, which includes some performance tests.

Thanks for doing this work!

zyrolasting commented 4 years ago

@stchang Happy to do it. Thanks for reviewing.

Alright, I added a test where the main thread interleaves stateful parsing with another thread. The main thread attempts to destroy the state via an implied user-state-reset!. The test fails if the state is indeed reset mid-parse.

As for Markdown performance tests, they pass and this is what I see. (+cc: @greghendershott)

Strict mode:
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 87, 87, 90, 90, 91 (sorted)
Average: 89.0
Non-strict mode (with extensions):
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 101, 101, 104, 105, 105 (sorted)
Average: 103.2
[...]
raco test -x -s slow-test .
raco test: (submod "./markdown/random-test.rkt" slow-test)
Trying 500 docs with 300 "tokens" each, want < 5 sec: 1 51 101 151 201 251 301 351 401 451 
stchang commented 4 years ago

As for Markdown performance tests, they pass and this is what I see

Do you have numbers from before this patch?

zyrolasting commented 4 years ago

Ah, sorry. I ran both again since I can't remember if I had bytecode available for every module. For good measure, here's my console session for each test pass.

cd parsack
git co $BRANCH
raco setup parsack markdown
cd -

cd markdown
make test-all

I omitted the output from markdown's test-slow target since it's always the same.

master

Strict mode:
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 80, 80, 81, 82, 86 (sorted)
Average: 81.8
Non-strict mode (with extensions):
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 92, 92, 93, 93, 93 (sorted)
Average: 92.6

thread-cell

Strict mode:
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 87, 89, 91, 95, 99 (sorted)
Average: 92.2
Non-strict mode (with extensions):
Using /home/sage/Code/markdown/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 105, 106, 107, 107, 107 (sorted)
Average: 106.4
stchang commented 4 years ago

Thanks! I think that is very reasonable.

I'm ready to merge but I'd like to wait to see if @greghendershott and @mbutterick want to weigh in.

Summary: parsack (and thus markdown) is broken in some multi-threaded scenarios. This patch fixes it, at the cost of what looks like a ~10% slowdown in some benchmarks.

mbutterick commented 4 years ago

You mean, do I care if Markdown users must absorb the costs of their life choices? Hell no.

zyrolasting commented 4 years ago

@stchang Note that the times I posted are for 7.6 CS, not 3m. If you want, run the session again if you have a 3m installation handy--unlike me.

greghendershott commented 4 years ago

I agree with @mbutterick this seems like reasonable ~punishment~ performance for markdown users.

Seriously, as someone who often uses markdown for my blog (as well as org and scribble for other purposes) I think this is acceptable.

stchang commented 4 years ago

@zyrolasting Thanks again for investigating and fixing this!

zyrolasting commented 4 years ago

So long as @mbutterick is happy, I'm happy.