stchang / parsack

A basic Parsec-like monadic parser combinator library implementation in Racket.
MIT License
50 stars 10 forks source link

remove `set!` #50

Closed stchang closed 4 years ago

stchang commented 8 years ago

I've received a request to remove the use of set!. Essentially, this would restore the parameters I previously removed, e.g. for user-state. The parameters were removed for performance reasons but were ultimately not the bottleneck so restoring them should be the right thing to do. Opening this ticket while I do some experimenting.

stchang commented 8 years ago

Restoring parameters leads to a ~2x slowdown, e.g. here is the performance test for markdown

before:

Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 94, 95, 95, 96, 108 (sorted)
Average: 97.6
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 107, 107, 111, 112, 115 (sorted)
Average: 110.4

after:

Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 182, 182, 183, 183, 195 (sorted)
Average: 185.0
--------------------
FAILURE
name:       check-true
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 33 4 962 24)
expression: (check-true (< avg 170))
params:     (#f)

Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 207, 207, 208, 208, 210 (sorted)
Average: 208.0
--------------------
FAILURE
name:       check-true
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 33 4 962 24)
expression: (check-true (< avg 170))
params:     (#f)

Check failure
--------------------
--------------------
FAILURE
name:       check-true
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 34 4 991 26)
expression: (check-true (< worst 200))
params:     (#f)

Check failure
--------------------

This is probably too much so I'm inclined to leave things the way they are. @greghendershott do you agree?

greghendershott commented 8 years ago

I tend to agree, especially not knowing the motivation for avoiding set!.

stchang commented 8 years ago

not knowing the motivation for avoiding set!

I believe it was to make it easier to add types, something which is on my todo as well.

zyrolasting commented 4 years ago

@stchang Hello from 2020. I actually ended up chatting with @greghendershott about this but on a different topic. I have a concurrent build where a few threads each parse a Markdown file. There's an intermittent issue where the build will fail with this error:

...
string-length: contract violation
  expected: string?
  given: #f
  context...:
   condition->exn
   do-raise
   dynamic-wind
   .../parsack/parsack.rkt:188:2
   [repeats 5 more times]
   .../parsack/parsack.rkt:265:2
   .../parsack/parsack.rkt:214:2
   [repeats 16 more times]
   .../parsack/parsack.rkt:373:2
   .../parsack/parsack.rkt:188:2
   .../parsack/parsack.rkt:214:2
   .../parsack/parsack.rkt:188:2
   .../match/compiler.rkt:507:40: f97
   .../parsack/parsack.rkt:214:2
   .../parsack/parsack.rkt:464:0: parse
   [repeats 1 more time]

Is it possible that the set!s not being synchronized would cause this issue? If so, would that justify restoring parameters, or at least use of thread cells?

stchang commented 4 years ago

Hi @zyrolasting I think it's very possible. I haven't been mindful of thread safety.

Do you have a concrete example I could look at though?

Since many other packages (markdown, frog, pollen) use this package in its current state, I'm a little wary of making invasive changes without a concrete use case.

zyrolasting commented 4 years ago

I have two.

First, I invited you to collaborate on a private repo where the trace came from. Just clone, raco pkg install --link, then run racket build/main.rkt until you see the error.

Second, there's a more minimal, yet less reliable reproduction here: https://github.com/zyrolasting/parsack-threading-break

The reason I invited you to the private repo is because the same error seems to occur there more often. I think that's because each thread is doing a lot more, and the scheduler has more cause to have one thread that's parsing yield to another thread that's parsing.

The timing issue is subtle, so my only instruction to reproduce the issue so far is "keep trying" while I narrow down a way to guarantee the exception.

stchang commented 4 years ago

I installed sgcom and can confirm an intermittent error.

unlike-assets: UPDATE: polyglot.rss.xml
error writing to stream port
  system error: Broken pipe; errno=32
  context...:
   /home/stchang/plt-stchang/racket/collects/racket/private/port.rkt:126:2: loop
   /home/stchang/sgcom/build/highlight-code.rkt:35:0: highlight-code
   /home/stchang/sgcom/build/highlight-code.rkt:48:4: for-loop
   /home/stchang/sgcom/build/highlight-code.rkt:47:2: recur
   /home/stchang/sgcom/build/html.rkt:17:0: make-dependent-html-page
   .../more-scheme.rkt:261:28
   /home/stchang/plt-stchang/racket/share/pkgs/kinda-ferpy/main.rkt:70:9

I'll investigate but probably not for a few weeks.

If you're interested in experimenting, you can first try changing these to racket parameters.

zyrolasting commented 4 years ago

That's alright, please take your time. In the worst case scenario I'll lean on a fork. My experimentation leans on plain thread cells instead of parameters. With luck that won't add much overhead.

Re: that error you saw, I am unsure if it's relevant. That would depend on if the synchronization problem occurs with ports, or the global state for expected characters. It may overlap with this thread.

I did manage to reproduce the parsack exn that concerned me using the minimal repro by having it parse 26 ~500KiB Markdown files. I used Racket 7.6, CS. Then I use raco make main.rkt and racket main.rkt.

[sage@localhost parsack-threading-break]$ racket main.rkt 
string-length: contract violation
  expected: string?
  given: #f
  context...:
   condition->exn
   do-raise
   dynamic-wind
   .../parsack/parsack.rkt:188:2
   [repeats 3 more times]
   .../parsack/parsack.rkt:214:2
   [repeats 8 more times]
   .../parsack/parsack.rkt:373:2
   .../parsack/parsack.rkt:188:2
   .../parsack/parsack.rkt:214:2
   .../parsack/parsack.rkt:188:2
   .../match/compiler.rkt:507:40: f97
   .../parsack/parsack.rkt:214:2
   .../parsack/parsack.rkt:188:2
   .../match/compiler.rkt:507:40: f97
   .../parsack/parsack.rkt:214:2
zyrolasting commented 4 years ago

Okay, I think I see what's happening.

The root cause of the above error is str-val in this line of markdown/parse will sometimes be #f when using multiple threads due to how getState works. And this line in parsack says that the globally shared user state will be reset on a fresh parse.

This means that if you kick off N threads that reach read a Markdown file, there is a chance that Racket will schedule the threads such that a new parse will blow away the user state needed by another thread.

I'm more convinced that thread cells are the way to go (see #53), but I'll check the performance hit. If it's too high, I wonder if it would be worth adding a single global configuration that would opt-in to use of thread cells for the users that need it. If you default that to using no thread cells, then the overhead would be equal to the cost of checking the option each time you access shared state.

greghendershott commented 4 years ago

About performance: As I mentioned to @zyrolasting in private Slack convo, performance had been a big concern, but you'd switched to a port-centric approach which was a big speedup, and maybe any slowdown from parameters would pale in comparison. I couldn't remember the timeline.

Looking, now, I see the port speedup was commit ca5b180 from Jun 2015. The discussion above was later, Dec 2015. So seemingly we were still concerned about a slowdown, even after commit ca5b180? But I don't know if that concern was really justified, or, we were unduly stuck in that old mindset. A ~10X speedup followed by a ~2-3X slowdown would... still be a very large net speedup over the bad old days. I don't want to be cavalier, though. Frog's use of this via markdown is insulated by a build cache; I doubt it would much matter. But I don't know about Pollen's use of this via markdown.

zyrolasting commented 4 years ago

53 has a fresh run of Markdown tests and they pass, FWIW. @stchang I removed you from my personal site repo. Not worried if you look through it, but please delete it when you're done! :nerd_face:

stchang commented 4 years ago

Already uninstalled :)