Closed stchang closed 4 years ago
Restoring parameters leads to a ~2x slowdown, e.g. here is the performance test for markdown
before:
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 94, 95, 95, 96, 108 (sorted)
Average: 97.6
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 107, 107, 111, 112, 115 (sorted)
Average: 110.4
after:
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 182, 182, 183, 183, 195 (sorted)
Average: 185.0
--------------------
FAILURE
name: check-true
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 33 4 962 24)
expression: (check-true (< avg 170))
params: (#f)
Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 207, 207, 208, 208, 210 (sorted)
Average: 208.0
--------------------
FAILURE
name: check-true
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 33 4 962 24)
expression: (check-true (< avg 170))
params: (#f)
Check failure
--------------------
--------------------
FAILURE
name: check-true
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 34 4 991 26)
expression: (check-true (< worst 200))
params: (#f)
Check failure
--------------------
This is probably too much so I'm inclined to leave things the way they are. @greghendershott do you agree?
I tend to agree, especially not knowing the motivation for avoiding set!
.
not knowing the motivation for avoiding set!
I believe it was to make it easier to add types, something which is on my todo as well.
@stchang Hello from 2020. I actually ended up chatting with @greghendershott about this but on a different topic. I have a concurrent build where a few threads each parse a Markdown file. There's an intermittent issue where the build will fail with this error:
...
string-length: contract violation
expected: string?
given: #f
context...:
condition->exn
do-raise
dynamic-wind
.../parsack/parsack.rkt:188:2
[repeats 5 more times]
.../parsack/parsack.rkt:265:2
.../parsack/parsack.rkt:214:2
[repeats 16 more times]
.../parsack/parsack.rkt:373:2
.../parsack/parsack.rkt:188:2
.../parsack/parsack.rkt:214:2
.../parsack/parsack.rkt:188:2
.../match/compiler.rkt:507:40: f97
.../parsack/parsack.rkt:214:2
.../parsack/parsack.rkt:464:0: parse
[repeats 1 more time]
Is it possible that the set!
s not being synchronized would cause this issue? If so, would that justify restoring parameters, or at least use of thread cells?
Hi @zyrolasting I think it's very possible. I haven't been mindful of thread safety.
Do you have a concrete example I could look at though?
Since many other packages (markdown, frog, pollen) use this package in its current state, I'm a little wary of making invasive changes without a concrete use case.
I have two.
First, I invited you to collaborate on a private repo where the trace came from. Just clone, raco pkg install --link
, then run racket build/main.rkt
until you see the error.
Second, there's a more minimal, yet less reliable reproduction here: https://github.com/zyrolasting/parsack-threading-break
The reason I invited you to the private repo is because the same error seems to occur there more often. I think that's because each thread is doing a lot more, and the scheduler has more cause to have one thread that's parsing yield to another thread that's parsing.
The timing issue is subtle, so my only instruction to reproduce the issue so far is "keep trying" while I narrow down a way to guarantee the exception.
I installed sgcom
and can confirm an intermittent error.
unlike-assets: UPDATE: polyglot.rss.xml
error writing to stream port
system error: Broken pipe; errno=32
context...:
/home/stchang/plt-stchang/racket/collects/racket/private/port.rkt:126:2: loop
/home/stchang/sgcom/build/highlight-code.rkt:35:0: highlight-code
/home/stchang/sgcom/build/highlight-code.rkt:48:4: for-loop
/home/stchang/sgcom/build/highlight-code.rkt:47:2: recur
/home/stchang/sgcom/build/html.rkt:17:0: make-dependent-html-page
.../more-scheme.rkt:261:28
/home/stchang/plt-stchang/racket/share/pkgs/kinda-ferpy/main.rkt:70:9
I'll investigate but probably not for a few weeks.
If you're interested in experimenting, you can first try changing these to racket parameters.
That's alright, please take your time. In the worst case scenario I'll lean on a fork. My experimentation leans on plain thread cells instead of parameters. With luck that won't add much overhead.
Re: that error you saw, I am unsure if it's relevant. That would depend on if the synchronization problem occurs with ports, or the global state for expected characters. It may overlap with this thread.
I did manage to reproduce the parsack exn that concerned me using the minimal repro by having it parse 26 ~500KiB Markdown files. I used Racket 7.6, CS. Then I use raco make main.rkt
and racket main.rkt
.
[sage@localhost parsack-threading-break]$ racket main.rkt
string-length: contract violation
expected: string?
given: #f
context...:
condition->exn
do-raise
dynamic-wind
.../parsack/parsack.rkt:188:2
[repeats 3 more times]
.../parsack/parsack.rkt:214:2
[repeats 8 more times]
.../parsack/parsack.rkt:373:2
.../parsack/parsack.rkt:188:2
.../parsack/parsack.rkt:214:2
.../parsack/parsack.rkt:188:2
.../match/compiler.rkt:507:40: f97
.../parsack/parsack.rkt:214:2
.../parsack/parsack.rkt:188:2
.../match/compiler.rkt:507:40: f97
.../parsack/parsack.rkt:214:2
Okay, I think I see what's happening.
The root cause of the above error is str-val
in this line of markdown/parse
will sometimes be #f
when using multiple threads due to how getState
works. And this line in parsack
says that the globally shared user state will be reset on a fresh parse.
This means that if you kick off N threads that reach read a Markdown file, there is a chance that Racket will schedule the threads such that a new parse will blow away the user state needed by another thread.
I'm more convinced that thread cells are the way to go (see #53), but I'll check the performance hit. If it's too high, I wonder if it would be worth adding a single global configuration that would opt-in to use of thread cells for the users that need it. If you default that to using no thread cells, then the overhead would be equal to the cost of checking the option each time you access shared state.
About performance: As I mentioned to @zyrolasting in private Slack convo, performance had been a big concern, but you'd switched to a port-centric approach which was a big speedup, and maybe any slowdown from parameters would pale in comparison. I couldn't remember the timeline.
Looking, now, I see the port speedup was commit ca5b180 from Jun 2015. The discussion above was later, Dec 2015. So seemingly we were still concerned about a slowdown, even after commit ca5b180? But I don't know if that concern was really justified, or, we were unduly stuck in that old mindset. A ~10X speedup followed by a ~2-3X slowdown would... still be a very large net speedup over the bad old days. I don't want to be cavalier, though. Frog's use of this via markdown is insulated by a build cache; I doubt it would much matter. But I don't know about Pollen's use of this via markdown.
Already uninstalled :)
I've received a request to remove the use of
set!
. Essentially, this would restore the parameters I previously removed, e.g. foruser-state
. The parameters were removed for performance reasons but were ultimately not the bottleneck so restoring them should be the right thing to do. Opening this ticket while I do some experimenting.