Closed stchang closed 9 years ago
A second small issue. I would like to rename parsack's string
combinator, maybe to str
. I hate that the name conflicts with Racket's string
. But obviously this would be backwards incompatible.
Btw, in case it wasn't clear, I would fix all your code that breaks due to any changes.
One small item, I noticed the line counting starts at (Pos 1 1 1), which is non-standard. In particular, Racket's line counting starts at (Pos 1 0 1) (ie row 1, col 0, pos 1) and I would have to override it if we want to keep the old behavior. Do you have a preference?
Well, generally I prefer consistency. A mix of zero- and one-based counting is not. :) I could understand if the "user-facing" line and col were 1-based, but the "code-facing" offset/position were 0-based. But I don't really love the particular Racket mix, standard or not.
Having said all that, following the Racket standard is fine with me, too. I'll defer to you. It's not a big deal, provided we doc it clearly and people know to update the Pos-ofs field in their code, if they're using it.
A second small issue. I would like to rename parsack's string combinator, maybe to str. I hate that the name conflicts with Racket's string. But obviously this would be backwards incompatible.
That's fine.
I'm assuming backward-incompatible, anyway. parsack.rkt is (provide (all-defined-out))
. The struct
s are #:transparent
and so any change to State
and Pos
necessitate that.
BTW I'm also assuming this would be a backward-incompatible new parsack2
package (following the Racket pkg system, which doesn't support backward-incompatible versions, just new packages).
Sorry! I hit Close instead of Cancel by mistake.
Some (very) preliminary performance numbers using input-ports instead of the old State
:
good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).
bad news: perf-tests.rkt
is slower by 2-3x. (So about as fast as before my satisfy
optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.
On Jun 22, 2015, at 3:08 PM, Stephen Chang notifications@github.com wrote:
Some (very) preliminary performance numbers using input-ports instead of the old State:
good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).
Awesome!
bad news: perf-tests.rkt is slower by 2-3x. (So about as fast as before my satisfy optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.
This doesn’t bother me :).
John
good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).
Nice!
bad news: perf-tests.rkt is slower by 2-3x. (So about as fast as before my satisfy optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.
This doesn’t bother me :).
First, don't listen to @jbclements he is a horrible, selfish person. :)
Second, in addition to ports, I wonder if this could be due to using parameters? I recall a blog post by Jay pointing out they're bad in hot spots -- e.g. 16x slower than straight mutation.
I wonder if this could be due to using parameters
Yes, parameters are slow. When trying to speed up John's example, the first thing I tried was to rip out the parameters in the markdown parser and replace them with set!
. But it didn't help much.
I was going to add a few more to parsack
to replace the State
parameters, but I haven't gotten around to adding most of them yet (eg the debugging info), so I'm not sure how much they contribute to the slowdown.
The only one I added is the userstate
. Does parse-markdown
use this a lot?
I'll experiment with removing it tomorrow.
The $str
parser sets user state -- and that's the first parser tried in the <or>
of the $inline
parser which is used very frequently.
The parsers for smart quotes both set and get user state.
Well it turns out that parameters are not really that slow, but peeking-input-port
is. If instead, I manually set the position pointer when backtracking, I get another order of magnitude speedup.
$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 60, 61, 61, 61, 70 (sorted)
Average: 62.6
--------------------
FAILURE
name: check-false
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params: (#t)
Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 69, 70, 70, 71, 72 (sorted)
Average: 70.4
--------------------
FAILURE
name: check-false
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params: (#t)
Check failure
--------------------
2/6 test failures
I need to double-check everything because those times seem sort of unbelievable, but all other markdown tests (I ran test.rkt
and suite-test.rkt
) appear to be passing as well.
@greghendershott I believe I've finished porting parsack (pun intended).
Would you mind trying it (port
branch) before I merge to master? I have not made any backwards incompatible changes yet, except where the old State
was processed manually. All tests, both parsack
and markdown
are passing as far as I can tell.
To help with the transition, I also created a port
branch in my markdown
fork that you can use: https://github.com/stchang/markdown/tree/port
It turns out that adding the debugging info gave back some performance gains, but it's still orders of magnitude faster than before.
$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 95, 95, 95, 97, 106 (sorted)
Average: 97.6
--------------------
FAILURE
name: check-false
location: (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params: (#t)
Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 111, 111, 111, 112, 113 (sorted)
Average: 111.6
1/6 test failures
Currently, I tried to maintain as much backward compatibility as possible, but I'm willing to consider other changes now, including changing the debugging info if we want to squeeze out the last drop of performance.
This is so awesome!
So I'm trying to think through the next steps, coordinating. I think they are:
Does that seem correct to you?
If so: Please let me know which of these you'd like to do, vs. me? Specifically do you want to update your markdown PR with a couple more commits for items 2.1, 2.2, 2.3? Or shall I?
Oh, I didn't mean to skip over this part:
Currently, I tried to maintain as much backward compatibility as possible, but I'm willing to consider other changes now, including changing the debugging info if we want to squeeze out the last drop of performance.
I'm open to that. In fact I don't even remember what the debugging info is, I'm going to search for that now...
Oh, do you mean the OR-DEBUG stuff at the end of markdown/parsack.rkt? If so, I've not been using that.
next steps
I'll take care of the markdown changes. Not sure how I missed those. I need to finish up some additional cleanup and refactoring of parsack as well.
debugging info
The info I'm referring to is the "unexpected" and "expected" chars that are reported on parse error; these used to be part of the Msg
in State
, but are now separate parameters (actually global vars that get set!
). Adding them increased perf-test time by ~50ms on my machine. I'm thinking we should leave them in for now. After all, markdown was still "usable" up until now with perf-test in the 1-2sec range.
Maybe later I can experiment with a(nother) "debug" mode that toggles the expected/unexpected info on and off.
Oh the details for the user syntax error messages?
Well, IIUC one of Parsec's (and Parsack's) claims to fame is better quality error messages. So I think at most provide a toggle (but not remove completely).
Having said that, in markdown there's really never any such thing as a syntax error. But I recall they can be helpful when working out the grammar.
Anyway the performance is vastly hugely improved and I am fine with what you've done so far. "Fine" is an understatement!
Yes I agree that those messages are essential. I'm changing my mind from yesterday. I think I was getting carried away with optimizing :)
"Dang, parse
is slow"
removes parse
\o/
:)
I think everything should be ready to go. I'll wait for the green light from you. I guess I should merge parsack first? Let me know if you want me to squash the markdown pull-request.
Yes please merge parsack first. And, please log into pkgs.racket-lang.org to nudge it to refresh.
Then, on Travis CI if you hit the Restart button on the Build -- here: https://travis-ci.org/greghendershott/markdown/builds/68206156 -- and it should pass this time.
No need to squash the two commits.
Thanks!!
I've merged parsack and updated it on pkg.racket-lang.org
Then, on Travis CI if you hit the Restart button on the Build
I dont see a Restart button. I think I dont have access?
Oh, sorry, I thought you'd have it since you submitted the pull request.
I restarted it a moment ago. Waiting for the builds to start. It's still only 2:30 PM in Silicon Valley, so ... :)
I managed to squeeze out a little more performance by not using curry
and thunk
from racket/function
.
Here's the final perf-test output on my machine:
$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 78, 79, 79, 79, 91 (sorted)
Average: 81.2
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 90, 91, 91, 92, 94 (sorted)
Average: 91.6
6 tests passed
Nice!
All the builds are finished green except the Racket HEAD snapshot is still downloading from northwestern.edu ...
Still waiting for that download. Got ansty. It's passing all the other Racket versions. Went ahead and merged to master, pushed.
@greghendershott Moving the change-to-ports discussion here. Since this is a major architecture revision, I figured I could improve a few things while I'm at it and I would like your input, if you are inclined.
One small item, I noticed the line counting starts at
(Pos 1 1 1)
, which is non-standard. In particular, Racket's line counting starts at(Pos 1 0 1)
(ie row 1, col 0, pos 1) and I would have to override it if we want to keep the old behavior. Do you have a preference?