stchang / parsack

A basic Parsec-like monadic parser combinator library implementation in Racket.
MIT License
50 stars 10 forks source link

`parse` should accept input port #39

Closed stchang closed 9 years ago

stchang commented 9 years ago

@greghendershott Moving the change-to-ports discussion here. Since this is a major architecture revision, I figured I could improve a few things while I'm at it and I would like your input, if you are inclined.

One small item, I noticed the line counting starts at (Pos 1 1 1), which is non-standard. In particular, Racket's line counting starts at (Pos 1 0 1) (ie row 1, col 0, pos 1) and I would have to override it if we want to keep the old behavior. Do you have a preference?

stchang commented 9 years ago

A second small issue. I would like to rename parsack's string combinator, maybe to str. I hate that the name conflicts with Racket's string. But obviously this would be backwards incompatible.

Btw, in case it wasn't clear, I would fix all your code that breaks due to any changes.

greghendershott commented 9 years ago

One small item, I noticed the line counting starts at (Pos 1 1 1), which is non-standard. In particular, Racket's line counting starts at (Pos 1 0 1) (ie row 1, col 0, pos 1) and I would have to override it if we want to keep the old behavior. Do you have a preference?

Well, generally I prefer consistency. A mix of zero- and one-based counting is not. :) I could understand if the "user-facing" line and col were 1-based, but the "code-facing" offset/position were 0-based. But I don't really love the particular Racket mix, standard or not.

Having said all that, following the Racket standard is fine with me, too. I'll defer to you. It's not a big deal, provided we doc it clearly and people know to update the Pos-ofs field in their code, if they're using it.

A second small issue. I would like to rename parsack's string combinator, maybe to str. I hate that the name conflicts with Racket's string. But obviously this would be backwards incompatible.

That's fine.

I'm assuming backward-incompatible, anyway. parsack.rkt is (provide (all-defined-out)). The structs are #:transparent and so any change to State and Pos necessitate that.

BTW I'm also assuming this would be a backward-incompatible new parsack2 package (following the Racket pkg system, which doesn't support backward-incompatible versions, just new packages).

greghendershott commented 9 years ago

Sorry! I hit Close instead of Cancel by mistake.

stchang commented 9 years ago

Some (very) preliminary performance numbers using input-ports instead of the old State:

good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).

bad news: perf-tests.rkt is slower by 2-3x. (So about as fast as before my satisfy optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.

jbclements commented 9 years ago

On Jun 22, 2015, at 3:08 PM, Stephen Chang notifications@github.com wrote:

Some (very) preliminary performance numbers using input-ports instead of the old State:

good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).

Awesome!

bad news: perf-tests.rkt is slower by 2-3x. (So about as fast as before my satisfy optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.

This doesn’t bother me :).

John

greghendershott commented 9 years ago

good news: @jbclements's list-of-links example parses ~100x faster. Even the 5000-item case parses in a few seconds (vs minutes previously).

Nice!

bad news: perf-tests.rkt is slower by 2-3x. (So about as fast as before my satisfy optimization.) I need to do more profiling to make sure but I worry that this extra overhead is due to the ports themselves.

This doesn’t bother me :).

First, don't listen to @jbclements he is a horrible, selfish person. :)

Second, in addition to ports, I wonder if this could be due to using parameters? I recall a blog post by Jay pointing out they're bad in hot spots -- e.g. 16x slower than straight mutation.

stchang commented 9 years ago

I wonder if this could be due to using parameters

Yes, parameters are slow. When trying to speed up John's example, the first thing I tried was to rip out the parameters in the markdown parser and replace them with set!. But it didn't help much.

I was going to add a few more to parsack to replace the State parameters, but I haven't gotten around to adding most of them yet (eg the debugging info), so I'm not sure how much they contribute to the slowdown. The only one I added is the userstate. Does parse-markdown use this a lot?

I'll experiment with removing it tomorrow.

greghendershott commented 9 years ago

The $str parser sets user state -- and that's the first parser tried in the <or> of the $inline parser which is used very frequently.

The parsers for smart quotes both set and get user state.

stchang commented 9 years ago

Well it turns out that parameters are not really that slow, but peeking-input-port is. If instead, I manually set the position pointer when backtracking, I get another order of magnitude speedup.

$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 60, 61, 61, 61, 70 (sorted)
Average: 62.6
--------------------
FAILURE
name:       check-false
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params:     (#t)

Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 69, 70, 70, 71, 72 (sorted)
Average: 70.4
--------------------
FAILURE
name:       check-false
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params:     (#t)

Check failure
--------------------
2/6 test failures

I need to double-check everything because those times seem sort of unbelievable, but all other markdown tests (I ran test.rkt and suite-test.rkt) appear to be passing as well.

stchang commented 9 years ago

@greghendershott I believe I've finished porting parsack (pun intended).

Would you mind trying it (port branch) before I merge to master? I have not made any backwards incompatible changes yet, except where the old State was processed manually. All tests, both parsack and markdown are passing as far as I can tell.

To help with the transition, I also created a port branch in my markdown fork that you can use: https://github.com/stchang/markdown/tree/port

It turns out that adding the debugging info gave back some performance gains, but it's still orders of magnitude faster than before.

$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 95, 95, 95, 97, 106 (sorted)
Average: 97.6
--------------------
FAILURE
name:       check-false
location:   (#<path:/home/stchang/racket/markdown-stchang/markdown/perf-test.rkt> 37 4 1121 26)
expression: (check-false (< best 100))
params:     (#t)

Check failure
--------------------
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 111, 111, 111, 112, 113 (sorted)
Average: 111.6
1/6 test failures

Currently, I tried to maintain as much backward compatibility as possible, but I'm willing to consider other changes now, including changing the debugging info if we want to squeeze out the last drop of performance.

greghendershott commented 9 years ago

This is so awesome!

So I'm trying to think through the next steps, coordinating. I think they are:

  1. Parsack
    1. Update info.rkt with a new version number.
  2. Markdown
    1. Fix markdown/parsack.rkt
    2. Change markdown/perf-test.rkt to decrease the lower bounds.
    3. Update info.rkt to depend on newer Parsack version.
    4. Assuming parsack updates made it onto pkgs.racket-lang.org, the Travis CI test should pass now.

Does that seem correct to you?

If so: Please let me know which of these you'd like to do, vs. me? Specifically do you want to update your markdown PR with a couple more commits for items 2.1, 2.2, 2.3? Or shall I?

greghendershott commented 9 years ago

Oh, I didn't mean to skip over this part:

Currently, I tried to maintain as much backward compatibility as possible, but I'm willing to consider other changes now, including changing the debugging info if we want to squeeze out the last drop of performance.

I'm open to that. In fact I don't even remember what the debugging info is, I'm going to search for that now...

greghendershott commented 9 years ago

Oh, do you mean the OR-DEBUG stuff at the end of markdown/parsack.rkt? If so, I've not been using that.

stchang commented 9 years ago

next steps

I'll take care of the markdown changes. Not sure how I missed those. I need to finish up some additional cleanup and refactoring of parsack as well.

debugging info

The info I'm referring to is the "unexpected" and "expected" chars that are reported on parse error; these used to be part of the Msg in State, but are now separate parameters (actually global vars that get set!). Adding them increased perf-test time by ~50ms on my machine. I'm thinking we should leave them in for now. After all, markdown was still "usable" up until now with perf-test in the 1-2sec range.

stchang commented 9 years ago

Maybe later I can experiment with a(nother) "debug" mode that toggles the expected/unexpected info on and off.

greghendershott commented 9 years ago

Oh the details for the user syntax error messages?

Well, IIUC one of Parsec's (and Parsack's) claims to fame is better quality error messages. So I think at most provide a toggle (but not remove completely).

Having said that, in markdown there's really never any such thing as a syntax error. But I recall they can be helpful when working out the grammar.

Anyway the performance is vastly hugely improved and I am fine with what you've done so far. "Fine" is an understatement!

stchang commented 9 years ago

Yes I agree that those messages are essential. I'm changing my mind from yesterday. I think I was getting carried away with optimizing :)

greghendershott commented 9 years ago

"Dang, parse is slow" removes parse \o/

:)

stchang commented 9 years ago

I think everything should be ready to go. I'll wait for the green light from you. I guess I should merge parsack first? Let me know if you want me to squash the markdown pull-request.

greghendershott commented 9 years ago

Yes please merge parsack first. And, please log into pkgs.racket-lang.org to nudge it to refresh.

Then, on Travis CI if you hit the Restart button on the Build -- here: https://travis-ci.org/greghendershott/markdown/builds/68206156 -- and it should pass this time.

No need to squash the two commits.

Thanks!!

stchang commented 9 years ago

I've merged parsack and updated it on pkg.racket-lang.org

Then, on Travis CI if you hit the Restart button on the Build

I dont see a Restart button. I think I dont have access?

greghendershott commented 9 years ago

Oh, sorry, I thought you'd have it since you submitted the pull request.

I restarted it a moment ago. Waiting for the builds to start. It's still only 2:30 PM in Silicon Valley, so ... :)

stchang commented 9 years ago

I managed to squeeze out a little more performance by not using curry and thunk from racket/function.

Here's the final perf-test output on my machine:

$ raco test perf-test.rkt
raco test: (submod "perf-test.rkt" test)
Strict mode:
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 78, 79, 79, 79, 91 (sorted)
Average: 81.2
Non-strict mode (with extensions):
Using /home/stchang/racket/markdown-stchang/markdown/test/test.md
appended 5 times: 17398 chars and 974 lines. Doing 5 timings...
Timings: 90, 91, 91, 92, 94 (sorted)
Average: 91.6
6 tests passed
greghendershott commented 9 years ago

Nice!

All the builds are finished green except the Racket HEAD snapshot is still downloading from northwestern.edu ...

greghendershott commented 9 years ago

Still waiting for that download. Got ansty. It's passing all the other Racket versions. Went ahead and merged to master, pushed.