tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License
152 stars 36 forks source link

Faster Scanner (part 1) #52

Closed 414owen closed 2 years ago

414owen commented 2 years ago

The least invasive way to make this scanner performant seems to be to:

For background, see https://github.com/tree-sitter/tree-sitter-haskell/issues/41

These are mostly mechanical changes, so they should be easy enough to review.
If you see a better way to lower something into fast, first-order logic, let me know :)

As our benchmark, this is on my laptop on master:

$ ./script/parse-example effects
Successfully parsed 100.00% of 'effects' in 1.06s (38 of 38 files)

And this is on the current branch:

$ ./script/parse-example effects
Successfully parsed 100.00% of 'effects' in 0.59s (38 of 38 files)

I'll post timings after I push commits.


Risk assessment:

As the changes are mostly mechanical, we shouldn't risk breaking anything too badly.
It's also trivial to mix the old Parser type and straight up c++ functions, so it's easy to change a small function, get it compiling, and run tree-sitter test.

It is technically possible to merge this branch as is, with the only partially-converted scanner.
@tek it's up to you how you want to proceed here. I'm happy to split this up into small PRs, or just keep adding to this branch.


This code looks worse

I actually don't think it's that bad.
There's something nice about having simple first-order functions.
I think it's equivalently bad to having "pure" parsers with side-effects ;)
It will also be pretty easy to convert this to pure C when we're finished (which I would be in favor of)

tek commented 2 years ago

two thoughts:

tek commented 2 years ago

oh and I'm fine with making this one large PR

414owen commented 2 years ago

maybe keep a copy of the original around

The original scanner? Sure. Maybe we can have a branch that contains the combinator version? I'd be wary of keeping them in the same file, just because of name conflicts and clutter.

414owen commented 2 years ago
$ ./script/parse-example effects
Successfully parsed 100.00% of 'effects' in 0.42s (38 of 38 files)
tek commented 2 years ago

The original scanner? Sure. Maybe we can have a branch that contains the combinator version? I'd be wary of keeping them in the same file, just because of name conflicts and clutter.

yeah I would have thought a separate file in the same dir, optimally in such a way that the tests can be run with either

$ ./script/parse-example effects
Successfully parsed 100.00% of 'effects' in 0.42s (38 of 38 files)

:partying_face:

414owen commented 2 years ago

can we just make State a global variable? it's passed as a mutable reference anyway

I gave it a go... https://github.com/414owen/tree-sitter-haskell/commit/cf221b2defd1958ef75f89eef7940ddae86b3fa8

Problem is a vec& has no default constructor. I changed it to a vec and started getting crazy linker errors:

Error opening dynamic library "/home/owen/.cache/tree-sitter/lib/haskell.so"

Caused by:
    /home/owen/.cache/tree-sitter/lib/haskell.so: undefined symbol: _ZN5StateC1Ev

So I'll leave that part to somebody who likes debugging c++ linker errors.

414owen commented 2 years ago
$ ./script/parse-example effects |& head -n 100
Successfully parsed 100.00% of 'effects' in 0.29s (38 of 38 files)
tek commented 2 years ago

brilliant

414owen commented 2 years ago

@tek I'm finished for the day. Might be an okay place to do a merge. I tested it in a real editor, and it's actually pretty usable. I can no longer type faster than it updates :)

I'll still continue the conversion, for the sake of my battery life.

tek commented 2 years ago

splendid