tree-sitter / tree-sitter-haskell

Haskell grammar for tree-sitter.
MIT License
155 stars 36 forks source link

Faster Scanner (part 2) #54

Closed 414owen closed 2 years ago

414owen commented 2 years ago

See part 1 for more info.


old new speedup
effects 1.07s 0.034s 31.5x
postgrest stackoverflow 0.083s ∞x
ivory 10.16s 0.225s 45.16x
polysemy 3.71s 0.089s 41.7x
semantic 50.62s 1.089s 46.5x
haskell-language-server 24.28s 0.504s 48.2x

Overall speedup: around 45x.

414owen commented 2 years ago
$ ./script/parse-example effects |& head -n 100
Successfully parsed 100.00% of 'effects' in 0.09s (38 of 38 files)

Original time: 1.09s

1.09 / 0.09 = 12.11
maxbrunsfeld commented 2 years ago

Nice work on this.

Having removed the use of closures, would it be straightforward to get this compiling without C++14 features? That would make this parser a bit easier to consume in downstream applications, and develop on a Mac.

tek commented 2 years ago

awesome!!

414owen commented 2 years ago

Having removed the use of closures, would it be straightforward to get this compiling without C++14 features?

Yes I plan to remove the need for c++14, and maybe even convert the scanner to pure C :)

maxbrunsfeld commented 2 years ago

Oh, that’s interesting. I’d think that C++ would be useful just for std::vector. are you thinking that a dynamically sized array isn’t needed in the scanner?

414owen commented 2 years ago

are you thinking that a dynamically sized array isn’t needed in the scanner?

I think we will need one, but I'm happy to implement my own if it means we can compile this as C.

While we have you here, would we be able to get your opinion on https://github.com/tree-sitter/tree-sitter-haskell/issues/41#issuecomment-1003353274? We think the scanner's probably running over the whole file (or a lot of it) every edit, but we're not sure if it should.

wenkokke commented 2 years ago

@414owen Amazing work! How far are you from hitting C++03? Once you do, I can check if the wasm can build successfully via the standard pathways now, and perhaps contribute some tests for that from #55.

414owen commented 2 years ago

@414owen Amazing work! How far are you from hitting C++03?

Thanks :smile: . I'm actually pretty close to removing the last closure. I've removed the Parser type completely, now I just have to give the same treatment to Conditions and Peeks, which are way easier.

I'd say I'll be done later today.

414owen commented 2 years ago

@wenkokke I've removed all the closures. It now compiles with -std=c++11 but not -std=c++03.

I'll see what I can do.

414owen commented 2 years ago
$ ./script/parse-example effects |& head -n 100
Successfully parsed 100.00% of 'effects' in 0.0297s (38 of 38 files)

Original time: 1.09s

1.09 / 0.0297 = 36.7

There might be one or two micro-optimizations left (PEEKing before taking some code paths?) As well as maybe decreasing the necessary re-computations (see https://github.com/tree-sitter/tree-sitter-haskell/issues/41#issuecomment-950424044)

But this will in any case let us edit 36x larger files than before :tada:

414owen commented 2 years ago

@wenkokke I've added c++03 support here: https://github.com/414owen/tree-sitter-haskell/pull/1

wenkokke commented 2 years ago

With your current faster-scanner branch, you've got the patch to tree-sitter down to:

--- a/lib/binding_web/exports.json
+++ b/lib/binding_web/exports.json
@@ -3,6 +3,9 @@
   "_free",
   "_malloc",

+  "__ZNSt3__25ctypeIcE2idE",
+  "__ZNSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE25__init_copy_ctor_externalEPKcm",
+
   "__ZNKSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE4copyEPcmm",
   "__ZNKSt3__220__vector_base_commonILb1EE20__throw_length_errorEv",
   "__ZNSt3__212basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEE6__initEPKcm",

See https://github.com/wenkokke/tree-sitter-haskell/tree/tree-sitter-haskell-wasm-faster-scanner.

414owen commented 2 years ago

@tek This is ready for review, when you have a chance.

All tests pass. Any further changes will probably be cosmetic (moving things around, rather than changing what they do).

wenkokke commented 2 years ago

@tek I have an updated version of #55 for when this pull request is merged

tek commented 2 years ago

awesome!

414owen commented 2 years ago

@luc-tielen I appreciate the comments, but I think a lot of them are out of scope, either because I haven't changed the code from what it was doing before, or because they're microoptimizations that the compiler may or may not already be doing.

I'd be interested to see if there's a performance win by implementing your constexpr hints. Would you be willing to open a PR for them? Same goes with two peeks here comments. Not sure if they're already optimized out, but I'd love to see some numbers :)

luc-tielen commented 2 years ago

@414owen That's ok. Your PR was fine, so I could only point out the nitpicky stuff.

tek commented 2 years ago

@414owen it appears to get stuck in an infinite loop when parsing HLS for me, can you confirm?

414owen commented 2 years ago

@414owen it appears to get stuck in an infinite loop when parsing HLS for me, can you confirm?

I'm building helix atm. I'll run it in gdb and see where :)

414owen commented 2 years ago

Okay I think I need to check for an eof in inline-comment. I'll PR in a few mins.

414owen commented 2 years ago
old new speedup
effects 1.07s 0.034s 31.5x
postgrest stackoverflow 0.083s ∞x
ivory 10.16s 0.225s 45.16x
polysemy 3.71s 0.089s 41.7x
semantic 50.62s 1.089s 46.5x
haskell-language-server 24.28s 0.504s 48.2x
tek commented 2 years ago

crazy