tree-sitter / tree-sitter-python

Python grammar for tree-sitter
MIT License
360 stars 132 forks source link

perf(scanner)!: store and reuse `next_indent_length` #245

Closed llllvvuu closed 10 months ago

llllvvuu commented 12 months ago

Whether an indent/dedent token is generated for a comment depends on whether the indent/dedent is persisted after the comment. This takes O(comment_length) to check. Checking it for every line takes O(comment_length * comment_lines), which is quadratic.

An optimization was made in a901729 which skips this check when there is no possible indent/dedent, such as in the following:

 # comment 1...
 # comment 1000
 print("foo")

However, the check cannot be skipped in the following case:

class Foo:
    def foo():
        print("bar")
    # comment 1...
    # comment 1000
    def bar():
        print("foo")

which comes up when commenting out code.

This PR optimizes all cases by storing and re-using the next non-comment indent length and re-using it while still valid (requires TSLexer.get_byte from upstream tree-sitter).

Sample file: https://gist.github.com/llllvvuu/bc80d4df5f39cc27c13297e1d3a39190

Before:

tree-sitter parse ~/test/repro2.py  0.52s user 0.01s system 99% cpu 0.535 total

After:

.../release/tree-sitter parse   0.01s user 0.01s system 88% cpu 0.023 total

Fixes: https://github.com/nvim-treesitter/nvim-treesitter/issues/4839

BREAKING CHANGE: This makes the parser incompatible with the current tree-sitter ABI.

llllvvuu commented 12 months ago

Required upstream change: https://github.com/tree-sitter/tree-sitter/pull/2648