Open sunny-chung opened 1 week ago
How long does the CLI take for the same file? (tree-sitter parse foo.json)
About 2 seconds.
> time tree-sitter parse 13mb.json > /dev/null
real 0m2.174s
user 0m1.721s
sys 0m0.449s
Additional information:
With a traditional regular-expression approach, parsing the 13 MB JSON takes 765 ms on the JVM. For a 1 MB JSON, ktreesitter takes 1.4 s and the regular-expression approach takes 60 ms.
I guess the overhead is in the round-trip interop time between C and the JVM, multiplied by the length of the string. The situation might be improved if the parser queries a substring at a time rather than a character.
> The situation might be improved if the parser queries a substring at a time rather than a character.
That is up to your callback.
Could you elaborate more? The ParseCallback type provided by ktreesitter takes a single byte offset and a Point as input, not a range.
Quoting from ktreesitter's doc:
expect fun parse(oldTree: Tree? = null, callback: ParseCallback): Tree
typealias ParseCallback = (byte: UInt, point: Point) -> CharSequence?
You can return a chunk and the position will be offset accordingly.
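To illustrate the "return a chunk" idea, here is a minimal sketch of a callback that serves a few thousand characters per call instead of one. `Point` and `ParseCallback` are redeclared as stand-ins (mirroring the quoted signature) so the snippet compiles on its own, and `chunkedCallback` is a hypothetical helper, not part of ktreesitter. It assumes the `byte` argument can be used directly as an index into the source; if the library reports a raw byte offset, it would need converting first.

```kotlin
// Stand-ins mirroring the signature quoted above, so this compiles without
// ktreesitter. With the real library, use its own Point and ParseCallback.
data class Point(val row: UInt, val column: UInt)
typealias ParseCallback = (byte: UInt, point: Point) -> CharSequence?

// Hypothetical helper: serves up to `chunkSize` UTF-16 code units per call.
// ASSUMPTION: `byte` is usable as an index into `source` (unit not verified).
fun chunkedCallback(source: CharSequence, chunkSize: Int = 4096): ParseCallback = { byte, _ ->
    val start = byte.toInt()
    if (start >= source.length) {
        // End of input. The thread reports that returning null crashed
        // before the fix in 7fc4734.
        null
    } else {
        var end = minOf(start + chunkSize, source.length)
        // Extend by one unit rather than cut a surrogate pair in half.
        if (end < source.length && source[end - 1].isHighSurrogate()) end++
        source.subSequence(start, end)
    }
}
```

With the real library this would presumably be passed as `parser.parse(callback = chunkedCallback(text))`. The surrogate-pair check is only a guess at one possible source of multi-byte trouble; it keeps a chunk boundary from landing in the middle of a character outside the Basic Multilingual Plane.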
> and got troubles with multi-byte characters
Please submit another issue with more details.
> the doc is wrong. null would result in crash
Fixed in 7fc4734
Environment:
Description:
ktreesitter takes 20 seconds to initialize a 13 MB string. Everything else, such as incremental updating or querying the AST, works fine and is quick. This is what I am using to initialize:
I also tried feeding strings directly, as below, but it did not speed things up to an acceptable level, and I ran into trouble with multi-byte characters.
My use case is to load and edit a 13 MB JSON with syntax highlighting. If this passes, I will feed in even larger JSON data.
Is there anything that could be done, or any workaround? Not sure if it helps, but I am using a rope data structure on the JVM side.
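Since a rope is already in use, one possible workaround is to hand the parser whole rope leaves rather than single characters. The sketch below uses a hypothetical minimal rope (`LeafRope`, a flat list of leaf strings), which is not the actual data structure from this issue. `Point` and `ParseCallback` are stand-ins for the ktreesitter types, and the offset is again assumed to index the text directly.

```kotlin
// Stand-ins for the ktreesitter types, so the sketch is self-contained.
data class Point(val row: UInt, val column: UInt)
typealias ParseCallback = (byte: UInt, point: Point) -> CharSequence?

// Hypothetical minimal rope: an ordered list of non-empty leaf strings.
class LeafRope(rawLeaves: List<String>) {
    // Drop empty leaves so an empty chunk is never mistaken for end of input.
    private val leaves: List<String> = rawLeaves.filter { it.isNotEmpty() }
    // offsets[i] is where leaf i starts; the last entry is the total length.
    private val offsets: List<Int> = leaves.runningFold(0) { acc, s -> acc + s.length }
    private val starts: List<Int> = offsets.dropLast(1)
    val length: Int = offsets.last()

    // Returns the remainder of the leaf containing `offset`, or null past the end.
    fun chunkAt(offset: Int): CharSequence? {
        if (offset < 0 || offset >= length) return null
        val found = starts.binarySearch(offset)
        // A negative result encodes the insertion point; the leaf before it holds `offset`.
        val leaf = if (found >= 0) found else -found - 2
        return leaves[leaf].subSequence(offset - starts[leaf], leaves[leaf].length)
    }
}

// ASSUMPTION: `byte` is usable as an offset into the rope's text.
fun ropeCallback(rope: LeafRope): ParseCallback = { byte, _ -> rope.chunkAt(byte.toInt()) }
```

Each call then costs one binary search over the leaf offsets and returns a whole leaf remainder without copying the full 13 MB text into a single string. If a leaf boundary can fall inside a surrogate pair, the boundary handling would need extra care.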