Open char0n opened 2 years ago
Real world fixtures are being integrated in #792 and #802
JSON direct syntactic analyzer (layer that translates tree-sitter CST into generic ApiDOM). Here are some numbers how ApiDOM parsing speed increased after this perf optimization.
Before
* Parse stage: 112,61 ms
* Lexical Analysis phase: 1,42 ms
* Syntactic Analysis phase: 85,91 ms
* Traversing time: 0,27 ms
* Building time: 85,64 ms
* Refract stage: 18,18 ms
After
* Parse stage: 24,15 ms
* Lexical Analysis phase: 1,28 ms
* Syntactic Analysis phase: 14,99 ms
* Traversing time: 0,59 ms
* Building time: 14,39 ms
* Refract stage: 15,75 ms
JSON indirecty syntactic analysis optimization results:
Before
4.45 ops/sec ±0.92% (621 runs sampled)
After
11.07 ops/sec ±0.90% (650 runs sampled)
YAML syntactic analysis optimization results:
Before
4.77 ops/sec ±0.82% (623 runs sampled)
After
9.93 ops/sec ±1.11% (642 runs sampled)
During the weekend I did extensive optimizations on all our syntactic analyzers - YAML and JSON. I managed to increase the syntactic analysis phase by another factor of 2 - parsing is now two times faster again. It's not much, but it's something at least...
During the optimizations I've managed to identify the biggest performance problem that we currently have. It's unfortunately not in our code, but in tree-sitter binding code. It takes extreme amount of time to access tree-sitter CST nodes and it's attributes. On relatively small CST trees (300 lines of YAML), the traversal of CST takes more than 20 ms. Access time of tree sitter CST seems to be linear, so for 3000 lines of YAML we get around 200 ms of just traversal time. I've created an issue with the tree-sitter, to find out If I'm doing something wrong or this is something expected.
Just for comparison:
yaml-js
library (written in pure JS) can parse the 300 lines of YAML in 0.5 ms.yaml
library (written in pure JS) produces CST internally and then transformes into JavaScript AST in under 5 mstree-sitter cursor traversal needs to be used to get the performance we need. Full CST traversal is around 10x faster when used.
POC of cursor traversal:
const cursor = cst.walk();
let reached_root = false;
while (!reached_root) {
//github.com/tree-sitter/py-tree-sitter/issues/33
https: const type = cursor.nodeType;
console.dir(cursor.nodeText);
if (cursor.gotoFirstChild()) {
continue;
}
if (cursor.gotoNextSibling()) {
continue;
}
let retracting = true;
while (retracting) {
if (!cursor.gotoParent()) {
retracting = false;
reached_root = true;
}
if (cursor.gotoNextSibling()) {
retracting = false;
}
}
}
In https://github.com/swagger-api/apidom/pull/933 I've implemented two specialized iterators that provide optimized access to tree-sitter CST. Accessing the tree-sitter CST via cursor mechanism is 10 times faster. As cursor traversal is not compatible with our own ApiDOM traversal, adapter in form of two (2) iterators have been created.
This iterator uses Preorder Depth First traversal to create tree structure compatible with tree-sitter Tree using tree-sitter cursor traversal mechanism which provides cached and optimized access to CST.
This iterator uses Preorder Depth First traversal to create list of tree-sitter SyntaxNode like structures using tree-sitter cursor traversal mechanism which provides cached and optimized access to CST.
Currently these iterators are not yet utilized as their utilization is blocked by the following PR I've issued for tree-sitter Node.js binding. If we utilized them now without the PR we'll see performance degradation in parsing speed for Node.js env, but increased performance for Browser env (which might be acceptable for now).
According to benchmarks and profiling I've made, by using iterators the performance increased by around 400% (4 times), from 45 ops/s to 171 ops/s.
Current status: waiting for https://github.com/tree-sitter/node-tree-sitter/pull/96 to be merged to utilize the cursor traversal iterators and increase the parsing speed.
https://github.com/tree-sitter/node-tree-sitter/pull/96 has been finally merged. We need to wait for new release of tree-sitter
and we can utilize the cursor traveral.
tree-sitter@0.20.4
has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-json
. The performance of JSON syntactic analysis has been increased by 100% (direct & indirect).
tree-sitter@0.20.4 has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-yaml-1-2
. The performance of YAML 1.2 syntactic analysis is now 7x faster (indirect).
This issue is specific to
apidom-parser-adapter-json
, but the optimization that comes out of this issue could be possibly used in every package. It seems that we spend a lot of time in syntactic analysis phase, specifically in building minim objects. This needs to be inspected and profiled; it seems that the building time is unreasonably long.Node internal profiler mechanism is used to identify code which needs to be optimized.
Refs https://github.com/swagger-api/apidom/issues/385