swagger-api / apidom

Semantic parser for API specifications
https://swagger-api.github.io/apidom/
68 stars 17 forks source link

Second round of performance profiling #691

Open char0n opened 2 years ago

char0n commented 2 years ago

This issue is specific to apidom-parser-adapter-json, but the optimization that comes out of this issue could be possibly used in every package. It seems that we spend a lot of time in syntactic analysis phase, specifically in building minim objects. This needs to be inspected and profiled; it seems that the building time is unreasonably long.

 * Parse stage: 112,61 ms
 *   Lexical Analysis phase: 1,42 ms
 *   Syntactic Analysis phase: 85,91 ms
 *     Traversing time: 0,27 ms
 *     Building time: 85,64 ms
 * Refract stage: 18,18 ms

Node internal profiler mechanism is used to identify code which needs to be optimized.

Refs https://github.com/swagger-api/apidom/issues/385

char0n commented 2 years ago

Real world fixtures are being integrated in #792 and #802

char0n commented 2 years ago

JSON direct syntactic analyzer (layer that translates tree-sitter CST into generic ApiDOM). Here are some numbers how ApiDOM parsing speed increased after this perf optimization.

Before

 * Parse stage: 112,61 ms
 *   Lexical Analysis phase: 1,42 ms
 *   Syntactic Analysis phase: 85,91 ms
 *     Traversing time: 0,27 ms
 *     Building time: 85,64 ms
 * Refract stage: 18,18 ms

After

 * Parse stage: 24,15 ms
 *   Lexical Analysis phase: 1,28 ms
 *   Syntactic Analysis phase: 14,99 ms
 *     Traversing time: 0,59 ms
 *     Building time: 14,39 ms
 * Refract stage: 15,75 ms
char0n commented 2 years ago

JSON indirecty syntactic analysis optimization results:

Before

4.45 ops/sec ±0.92% (621 runs sampled)

After

11.07 ops/sec ±0.90% (650 runs sampled)
char0n commented 2 years ago

YAML syntactic analysis optimization results:

Before

4.77 ops/sec ±0.82% (623 runs sampled)

After

9.93 ops/sec ±1.11% (642 runs sampled)
char0n commented 2 years ago

During the weekend I did extensive optimizations on all our syntactic analyzers - YAML and JSON. I managed to increase the syntactic analysis phase by another factor of 2 - parsing is now two times faster again. It's not much, but it's something at least...

During the optimizations I've managed to identify the biggest performance problem that we currently have. It's unfortunately not in our code, but in tree-sitter binding code. It takes extreme amount of time to access tree-sitter CST nodes and it's attributes. On relatively small CST trees (300 lines of YAML), the traversal of CST takes more than 20 ms. Access time of tree sitter CST seems to be linear, so for 3000 lines of YAML we get around 200 ms of just traversal time. I've created an issue with the tree-sitter, to find out If I'm doing something wrong or this is something expected.

Just for comparison:

https://github.com/tree-sitter/tree-sitter/issues/1469

char0n commented 2 years ago

tree-sitter cursor traversal needs to be used to get the performance we need. Full CST traversal is around 10x faster when used.

POC of cursor traversal:

  const cursor = cst.walk();

  let reached_root = false;
  while (!reached_root) {
    //github.com/tree-sitter/py-tree-sitter/issues/33
    https: const type = cursor.nodeType;
    console.dir(cursor.nodeText);

    if (cursor.gotoFirstChild()) {
      continue;
    }

    if (cursor.gotoNextSibling()) {
      continue;
    }

    let retracting = true;
    while (retracting) {
      if (!cursor.gotoParent()) {
        retracting = false;
        reached_root = true;
      }

      if (cursor.gotoNextSibling()) {
        retracting = false;
      }
    }
  }
char0n commented 2 years ago

In https://github.com/swagger-api/apidom/pull/933 I've implemented two specialized iterators that provide optimized access to tree-sitter CST. Accessing the tree-sitter CST via cursor mechanism is 10 times faster. As cursor traversal is not compatible with our own ApiDOM traversal, adapter in form of two (2) iterators have been created.

PreOrderCursorChildrenIterator

This iterator uses Preorder Depth First traversal to create tree structure compatible with tree-sitter Tree using tree-sitter cursor traversal mechanism which provides cached and optimized access to CST.

PreOrderCursorIterator

This iterator uses Preorder Depth First traversal to create list of tree-sitter SyntaxNode like structures using tree-sitter cursor traversal mechanism which provides cached and optimized access to CST.

Currently these iterators are not yet utilized as their utilization is blocked by the following PR I've issued for tree-sitter Node.js binding. If we utilized them now without the PR we'll see performance degradation in parsing speed for Node.js env, but increased performance for Browser env (which might be acceptable for now).

Performance improvement

According to benchmarks and profiling I've made, by using iterators the performance increased by around 400% (4 times), from 45 ops/s to 171 ops/s.

char0n commented 2 years ago

Current status: waiting for https://github.com/tree-sitter/node-tree-sitter/pull/96 to be merged to utilize the cursor traversal iterators and increase the parsing speed.

char0n commented 1 year ago

https://github.com/tree-sitter/node-tree-sitter/pull/96 has been finally merged. We need to wait for new release of tree-sitter and we can utilize the cursor traveral.

char0n commented 1 year ago

tree-sitter@0.20.4 has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-json. The performance of JSON syntactic analysis has been increased by 100% (direct & indirect).

char0n commented 1 year ago

tree-sitter@0.20.4 has been released and tree-sitter cursor traversal has been utilized in apidom-parser-adapter-yaml-1-2. The performance of YAML 1.2 syntactic analysis is now 7x faster (indirect).