kotlinx / ast

Generic AST parsing library for kotlin multiplatform
Apache License 2.0
323 stars, 22 forks

Parsing large files is too slow #3

Status: open. drieks opened this issue 5 years ago

drieks commented 5 years ago

These files are currently not included in SelfTest.kt because the processing does not finish within a reasonable time:

drieks commented 4 years ago

Hi @martinflorek,

Please try version fd6123da02. Could you tell me the parsing time of the old version and of the new one? Thank you very much!

martinflorek commented 4 years ago

I can't measure the parsing time in isolation, because I process several source code repositories at once and search for specific files before parsing them.

The new version does run slightly faster, though: my whole processing went from ~33 s to ~32 s. For comparison, the version using Kastree runs in 1.7 s.
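To get a number closer to what was asked for, the parse step can be timed on its own: do the file discovery and reading first, then wrap only the parse call in the timer. Below is a minimal Kotlin sketch of that idea. `parseSource` is a hypothetical placeholder (it is not a kotlinx.ast function); it only counts lines so the sketch runs standalone, and you would swap in the real parser call.

```kotlin
import java.io.File
import kotlin.system.measureNanoTime

// Hypothetical stand-in for the real parse call; replace it with the parser you actually use.
// It just counts newlines so this sketch runs on its own.
fun parseSource(text: String): Int = text.count { it == '\n' }

// Sums only the time spent inside `parse`. The sources are already in memory,
// so repository walking and disk reads are not part of the measurement.
fun timeParsing(sources: List<String>, parse: (String) -> Any?): Long {
    var totalNanos = 0L
    for (source in sources) {
        totalNanos += measureNanoTime { parse(source) }
    }
    return totalNanos
}

fun main(args: Array<String>) {
    val root = File(args.firstOrNull() ?: ".")
    // Untimed: walk the repositories and keep only the files of interest.
    val files = root.walkTopDown().filter { it.isFile && it.extension == "kt" }.toList()
    val sources = files.map { it.readText() }
    // Untimed warm-up passes, so JVM JIT compilation is not billed to the measured run.
    repeat(3) { timeParsing(sources, ::parseSource) }
    val nanos = timeParsing(sources, ::parseSource)
    println("Parsed ${sources.size} files in ${nanos / 1_000_000} ms")
}
```

Running both the old and the new parser version through the same harness on the same file set gives directly comparable parse-only times.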

drieks commented 4 years ago

I refactored kotlinx.ast so that both antlr-kotlin and antlr-java can now be used to parse Kotlin sources. Example: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-antlr-java/src/test/kotlin/kotlinx/ast/example/ExampleMain.kt

Unfortunately, antlr-kotlin does not seem to be much slower than antlr-java, so switching runtimes does not help much. I will try to figure out how to speed up parsing.

drieks commented 4 years ago

@ShikaSD pointed me to antlr-optimized, so I implemented support for this antlr fork in kotlinx.ast. Unfortunately, it is not as fast as I had hoped. Next, I will try to implement a lexer and parser based on the antlr4 grammar files that supports only the features required to parse Kotlin files. For this use case, I have already added support for parsing antlr4 grammar files in kotlinx.ast:grammar-antlr4-parser-antlr-java.
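As a rough illustration of why a lexer restricted to the needed features can be cheap, here is a hand-written, single-pass Kotlin lexer for a small, hypothetical subset of Kotlin: identifiers, a few keywords, integer and plain string literals, line and (nested) block comments, and single-character punctuation. It is not generated from the antlr4 grammar and is not kotlinx.ast code; the `Kind`, `Token` and `lex` names are made up for this sketch. It only shows a code path that avoids the generic ATN simulation an ANTLR-generated lexer runs.

```kotlin
// Token kinds for a deliberately small, hypothetical subset of Kotlin.
enum class Kind { IDENT, KEYWORD, INT, STRING, PUNCT }

data class Token(val kind: Kind, val text: String, val offset: Int)

val KEYWORDS = setOf("fun", "val", "var", "class", "if", "else", "return")

// Single pass over the input: no backtracking and no lookahead beyond two characters.
fun lex(src: String): List<Token> {
    val out = ArrayList<Token>()
    var i = 0
    while (i < src.length) {
        val c = src[i]
        when {
            c.isWhitespace() -> i++
            src.startsWith("//", i) -> {
                // Line comment: skip to the end of the line.
                while (i < src.length && src[i] != '\n') i++
            }
            src.startsWith("/*", i) -> {
                // Block comment; Kotlin allows nesting, so track the depth.
                var depth = 0
                do {
                    if (src.startsWith("/*", i)) { depth++; i += 2 }
                    else if (src.startsWith("*/", i)) { depth--; i += 2 }
                    else i++
                } while (depth > 0 && i < src.length)
            }
            c.isLetter() || c == '_' -> {
                val start = i
                while (i < src.length && (src[i].isLetterOrDigit() || src[i] == '_')) i++
                val word = src.substring(start, i)
                out += Token(if (word in KEYWORDS) Kind.KEYWORD else Kind.IDENT, word, start)
            }
            c.isDigit() -> {
                val start = i
                while (i < src.length && src[i].isDigit()) i++
                out += Token(Kind.INT, src.substring(start, i), start)
            }
            c == '"' -> {
                // Plain string literal only (no templates, no raw strings); skips escaped chars.
                val start = i++
                while (i < src.length && src[i] != '"') {
                    if (src[i] == '\\') i++
                    i++
                }
                i++ // consume the closing quote (or run past the end of unterminated input)
                out += Token(Kind.STRING, src.substring(start, minOf(i, src.length)), start)
            }
            else -> { out += Token(Kind.PUNCT, c.toString(), i); i++ }
        }
    }
    return out
}
```

A real Kotlin lexer needs much more than this (string templates, raw strings, multi-character operators, numeric suffixes, soft keywords), which is where most of the work in such a rewrite would go.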

drieks commented 3 years ago

The time for ./gradlew clean check was reduced from 3 min 30 s in commit c7dd6bbd5419789a7feba0d68cf6f1f326197103 to 2 min 30 s in commit f088b3cf8de0817e9f235c0b53e0923127956b22.

Because of this, all Kotlin files are now scanned in the self test.

It is still necessary to speed this up; I think this will require patching the Kotlin parser/lexer.

drieks commented 3 years ago

The build time for commit 95db180495bb46afe42767191996e3cd49cd96cf is 44 s, so we can estimate that testing the previously excluded files takes around 1 min 45 s.
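Spelled out, the estimate is the difference between the two build times, assuming the 2 min 30 s run at f088b3cf includes the previously excluded files and the 44 s build at 95db1804 does not:

```latex
\begin{aligned}
t_{\text{excluded}} &\approx t_{\text{with}} - t_{\text{without}} \\
&= 150\,\text{s} - 44\,\text{s} = 106\,\text{s} \approx 1\,\text{min}\ 45\,\text{s}
\end{aligned}
```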

fab1an commented 2 years ago

Can you have a look at my comment in #50? Why is parsing a large garbage string faster than parsing a large string containing JSON?