kotlinx / ast

Generic AST parsing library for kotlin multiplatform
Apache License 2.0
317 stars 22 forks source link

How can I get the AST representing statements inside the method? #21

Open ghost opened 3 years ago

ghost commented 3 years ago

When parsing AST, how can I find out on which line a particular Class or Function was declared?

ghost commented 3 years ago

And how can I get the AST representing statements inside the method?

drieks commented 3 years ago

Hi @rajatpundir, sorry for the late response. Adding the line number should be easy, I will look into this.

Parsing the body of a function is currently not possible, but maybe easy to add. I will try to figure out how complicated this is.

drieks commented 3 years ago

Hi @rajatpundir,

I added support for line/row information and buffer index offset. You can find examples here: https://github.com/kotlinx/ast/tree/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata

The Kotlin source can be found in *.kt.txt, eg. in https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.kt.txt, the position info for the raw ast (*.raw.info.txt) can be found here: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.raw.info.txt and the position info for the summary ast can be found here (*.summary.info.txt): https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.summary.info.txt

The number in the first row is the index in the token stream, so only tokens (leafs of the tree) have such an id. the first range the start and end in the buffer (starting from zero), the second range is line:col, both starting at 1.

You can use astInfoOrNull from package kotlinx.ast.common.ast to find the ast position info:

data class AstInfo(
    val id: Int,
    val start: AstInfoPosition,
    val stop: AstInfoPosition,
)

data class AstInfoPosition(
    val index: Int,
    val line: Int,
    val row: Int,
)
drieks commented 3 years ago

And how can I get the AST representing statements inside the method?

Sadly, there are only some summary nodes, but the full kotlin language is currently not covered. I'm using the library to build an outline of the file. you can access the raw ast of the function, for example this function:

@Annotated
private fun annotated2(@AnotherAnnotation x: Int): Int {
    return x * 2
}

found here: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.kt.txt#L14-L17

will be represented in the summay ast as:

    KlassDeclaration(fun annotated2 Int)
      KlassAnnotation(Annotated)
      KlassModifier(private, visibilityModifier)
      KlassDeclaration(parameter x Int)
        KlassAnnotation(AnotherAnnotation)

https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.summary.ast.txt#L26-L30

When you use the extension function Ast.rawAstOrNull on the KlassDeclaration, you should get this raw ast:

https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-test/src/commonMain/resources/testdata/Class.raw.ast.txt#L301-L425

It is possible to add the required ast nodes for a summary here: https://github.com/kotlinx/ast/blob/master/grammar-kotlin-parser-common/src/commonMain/kotlin/kotlinx/ast/grammar/kotlin/common/summary/KotlinTreeMapBuilder.kt

for every possible ast node type, a convert call can be added there to convert the given type of raw ast to the summay ast. all existing ast nodes are already added as comments. But sadly, I have currently no time to implement this.

If anybody can help implementing this, please feel free to send me any questions!

lululu016 commented 2 years ago

Hi, @drieks Recently I want to extract Kotlin AST, and feel really happy to find this library. But I need to get the AST representing statements inside the method, either. I'm struggling with it, because there are no api for extracting Kotlin AST. So I would like to ask you that if I want to implement this by myself, what should I do, where should I start and how difficult it is. It is very important for my research, thanks a lot.

drieks commented 2 years ago

Hi @lululu016, sorry for my late answer. It is possible to implement the missing code here in kotlinx.ast. There are many unit tests and it is easy to add new tests, so it should not be too difficult. If you want to give it a try, please try to checkout this repository and open it in IntelliJ IDEA (or any other IDE) and try to start this test: ´grammar-kotlin-parser-antlr-java/src/test/kotlin/kotlinx/ast/grammar/kotlin/target/antlr/java/test/KotlinGrammarAntlrJavaParserTestDataTest.kt'

This will run all tests found here: grammar-kotlin-parser-test/src/commonMain/resources/testdata*.kt.txt

When this is working, try to change one of files, for example in ´Annotation1.kt.txt´, you can change Annotation1 to AnnotationX. When you rerun the tests, this test should be marked as failed.

Then, create a new file called ´.override-test-data´ in the root folder of the repository (where README.md is located) and rerun the test again. Now, the test will update the data in the unit test folder and will abort the test with the message expected data for test "Annotation1.kt -- raw ast" differs, data updated!. After this, you can see the changes as an outgoing change in ´git diff´ or in your IDE. If you rerun the test again, it will be green, because the expected data and the actual test result will match.

The same applies when you add a new file ´FunctionReturn.kt.txt´ with some kotlin function you want to parse. The first test data should be small, so you can start implementing a simple ast node. Maybe ´´´ fun test(): String { return "test" } ´´´ ​

Run the unittest to let the unittest write its actual data as the expected data. Now you can open the new file ´MyExample.raw.ast.txt´ in your editor to see the raw (unparsed) AST. Here, you can see the return ast node in line 48:

´´´ jumpExpression RETURN >>>return<<< (DEFAULT_TOKEN_CHANNEL) ´´´

The Code to simplify the ast is implemented in the file ´grammar-kotlin-parser-common/src/commonMain/kotlin/kotlinx/ast/grammar/kotlin/common/summary/KotlinTreeMapBuilder.kt´ There, in line 1326 you can find a placeholder for the jumpExpression node:

´´´ // jumpExpression // : THROW NL* expression // | (RETURN | RETURN_AT) expression? // | CONTINUE | CONTINUE_AT // | BREAK | BREAK_AT // ; ´´´

This comment is taken from the antlr grammar for the kotlin language, here after this comment you can add the new code. But before you can start, you have to "fix" the existing converter for "functionBody": ´´´ // functionBody // : block // | ASSIGNMENT NL* expression // ; .convert( filter = byDescription("functionBody") ) { _: Ast -> astDrop() } ´´´

This means 'convert all ast nodes matching the filter "byDescription("functionBody")' using this lambda:


{ _: Ast ->
        astDrop()
    }
´´´

And this lambda is just returning a marker to drop all child nodes. We need to change this into:
´´´
{ node: AstNode ->
        recursiveFlatten(node)
    }
´´´

Now, he will process all child nodes of functionBody (with is a "block" node in this case here) and he will "drop" the functionBody node itself (because we don't need it any longer), this is called "flatten" here, because he is replacing the old functionBody will all new children nodes, like flatMap called on a list.

When you rerun the tests, everything is still green, because nothing was changed in the output. I have no time to write more right now, please let me know if this is helping you, so I can explain more steps for you.
lululu016 commented 2 years ago

Hi, @drieks I'm sooooo glad to receive your reply, and I will try it these days. When I finish, I will let you know as soon as possible. Thanks a lot for your meticulous reply again.

lululu016 commented 2 years ago

Hi, @drieks I have tried all the steps you explained, it works well and help me a lot. I'm looking forward for your further explanation. Thanks soooooo much!

drieks commented 2 years ago

Hi @lululu016, this is great! I will try to write some more examples.

lululu016 commented 2 years ago

Hi, @@drieks

Recently I try to extract statement inside the method, and I would like to ask for help for some quetions I faced and look forwards to some examples from you.

For example, if I want to represent a while statement like this:

fun main(arg: Array<String>) {
    while(num>=5){
            println("Loop: $num")
            num--
        }
}

I want to represent it in ast.summary like this:

PackageHeader()
importList
KlassDeclaration(fun main)
  KlassDeclaration(parameter arg Array<String>)
  Block
    whileStatement
      COND(num>=5)
      controlStructureBody
        statement
          println("Loop: $num")
        statement
          num--

How should I do?I would be really appreciate if you could give me some examples, so that I can complete other statement, like for statement, when statement and so on. Your example would be really help for me.

drieks commented 2 years ago

Hi @lululu016, I think I have currently no time to write a documentation, but a step by step way sounds good. Thank you!

The first step is adding the transformation target. The "raw" AST is always a generic tree structure and therefore not typesafe. Currently, all supported kotlin ast nodes are transformed into instances of ´Klass´ (see Klass.kt line 9). Instances of Klass AST-Nodes can represent any language, it is also possible to add a java or scala parser and a "summary" function to convert this AST also into Klass Nodes.

Because there is currently no support for blocks, a generic statement should be added and maybe a loop statement (like in the kotlin grammar, see KotlinParser.g4 line 323 and 340ff).

´KlassLoopStatement´ should contain a type describing the loop type (for, while...), maybe ´KlassLoopType´.

I created a new branch, feel free to change anything, the code is not tested.

drieks commented 2 years ago

Hi @lululu016 ,

I added a transformation from functionBody to KlassBlock. The Implementation is not finished, currently, only a dummy comment is added as an example:

// functionBody
//     : block
//     | ASSIGNMENT NL* expression
//     ;
    .convert(
        filter = byDescription("functionBody")
    ) { _: Ast ->
        astContinue(
            KlassBlock(
                listOf(
                    KlassComment("TODO", KlassCommentType.Doc)
                )
            )
        )
    }

You can see this comments in the updated unit test files. The next step is to handle "block" and "expression" child nodes of functionBody recursively.

drieks commented 2 years ago

Hi @lululu016 , in commit https://github.com/drieks/ast/commit/fe52ba7f31f26f177cb16d6e325faaf3344598ad I added a new class KlassBlock. The idea is to represent a code block using this class, so every function should contain a KlassBlock. An empty function has an empty KlassBlock, a function containing an assignment (fun x(): Int = 1) contains one entry in the statements list of KlassBlock. Maybe we should add a marker to be able to represent this difference (compared to fun x(); Int {return 1}). The first version contains only one expression, the second form allows a list of statements.

As you can see in line 443 of file KotlinTreeMapBuilder.kt (https://github.com/drieks/ast/commit/fe52ba7f31f26f177cb16d6e325faaf3344598ad#diff-ca031db330093281d46cb9ad9bdfbdacdae6b61a185cfff095e2c8a62bd0a896R443) I changed the converter function of "functionBody" ast nodes from astDrop() (just ignore this "functionBody" node) to astContinue (continue with the tree transformation using the new converted node that was passed to astContinue). I'm passing a new KlassBlock containing a dummy comment, I added the comment only as a showcase. You can see the comment nodes added in the unit test data when you scroll down in the commit link. Every "KlassDeclaration(fun..." contains now a new node KlassComment(Doc).

drieks commented 2 years ago

In the next commit (https://github.com/drieks/ast/commit/7c79d0a0a433ecaf886c9a5ff5e3243d831c2e3d) I added the "real" implementation for the "functionBody" transformation. recursiveFlatten will recursively transform all child nodes of the passed node. recursiveChildren will return the "same" node, but with transformed child nodes. recursiveFlatten will "drop" the passed node, only the transformed child nodes are returned. The return type of both functions is AstResult with two implementations, AstSuccess to indicate that the transformation should continue and AstFailure to abort the transformation using an error message. For example, astDrop() and astContinue() are returning AstSuccess while astError is returning an AstFailure.

AstResult is providing map and flatMap functions, they are working like the ones defined in the standard library. A filter is passed to recursiveFlatten, so only "block" and "expression" nodes are transformed. Nodes not matched by the filter are dropped and not transformed or returned by recursiveFlatten.

The "ASSIGNMENT" and "NL" tokens are not matched by the filter and therefore dropped. See the grammar as a reference, added as a comment before the convert function:

// functionBody
//     : block
//     | ASSIGNMENT NL* expression
//     ;

All KlassComment(Doc) are now removed again from the unit test data, the next step is to implement "block" and "expression" converter functions. Also a unit test for the "expression" case is missing.