tech-srl / code2seq

Code for the model presented in the paper: "code2seq: Generating Sequences from Structured Representations of Code"
http://code2seq.org
MIT License

how did you generate the AST tree structure through code #82

Closed walt676 closed 3 years ago

walt676 commented 3 years ago

Hello, how did you generate the AST structure from source code (for example, the tree shown on the right side of https://code2seq.org/)? My English is not very good, so I apologize if I cause any offense. I look forward to your reply. Thank you!

urialon commented 3 years ago

Hi @walt676 , Thank you for your interest in our work!

Do you refer to the JavaScript visualization? It was built with React, D3, and react-d3-tree with some domain-specific logic for our visualization.

I hope it answers your question. Best, Uri

walt676 commented 3 years ago

Hi @urialon , Thank you very much for your reply. Actually, I am not interested in the visualization. I would like to know how the source code is converted into an AST. For the part I mentioned above, do you use JavaParser to build the AST and then visualize it? Thank you again for taking the time to answer my questions.

urialon commented 3 years ago

Yes, we use the JavaParser package in our JavaExtractor to parse the source code into an AST. We use the AST mainly for training the neural model, but also for the visualization.

walt676 commented 3 years ago

Thank you very much for your reply, @urialon . I would like to ask about some implementation details. I know that in your work you use AST paths as input and store the paths in the dataset. If I want to store the AST's tree structure itself persistently in the dataset, do you have any suggestions on what format to use?

urialon commented 3 years ago

Hi @walt676 , Sorry for the delayed response.

I think that if you upgrade the version of Javaparser from the pom.xml file of the JavaExtractor project, you'll be able to use the built-in serializer of Javaparser: https://javadoc.io/doc/com.github.javaparser/javaparser-core-serialization/latest/index.html

Best, Uri
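
As a rough illustration of what "storing the tree structure" can mean, independent of JavaParser's own serializer, the sketch below writes a tree as nested JSON objects using only the Java standard library. The `AstNode` class, its `type` and `token` fields, and the `toJson` method are hypothetical names introduced for this example; they are not part of JavaParser or code2seq, and the JSON layout is just one plausible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal tree node; NOT a JavaParser or code2seq class.
final class AstNode {
    final String type;                                // node kind, e.g. "MethodDeclaration"
    final String token;                               // leaf text, or null for inner nodes
    final List<AstNode> children = new ArrayList<>();

    AstNode(String type, String token) {
        this.type = type;
        this.token = token;
    }

    // Appends a child and returns this node so calls can be chained.
    AstNode add(AstNode child) {
        children.add(child);
        return this;
    }

    // Escapes only backslash and double quote; control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Serializes the subtree rooted at this node as nested JSON objects.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":").append(quote(type));
        if (token != null) {
            sb.append(",\"token\":").append(quote(token));
        }
        if (!children.isEmpty()) {
            sb.append(",\"children\":[");
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(children.get(i).toJson());
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        AstNode root = new AstNode("MethodDeclaration", null)
            .add(new AstNode("SimpleName", "foo"))
            .add(new AstNode("BlockStmt", null));
        System.out.println(root.toJson());
    }
}
```

Writing one such JSON object per line (one method per line) would keep the dataset easy to stream, although that layout is a design choice for this sketch rather than something the thread prescribes.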

cynthia0118 commented 3 years ago

Hi, I wonder whether you use the package "Microsoft.CodeAnalysis.CSharp.Syntax" to parse C# code into an AST. Thank you for your reply.

urialon commented 3 years ago

Hi @cynthia0118 , Thank you for your interest in our work.

Yes, in C#, we use this package, as you can see here: https://github.com/tech-srl/code2seq/blob/master/CSharpExtractor/CSharpExtractor/Extractor/Extractor.cs#L4

Best, Uri

cynthia0118 commented 3 years ago

Thank you very much! Best wishes!

walt676 commented 3 years ago

Thank you very much for answering my question. My problem has been resolved, so I will close this issue for now. Thank you again, Uri!

walt676 commented 3 years ago

Hi @urialon , Sorry to bother you again. There is a term I don’t quite understand: in the JavaExtractor, which you use to generate AST paths, what is 'Generic Parent'? Also, I see that you show the AST of a code snippet on 'code2seq.org'; could you share the code you used to generate that AST? Thank you for looking at this issue.

urialon commented 3 years ago

Hi @walt676 , The "Generic parent" is simply a node that represents a generic type, like ArrayList<String>.

What do you mean by "generate AST"? With JavaParser, we parse the input code and get an object that holds the AST here: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/JPredict/src/main/java/JavaExtractor/FeatureExtractor.java#L68

Are you interested in this object, or in serializing it as JSON? If so, did you check my previous response about the built-in serializer of Javaparser? https://javadoc.io/doc/com.github.javaparser/javaparser-core-serialization/latest/index.html

Uri

walt676 commented 3 years ago

Hi @urialon , I modified your code to produce the data format I need, but some errors occurred when I ran it, and the same errors also occur with your original, unmodified code. When I run bash preprocess.sh and generate only the raw data (xxxx.raw.txt), these error messages appear:

[root@localhost code2seq-master]# bash preprocess.sh 
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
dir: /opt/project/code2seq-master/java-small/training/cassandra was not completed in time
dir: /opt/project/code2seq-master/java-small/training/intellij-community was not completed in time
dir: /opt/project/code2seq-master/java-small/training/liferay-portal was not completed in time
dir: /opt/project/code2seq-master/java-small/training/hibernate-orm was not completed in time
dir: /opt/project/code2seq-master/java-small/training/wildfly was not completed in time
dir: /opt/project/code2seq-master/java-small/training/elasticsearch was not completed in time
b'java.util.concurrent.ExecutionException: com.github.javaparser.ParseProblemException: Encountered unexpected token: "{" "{"\n    at line 1, column 53.\n\nWas expecting one of:\n\n    "@"\n    <IDENTIFIER>\n\n\n\tat java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)\n\tat java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)\n\tat JavaExtractor.App.lambda$extractDir$3(App.java:59)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat JavaExtractor.App.extractDir(App.java:57)\n\tat JavaExtractor.App.main(App.java:32)\nCaused by: com.github.javaparser.ParseProblemException: Encountered unexpected token: "{" "{"\n    at line 1, column 53.\n\nWas expecting one of:\n\n    "@"\n    <IDENTIFIER>\n\n\n\tat com.github.javaparser.JavaParser.simplifiedParse(JavaParser.java:242)\n\tat com.github.javaparser.JavaParser.parse(JavaParser.java:210)\n\tat JavaExtractor.FeatureExtractor.parseFileWithRetries(FeatureExtractor.java:66)\n\tat JavaExtractor.FeatureExtractor.extractFeatures(FeatureExtractor.java:38)\n\tat JavaExtractor.ExtractFeaturesTask.extractSingleFile(ExtractFeaturesTask.java:64)\n\tat JavaExtractor.ExtractFeaturesTask.processFile(ExtractFeaturesTask.java:35)\n\tat JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:28)\n\tat JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:17)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\njava.util.concurrent.ExecutionException: com.github.javaparser.ParseProblemException: Encountered unexpected token: "package" "package"\n    at line 1, column 20.\n\nWas expecting one of:\n\n    ";"\n    "<"\n    "@"\n    "abstract"\n    "boolean"\n    "byte"\n    "char"\n    "class"\n    "default"\n    
"double"\n    "enum"\n    "final"\n    "float"\n    "int"\n    "interface"\n    "long"\n    "native"\n    "private"\n    "protected"\n    "public"\n    "short"\n    "static"\n    "strictfp"\n    "synchronized"\n    "transient"\n    "void"\n    "volatile"\n    "{"\n    "}"\n    <IDENTIFIER>\n\n\n\tat java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)\n\tat java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)\n\tat JavaExtractor.App.lambda$extractDir$3(App.java:59)\n\tat java.base/java.util.ArrayList.forEach(ArrayList.java:1541)\n\tat JavaExtractor.App.extractDir(App.java:57)\n\tat JavaExtractor.App.main(App.java:32)\nCaused by: com.github.javaparser.ParseProblemException: Encountered unexpected token: "package" "package"\n    at line 1, column 20.\n\nWas expecting one of:\n\n    ";"\n    "<"\n    "@"\n    "abstract"\n    "boolean"\n    "byte"\n    "char"\n    "class"\n    "default"\n    "double"\n    "enum"\n    "final"\n    "float"\n    "int"\n    "interface"\n    "long"\n    "native"\n    "private"\n    "protected"\n    "public"\n    "short"\n    "static"\n    "strictfp"\n    "synchronized"\n    "transient"\n    "void"\n    "volatile"\n    "{"\n    "}"\n    <IDENTIFIER>\n\n\n\tat com.github.javaparser.JavaParser.simplifiedParse(JavaParser.java:242)\n\tat com.github.javaparser.JavaParser.parse(JavaParser.java:210)\n\tat JavaExtractor.FeatureExtractor.parseFileWithRetries(FeatureExtractor.java:66)\n\tat JavaExtractor.FeatureExtractor.extractFeatures(FeatureExtractor.java:38)\n\tat JavaExtractor.ExtractFeaturesTask.extractSingleFile(ExtractFeaturesTask.java:64)\n\tat JavaExtractor.ExtractFeaturesTask.processFile(ExtractFeaturesTask.java:35)\n\tat JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:28)\n\tat JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:17)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n'
Finished extracting paths from training set

And when I run java -Xmx100g -XX:MaxNewSize=60g -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --dir /opt/project/code2seq-master/java-small/training/gradle --max_path_length 8 --max_path_width 2 --num_threads 6, it reports the following errors after printing some successfully generated data:

java.util.concurrent.ExecutionException: com.github.javaparser.ParseProblemException: Encountered unexpected token: "{" "{"
    at line 1, column 53.

Was expecting one of:

    "@"
    <IDENTIFIER>

    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at JavaExtractor.App.lambda$extractDir$3(App.java:59)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at JavaExtractor.App.extractDir(App.java:57)
    at JavaExtractor.App.main(App.java:32)
Caused by: com.github.javaparser.ParseProblemException: Encountered unexpected token: "{" "{"
    at line 1, column 53.

Was expecting one of:

    "@"
    <IDENTIFIER>

    at com.github.javaparser.JavaParser.simplifiedParse(JavaParser.java:242)
    at com.github.javaparser.JavaParser.parse(JavaParser.java:210)
    at JavaExtractor.FeatureExtractor.parseFileWithRetries(FeatureExtractor.java:77)
    at JavaExtractor.FeatureExtractor.extractFeatures(FeatureExtractor.java:49)
    at JavaExtractor.ExtractFeaturesTask.extractSingleFile(ExtractFeaturesTask.java:65)
    at JavaExtractor.ExtractFeaturesTask.processFile(ExtractFeaturesTask.java:35)
    at JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:28)
    at JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:17)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
java.util.concurrent.ExecutionException: com.github.javaparser.ParseProblemException: Encountered unexpected token: "package" "package"
    at line 1, column 20.

Was expecting one of:

    ";"
    "<"
    "@"
    "abstract"
    "boolean"
    "byte"
    "char"
    "class"
    "default"
    "double"
    "enum"
    "final"
    "float"
    "int"
    "interface"
    "long"
    "native"
    "private"
    "protected"
    "public"
    "short"
    "static"
    "strictfp"
    "synchronized"
    "transient"
    "void"
    "volatile"
    "{"
    "}"
    <IDENTIFIER>

    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at JavaExtractor.App.lambda$extractDir$3(App.java:59)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at JavaExtractor.App.extractDir(App.java:57)
    at JavaExtractor.App.main(App.java:32)
Caused by: com.github.javaparser.ParseProblemException: Encountered unexpected token: "package" "package"
    at line 1, column 20.

Was expecting one of:

    ";"
    "<"
    "@"
    "abstract"
    "boolean"
    "byte"
    "char"
    "class"
    "default"
    "double"
    "enum"
    "final"
    "float"
    "int"
    "interface"
    "long"
    "native"
    "private"
    "protected"
    "public"
    "short"
    "static"
    "strictfp"
    "synchronized"
    "transient"
    "void"
    "volatile"
    "{"
    "}"
    <IDENTIFIER>

    at com.github.javaparser.JavaParser.simplifiedParse(JavaParser.java:242)
    at com.github.javaparser.JavaParser.parse(JavaParser.java:210)
    at JavaExtractor.FeatureExtractor.parseFileWithRetries(FeatureExtractor.java:77)
    at JavaExtractor.FeatureExtractor.extractFeatures(FeatureExtractor.java:49)
    at JavaExtractor.ExtractFeaturesTask.extractSingleFile(ExtractFeaturesTask.java:65)
    at JavaExtractor.ExtractFeaturesTask.processFile(ExtractFeaturesTask.java:35)
    at JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:28)
    at JavaExtractor.ExtractFeaturesTask.call(ExtractFeaturesTask.java:17)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

I increased the timeout here: https://github.com/tech-srl/code2seq/blob/master/JavaExtractor/extract.py#L37 but that didn't solve the problem. Sorry to bother you; I look forward to your reply. Thank you!

urialon commented 3 years ago

Hi, it looks like the problem is a ParseProblemException, which means that a file (a data file in "gradle") does not parse because it has some syntactic problem.

You can try printing the name of the input file that does not parse, to verify that it indeed has a parsing problem. Uri
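
One way to find the offending files is to attempt a parse of each file separately and report the ones that fail. The sketch below uses only the Java standard library and takes the parser as a parameter, so it makes no claim about the extractor's internals. `ParseFailureFinder`, `findFailures`, and the `parser` parameter are hypothetical names for this example, and it assumes the parser signals failure by throwing an unchecked exception.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of code2seq's JavaExtractor.
final class ParseFailureFinder {
    // Runs `parser` on the contents of every regular .java file under `root`
    // and returns the files for which it threw an unchecked exception.
    static List<Path> findFailures(Path root, Consumer<String> parser) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            javaFiles = walk
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .sorted()
                .collect(Collectors.toList());
        }
        List<Path> failures = new ArrayList<>();
        for (Path file : javaFiles) {
            String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            try {
                parser.accept(source);
            } catch (RuntimeException e) {
                // Print the file name so the failing input can be inspected by hand.
                System.err.println("Failed to parse " + file + ": " + e.getMessage());
                failures.add(file);
            }
        }
        return failures;
    }
}
```

With the JavaParser version that appears in the stack trace, the `parser` argument would presumably be a lambda that calls `JavaParser.parse` on the source string; that wiring is an assumption and is left out so the sketch stays dependency-free.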

walt676 commented 3 years ago

Hi @urialon , I also think this is the reason. I used the same dataset and JavaParser version as you. Did this error also occur when you processed these files? I will skip the problematic Java files and process the others. Thank you!

urialon commented 3 years ago

Hi, Yes, in my preprocessing some files also raised parsing errors, which I ignored. You can verify that the final number of lines in the output file is (roughly) as expected.

Best, Uri
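
The line-count check can be scripted. The sketch below counts the lines of an output file and compares the count with an expected value within a relative tolerance. `LineCountCheck`, `countLines`, and `roughlyEqual` are hypothetical names, and both the expected count and the tolerance are values you would have to supply yourself, since the thread does not state them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper; not part of the code2seq preprocessing scripts.
final class LineCountCheck {
    // Counts lines without loading the whole file into memory.
    // ISO-8859-1 is used only so that decoding can never fail; line breaks
    // are ASCII, so the count is the same as it would be with UTF-8.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.ISO_8859_1)) {
            return lines.count();
        }
    }

    // True if `actual` is within the relative `tolerance` of `expected`
    // (for example, 0.05 means within 5 percent).
    static boolean roughlyEqual(long actual, long expected, double tolerance) {
        return Math.abs(actual - expected) <= tolerance * expected;
    }
}
```

A count far below the expected value would suggest that whole directories were skipped, as in the "was not completed in time" messages, rather than a handful of unparsable files.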

walt676 commented 3 years ago

Thank you for your detailed reply!