tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Method prediction #90

Closed: crimsonfan closed this issue 4 years ago

crimsonfan commented 4 years ago

First of all, thank you for all your work. I'm a college student doing research on this project. When code2vec predicts a method name for a code snippet, it also outputs the snippet's code vector along with the predictions (for example, 99% [solve]). If I want to calculate the distance between the snippet's vector and the vectors of the predicted names (e.g., the [solve] vector), can I get those prediction vectors from this project, and where are they stored?

Many thanks.

crimsonfan commented 4 years ago

Dear author, I read the solved issues and found that they are located in models/java14_model/target.txt. Am I right? P.S.: If a class has many functions, can code2vec show the predicted method name for each function along with its line number (in case several functions have the same name)? Thanks.

urialon commented 4 years ago

Hi @crimsonfan , Thank you for your interest in code2vec!

Yes, you can take the code vector and compute its distance with the target embedding. Did you find the target.txt file? See also https://github.com/tech-srl/code2vec#exporting-the-trained-token-vectors-and-target-vectors

Alternatively, you can perform this computation (code-target distance) at runtime and do a --test run on the test set, which will output the distances for the entire test set.
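A minimal Python sketch of the code-target distance computation, assuming the exported target vectors are in word2vec text format (a header line with the count and dimension, then one `name v1 v2 ...` line per target) and that the snippet's code vector is already available as a list of floats. The helper names (`load_word2vec_text`, `cosine_distance`, `nearest_targets`) are hypothetical and not part of code2vec. Cosine distance is one reasonable choice; the model itself scores targets with a dot product followed by a softmax, so a plain dot product is an alternative.

```python
import io
import math
from typing import Dict, List, Sequence, TextIO, Tuple


def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec text format: a 'count dim' header, then 'name v1 ... v_dim' per line."""
    _count, dim = (int(x) for x in stream.readline().split()[:2])
    vectors: Dict[str, List[float]] = {}
    for line in stream:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_targets(
    code_vector: Sequence[float], targets: Dict[str, List[float]], k: int = 5
) -> List[Tuple[str, float]]:
    """Rank target names by cosine distance to the code vector (closest first)."""
    scored = [(name, cosine_distance(code_vector, vec)) for name, vec in targets.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    # Tiny in-memory stand-in for an exported target-vector file (made-up numbers).
    demo = io.StringIO("2 3\nsolve 1.0 0.0 0.0\nsort 0.0 1.0 0.0\n")
    targets = load_word2vec_text(demo)
    print(nearest_targets([0.9, 0.1, 0.0], targets, k=2))
```

With a real exported file, one would pass `open(path)` to `load_word2vec_text` instead of the in-memory demo; the test-set variant Uri mentions would apply the same per-vector computation to every code vector in the run.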

crimsonfan commented 4 years ago

Dear @urialon,

Thanks, I've almost finished that part. As I understand it, code2vec analyzes the functions in a class one by one and transforms each of them into a vector (4 functions give 4 vectors). Is there any way for me to find the starting line of each function when all of them have the same name?

Thanks, Tin.

urialon commented 4 years ago

Currently, this is not supported. Maybe the easiest way would be to append the line number to each method name during preprocessing (i.e., while running the JavaExtractor). For example, toString can become toString_123.
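The naming convention suggested here can be sketched in Python as a few hypothetical helpers: one builds the tagged name, one splits it back into the original name and its line number, and one groups predictions by starting line so same-named methods can be told apart. The actual change would live in the JavaExtractor (Java), which is not shown in this thread; this only illustrates the `<name>_<line>` scheme. If the extractor normalizes method names in a way that strips digits or underscores, the suffix would need to be added after that step for the line number to survive.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical naming convention from the thread: "<method name>_<start line>".
_TAGGED_NAME = re.compile(r"^(?P<name>.+)_(?P<line>\d+)$")


def tag_method_name(name: str, start_line: int) -> str:
    """Append the method's starting line number, e.g. toString -> toString_123."""
    return f"{name}_{start_line}"


def split_tagged_name(tagged: str) -> Tuple[str, Optional[int]]:
    """Undo tag_method_name; returns (name, None) if there is no numeric suffix."""
    match = _TAGGED_NAME.match(tagged)
    if match is None:
        return tagged, None
    return match.group("name"), int(match.group("line"))


def predictions_by_line(results: Iterable[Tuple[str, str]]) -> Dict[int, str]:
    """Map start line -> predicted name, given (tagged original name, prediction) pairs."""
    by_line: Dict[int, str] = {}
    for tagged, predicted in results:
        _, line = split_tagged_name(tagged)
        if line is not None:  # untagged names cannot be placed, so they are skipped
            by_line[line] = predicted
    return by_line
```

Because the regular expression only strips the last `_<digits>` group, names that already contain underscores (such as `to_string_7`) still split correctly.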

Best, Uri

crimsonfan commented 4 years ago

Dear @urialon, thanks for everything. Tin.