tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Extract path-contexts iteratively #97

Closed celsofranssa closed 3 years ago

celsofranssa commented 4 years ago

I am working with a Java dataset composed of pairs (code, comment), as shown below:


id | code | comment
-- | -- | --
321 | \tpublic int getPushesLowerbound() {\n\t\tretu... | returns the pushes lowerbound of this board po...
323 | \tpublic void setPushesLowerbound(int pushesLo... | sets the pushes lowerbound of this board position
324 | \t\tpublic void play() {\n\t\t\t\n\t\t\t// If ... | play a sound
343 | \tpublic int getInfluenceValue(int boxNo1, int... | returns the influence value between the positi...
351 | \tpublic void setPositions(int[] positions){\n... | sets the box positions and the player position

Is there a way to extract the path-contexts of each Java method, so as to create new (path_context, comment) pairs?

urialon commented 4 years ago

Hi @Ceceu, thanks again for your interest in code2vec! I think that code2seq would be more appropriate for this task than code2vec.

Please see these issues: https://github.com/tech-srl/code2seq/issues/41 https://github.com/tech-srl/code2seq/issues/45

Best, Uri

celsofranssa commented 4 years ago

Couldn't the following script

python3 code2vec.py \
    --load models/java14_model/saved_model_iter8.release \
    --test codes.txt \
    --export_code_vectors

be used to extract the vectors from the code?

urialon commented 3 years ago

Hmmm, not exactly. The codes.txt file needs to be a file that was preprocessed by JavaExtractor: the --test flag expects a preprocessed file (where every row is a list of paths), rather than raw Java text.
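To illustrate what "every row is a list of paths" means in practice, here is a minimal sketch of reading such rows back and re-attaching the comments. It assumes each preprocessed row has the layout `label ctx ctx ...`, with fields separated by spaces and each context written as `source,path,target`. The names `PathContext`, `parse_row`, and `pair_with_comments` are hypothetical helpers for this example, not part of code2vec, and the sketch assumes the caller has already run the extractor on each method and stored the resulting row under the dataset id.

```python
from typing import Dict, List, NamedTuple, Tuple


class PathContext(NamedTuple):
    source: str  # first terminal token
    path: str    # AST path (or its hash) between the two terminals
    target: str  # second terminal token


def parse_row(row: str) -> Tuple[str, List[PathContext]]:
    """Split one preprocessed row into its label and path-contexts.

    Assumed layout: `label ctx ctx ...`, each ctx being `source,path,target`.
    Empty padding fields (runs of spaces) are skipped.
    """
    label, *fields = row.strip().split(" ")
    contexts = []
    for field in fields:
        parts = field.split(",")
        if len(parts) == 3:  # ignore padding and malformed fields
            contexts.append(PathContext(*parts))
    return label, contexts


def pair_with_comments(
    rows_by_id: Dict[int, str],
    comments_by_id: Dict[int, str],
) -> List[Tuple[List[PathContext], str]]:
    """Build (path_contexts, comment) pairs for ids present in both mappings.

    Ids without an extracted row are dropped, since an extractor may
    produce no output for some methods.
    """
    pairs = []
    for method_id, row in rows_by_id.items():
        if method_id in comments_by_id:
            _, contexts = parse_row(row)
            pairs.append((contexts, comments_by_id[method_id]))
    return pairs
```

Keying both mappings by id, rather than relying on line order, keeps the pairs aligned even when some methods yield no row.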

faysalhossain2007 commented 3 years ago

If we want to build C/C++ vectors using code2vec, which should we use: 1) JavaExtractor, 2) CSharpExtractor, or 3) do I need to build my own extractor?

urialon commented 3 years ago

Hi @faysalhossain2007, thank you for your interest in code2vec!

You'll need to build your own extractor. Fortunately, there are some existing extractors for C/C++; see https://github.com/tech-srl/code2vec#extending-to-other-languages and https://github.com/tech-srl/code2seq/#extending-to-other-languages

If you have any further questions, feel free to open a new issue, as these issues are unrelated.

Best, Uri

celsofranssa commented 3 years ago

> Hmmm, not exactly, the codes.txt file needs to be a file that was preprocessed by JavaExtractor. The --test flag expects a preprocessed file (where every row is a list of paths), rather than a raw Java text.

@urialon, thank you.