tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Collecting the context from Coding semantics #100

Closed faysalhossain2007 closed 3 years ago

faysalhossain2007 commented 4 years ago

Q1. We want to collect the code context. For example,

def func1(lst, option):
    if option == 1:
        sortedList = sorted(lst)
        print(sortedList)
        return sortedList
    else:
        # reversed() returns an iterator, so wrap it in list() to get a list
        reverseList = list(reversed(lst))
        print(reverseList)
        return reverseList

One branch of the function returns the sorted list, while the other returns the reversed list. If I want to capture this context, does the following approach seem reasonable: take the vectors generated by code2vec, train my own LSTM model on them with labeled data, and evaluate it on the test data?

Q2: My dataset contains more than one programming language. Do you have any suggestions on the best way to combine the embedding vectors?

Thanks for the help! I appreciate you making your tool publicly available.

urialon commented 3 years ago

Hi @faysalhossain2007, sorry for the delayed response.

Q1: If I understand your question correctly, each of code2vec and an LSTM can capture the context on their own.
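To illustrate the point that a code vector can be used without an LSTM on top, here is a minimal sketch of the train-then-evaluate workflow. It assumes each method has already been turned into one fixed-length vector by code2vec. The vectors below are synthetic stand-ins, the labels "sort" and "reverse" are hypothetical, and a nearest-centroid classifier is swapped in for whatever model you would really train. None of this is part of the code2vec code base.

```python
import math
import random


def train_centroids(vectors, labels):
    """Average the training vectors of each label into one centroid."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


def accuracy(centroids, vectors, labels):
    """Fraction of vectors whose predicted label matches the true label."""
    correct = sum(predict(centroids, v) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


def make_fake_vectors(n_per_class, dim=8, seed=0):
    """Synthetic stand-ins for code vectors (assumption: real ones come from code2vec)."""
    rng = random.Random(seed)
    vectors, labels = [], []
    for label, center in (("sort", 1.0), ("reverse", -1.0)):
        for _ in range(n_per_class):
            vectors.append([center + rng.gauss(0, 0.3) for _ in range(dim)])
            labels.append(label)
    return vectors, labels


# Train on one labeled set, evaluate on a held-out set.
train_x, train_y = make_fake_vectors(20, seed=0)
test_x, test_y = make_fake_vectors(5, seed=1)
centroids = train_centroids(train_x, train_y)
test_accuracy = accuracy(centroids, test_x, test_y)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Any classifier that accepts fixed-length inputs could replace the centroid step. A sequence model such as an LSTM is only needed if you feed it a sequence (for example, tokens) rather than a single vector per method.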

Q2: That's a good question that, unfortunately, I don't have an answer for. One possible way would be to write a new Extractor (according to the format here) and use an AST format that fits multiple languages, like GitHub Semantic.
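As a rough sketch of what a custom extractor would need to emit, the helper below formats one example line. It assumes the extractor output is one example per line: a target label whose subtokens are joined by "|", followed by space-separated contexts of the form token,path,token. Please verify this against the format documented in the repository. The function name, the path strings PATH_A and PATH_B, and the sample tokens are all hypothetical.

```python
def format_example(label_subtokens, path_contexts):
    """Format one example line under the assumed extractor output format.

    label_subtokens: e.g. ["sort", "list"] for a method named sortList.
    path_contexts: iterable of (start_token, path, end_token) string triples.
    """
    fields = ["|".join(label_subtokens)]
    for start, path, end in path_contexts:
        for part in (start, path, end):
            # Spaces and commas are the delimiters, so they cannot appear in a field.
            if " " in part or "," in part:
                raise ValueError(f"field contains a delimiter: {part!r}")
        fields.append(",".join((start, path, end)))
    return " ".join(fields)


# Hypothetical contexts; a real extractor would derive them from AST paths.
example_line = format_example(
    ["sort", "list"],
    [("lst", "PATH_A", "sorted"), ("sorted", "PATH_B", "sortedList")],
)
print(example_line)
```

If every language's extractor writes the same line format from a shared AST representation, the downstream model sees one vocabulary of tokens and paths, which is one way to avoid combining separately trained embeddings.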

Best, Uri

urialon commented 3 years ago

Closing due to inactivity, but feel free to re-open if you have additional questions.