tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

How to get the most similar words by meaning when the model output is a bag of paths? #113

Closed BTVinh0409 closed 3 years ago

urialon commented 3 years ago

Hi @btvinh0409aduvjp, I am not sure I understand your question. Did you see this section of the README? https://github.com/tech-srl/code2vec#exporting-the-trained-token-vectors-and-target-vectors

Here's an example (in Python) from the README:

>>> from gensim.models import KeyedVectors as word2vec
>>> vectors_text_path = 'models/java14_model/targets.txt' # or: 'models/java14_model/tokens.txt'
>>> model = word2vec.load_word2vec_format(vectors_text_path, binary=False)
>>> model.most_similar(positive=['equals', 'to|lower']) # or: 'tolower', if using the downloaded embeddings
>>> model.most_similar(positive=['download', 'send'], negative=['receive'])
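
To make clearer what the `most_similar` calls above are doing, here is a minimal, dependency-free sketch of roughly the same computation: unit-normalize each vector, add the `positive` ones and subtract the `negative` ones, then rank every other word by cosine similarity to that query. This is an approximation for illustration, not gensim's actual implementation. The helper names (`load_word2vec_text`, `unit`) and the toy 2-dimensional vectors are made up for this example; only the text layout (a `count dim` header followed by one `word v1 v2 ...` row per entry) is what `load_word2vec_format(..., binary=False)` expects.

```python
import math


def load_word2vec_text(lines):
    """Parse word2vec text format: a 'count dim' header, then 'word v1 v2 ...' rows."""
    rows = iter(lines)
    _count, dim = map(int, next(rows).split())
    vectors = {}
    for line in rows:
        parts = line.rstrip().split(' ')
        if len(parts) != dim + 1:
            continue  # skip blank or malformed rows
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def unit(v):
    """Scale v to length 1 (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def most_similar(vectors, positive, negative=(), topn=10):
    """Rank words by cosine similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed:
        for i, x in enumerate(unit(vectors[word])):
            query[i] += sign * x
    query = unit(query)
    exclude = set(positive) | set(negative)  # query words are not returned
    scored = [
        (w, sum(a * b for a, b in zip(query, unit(v))))
        for w, v in vectors.items()
        if w not in exclude
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Toy data (hypothetical values, NOT taken from the released embeddings).
toy_lines = [
    "4 2",
    "equals 1.0 0.0",
    "compare 0.9 0.1",
    "download 0.0 1.0",
    "send -1.0 0.2",
]
toy_vectors = load_word2vec_text(toy_lines)
result = most_similar(toy_vectors, positive=['equals'], topn=3)
print(result)
```

With the real exported files, one would pass `open('models/java14_model/targets.txt')` (or `tokens.txt`) instead of `toy_lines`. In practice gensim is much faster on large vocabularies, so this sketch is only meant to show the idea.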

Best, Uri