tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Loading tokens file for tokenizer #105

Closed faysalhossain2007 closed 3 years ago

faysalhossain2007 commented 3 years ago

Has anyone used the tokens from the Tokens.txt file to load a tokenizer?

I was able to load the word2vec model using the code snippet shown in the link. But when it comes to initializing the tokenizer, I am struggling a bit. My approach is:

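    # Assumes the Keras Tokenizer; the original post does not show the import.
    from tensorflow.keras.preprocessing.text import Tokenizer
    token = Tokenizer()
    token.fit_on_texts(embedding_layer_text)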

Here embedding_layer_text is the list of all the code data.

What I am wondering is whether we can skip fitting and directly load the tokens generated by code2vec, following the approach mentioned in the link?
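
To make the question concrete, here is a minimal sketch of what "directly loading the tokens" could look like. It assumes Tokens.txt is in the word2vec text format (a header line with the vocabulary size and dimension, then one token followed by its vector per line). The helper names build_word_index, load_word_index and encode are hypothetical and are not part of code2vec or Keras.

```python
from typing import Dict, Iterable, List


def build_word_index(lines: Iterable[str]) -> Dict[str, int]:
    """Map each token in word2vec text-format lines to a 1-based index."""
    word_index: Dict[str, int] = {}
    rows = iter(lines)
    next(rows, None)  # skip the assumed "<vocab_size> <dim>" header line
    for row in rows:
        parts = row.rstrip("\n").split(" ")
        if len(parts) < 2 or not parts[0]:
            continue  # ignore blank or malformed rows
        # Index 0 is left unused because Keras-style pipelines reserve it for padding.
        word_index.setdefault(parts[0], len(word_index) + 1)
    return word_index


def load_word_index(path: str) -> Dict[str, int]:
    """Read a tokens file from disk (hypothetical path) and build the index."""
    with open(path, encoding="utf-8") as handle:
        return build_word_index(handle)


def encode(tokens: List[str], word_index: Dict[str, int]) -> List[int]:
    """Convert tokens to ids, dropping tokens that are not in the vocabulary."""
    return [word_index[tok] for tok in tokens if tok in word_index]
```

With a mapping like this, the ids line up with the row order of the tokens file, so an embedding matrix built from the same file would need its rows shifted by one to leave index 0 for padding. Whether assigning this dictionary to the Keras Tokenizer's word_index attribute is enough for texts_to_sequences to behave as expected is an assumption I have not verified.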