tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Regd. training dataset/token vectors for C# (Code2vec) #91

Closed shreyasingh closed 4 years ago

shreyasingh commented 4 years ago

Hi, I'm working on a Neural Code Search prototype and am currently doing a literature survey on code embeddings. I came across your code2vec paper, and I must say it is very well written. I enjoyed reading it!

I was going through the code2vec code on GitHub (https://github.com/tech-srl/code2vec) and would like to train the model on C# files. Could you let me know:

1. Is there an existing dataset of C# source code files, like the one you've published for Java on GitHub? The dataset could be unprocessed; I could process it with preprocess_csharp.sh.
2. I also saw that you've published token and method name embeddings that are available to download. Is there a similar embedding file for C# tokens and method names that I could load and use directly? That way I would not need to train the model.

Your advice and guidance on these two items would be highly appreciated.

urialon commented 4 years ago

Hi @shreyasingh, thank you for your interest in code2vec!

Best, Uri

urialon commented 4 years ago

Hi @shreyasingh, I'm closing this due to inactivity. Feel free to re-open it or open another issue if you have any further questions.

Good luck with your internship! Uri