tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Generating embedding vectors for source code files #132

Closed Avv22 closed 3 years ago

Avv22 commented 3 years ago

Hello,

Thanks for this work. I am trying to get embeddings for 100 Java source code files, similar to how word2vec is used to get embeddings for documents. I would like each source file to be represented by an embedding vector of, say, 100 dimensions, so for 100 source files we would have a 100x100 embedding matrix. How can I do that with your model?

Thanks.

urialon commented 3 years ago

Hi Avra, thank you for your interest in this work! Sorry for the delayed response.

Have you seen this section of the README? https://github.com/tech-srl/code2vec#exporting-the-code-vectors-for-the-given-code-examples
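A minimal sketch of reading the exported vectors back into Python, assuming (not verified here against the code2vec source) that the export writes one line per example, in the same order as the input `.c2v` file, with the vector components separated by whitespace. The function name `load_code_vectors` is hypothetical.

```python
from pathlib import Path
from typing import List


def load_code_vectors(path: str) -> List[List[float]]:
    """Read an exported code-vectors file into a list of rows.

    Assumption: one example per line, components separated by whitespace.
    Row i corresponds to example i of the input .c2v file.
    """
    rows: List[List[float]] = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            rows.append([float(x) for x in line.split()])
    # Every row should have the same dimensionality.
    dims = {len(r) for r in rows}
    if len(dims) > 1:
        raise ValueError(f"inconsistent vector sizes in {path}: {sorted(dims)}")
    return rows
```

Two caveats for the question above: the vector size is fixed by the trained model's configuration, not chosen at export time, so it will not necessarily be 100; and in code2vec each example is a method, so the rows are per method rather than per file.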

Avv22 commented 3 years ago

@urialon Thank you. So this has to be done after the model is trained, or with an already trained model where I can just run prediction?

urialon commented 3 years ago

Correct, the model needs to be trained. Otherwise, vectors are meaningless.

Avv22 commented 3 years ago

Thank you. We have 20k Java source files (.java). We would like to produce embeddings for all 20k files, one file at a time: for each file, in order (the order is important), we are looking for one embedding. Can you please give some direction on how to do that with your trained model? I guess you have already published a trained model for Java, so there is no need to train the model again?
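code2vec produces one vector per method, so getting exactly one vector per file needs an extra aggregation step that this thread does not specify. One simple option, an assumption here and not the authors' recommendation, is to average each file's method vectors (mean pooling). The sketch assumes you kept hypothetical bookkeeping, `methods_per_file`, listing `(file_name, number_of_methods)` in the same order the methods were written to the `.c2v` input; if the extractor skips some methods, those counts must reflect that.

```python
from typing import List, Sequence, Tuple


def file_embeddings(
    method_vectors: Sequence[Sequence[float]],
    methods_per_file: Sequence[Tuple[str, int]],
) -> List[Tuple[str, List[float]]]:
    """Average consecutive method vectors into one vector per file.

    `methods_per_file` is hypothetical bookkeeping: (file_name, n_methods)
    pairs in input order. The output preserves that file order.
    """
    total = sum(n for _, n in methods_per_file)
    if total != len(method_vectors):
        raise ValueError(f"expected {total} method vectors, got {len(method_vectors)}")
    result: List[Tuple[str, List[float]]] = []
    start = 0
    for name, n in methods_per_file:
        if n <= 0:
            raise ValueError(f"{name} has no methods to average")
        chunk = method_vectors[start:start + n]
        dim = len(chunk[0])
        # Component-wise mean over this file's methods.
        mean = [sum(v[i] for v in chunk) / n for i in range(dim)]
        result.append((name, mean))
        start += n
    return result
```

Mean pooling is chosen only because it is simple and order-independent within a file; other pooling schemes would work with the same bookkeeping.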

urialon commented 3 years ago

Right, you do not need to retrain the model.

Have you seen this section of the README? https://github.com/tech-srl/code2vec#exporting-the-code-vectors-for-the-given-code-examples

Avv22 commented 3 years ago

Thank you. I ran train.sh with two datasets, train.c2s and test.c2s, but got the following error during training:

FileNotFoundError: [Errno 2] No such file or directory: 'data/name.dict.c2v'

urialon commented 3 years ago

This file is created during preprocessing. Did you run preprocessing?
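A small pre-flight check can catch this before training starts. The sketch assumes the preprocessing naming convention of `<prefix>.dict.c2v`, `<prefix>.train.c2v` and `<prefix>.val.c2v` for a dataset prefix such as `data/name`; the function name and default suffix list are hypothetical. Note that files ending in `.c2s`, as in the question, likely follow the sibling code2seq project's naming rather than code2vec's `.c2v`.

```python
from pathlib import Path
from typing import List, Sequence

# Assumed outputs of preprocessing for a given dataset prefix.
DEFAULT_SUFFIXES = (".dict.c2v", ".train.c2v", ".val.c2v")


def missing_preprocessed_files(
    prefix: str, suffixes: Sequence[str] = DEFAULT_SUFFIXES
) -> List[str]:
    """Return the expected preprocessing outputs that do not exist yet."""
    return [prefix + s for s in suffixes if not Path(prefix + s).is_file()]


if __name__ == "__main__":
    missing = missing_preprocessed_files("data/name")
    if missing:
        raise SystemExit("Run preprocessing first; missing: " + ", ".join(missing))
```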

(In reply to Avra's comment of Fri, Nov 5, 2021, 22:30.)

Avv22 commented 3 years ago

Hello Doctor,

We did not get that file with the Python preprocessor via the astminer tool. We opened an issue here; hopefully you can help us with it. For Java, we ran the trained model, but as you told us before, we have to use the astminer tool to extract AST paths. We did that, but the tool does not produce dict.c2v. Please have a look at our issue above.
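The dictionary file is derived from frequency counts over the training data: how often each target label, each terminal token and each AST path occurs. The sketch below computes only those counts from path-context lines, assuming the code2vec-style line format `<target> <tok1>,<path>,<tok2> ...`; it does not write the actual `dict.c2v` file, whose exact layout is defined by code2vec's own preprocessing scripts. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def context_histograms(lines: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Count targets, tokens and paths in code2vec-style path-context lines.

    Assumed line format: "<target> <tok1>,<path>,<tok2> <tok1>,<path>,<tok2> ..."
    """
    targets: Counter = Counter()
    tokens: Counter = Counter()
    paths: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        targets[parts[0]] += 1
        for context in parts[1:]:
            pieces = context.split(",")
            if len(pieces) != 3:
                continue  # skip malformed contexts
            left, path, right = pieces
            tokens[left] += 1
            tokens[right] += 1
            paths[path] += 1
    return targets, tokens, paths
```

If these counts come out empty or very small for the astminer output, the extracted lines probably do not match the assumed format.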

urialon commented 3 years ago

OK, so I'm closing this issue and will answer at https://github.com/tech-srl/code2vec/issues/137