tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

Preprocessing on code2seq dataset #3

Closed: bentrevett closed this issue 5 years ago

bentrevett commented 5 years ago

Thanks for open-sourcing this, it's great work.

I have been trying to run it on another dataset, specifically the java-small dataset from your code2seq work which I found at https://urialon.cswp.cs.technion.ac.il/publications/.

The issue is that the preprocess.sh script seems to get "stuck" extracting paths from the training set.

I say "stuck" because I'm not actually sure whether the script has frozen or is just taking a long time (it has been running for more than 3 hours so far).

Do you have an estimate of how long preprocessing took in your code2seq experiments? Or will these scripts not work with those datasets?

Thanks in advance.

urialon commented 5 years ago

Hi Ben, Thanks for letting us know!

As we discussed -

  1. Preprocessing is best performed on a multicore server (rather than a laptop). It can be parallelized further across several processes by uncommenting lines 67-68 in https://github.com/tech-srl/code2vec/blob/master/JavaExtractor/extract.py and commenting out lines 69-70.
  2. I will preprocess all three code2seq datasets using the code2vec scripts and make them available here. I will let you know when it's done.
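
To illustrate the idea in point 1, here is a minimal sketch of running one extraction command per subdirectory, with several running at once. It is not the code in extract.py. It assumes the dataset is split into subdirectories that can be processed independently, and the names `list_subdirs`, `run_on_dir`, and `run_in_parallel` are hypothetical helpers made up for this example. Threads are used only to launch and wait on the child processes; the actual work happens in the separate processes.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def list_subdirs(root):
    """Return the immediate subdirectories of root, sorted for stable ordering."""
    return sorted(
        os.path.join(root, name)
        for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


def run_on_dir(command, path):
    """Run `command` with `path` appended as the last argument.

    `command` is a list such as the extractor invocation (an assumption:
    the extractor accepts a directory as its final argument).
    Returns (path, captured stdout). Raises if the process exits non-zero.
    """
    result = subprocess.run(
        command + [path], capture_output=True, text=True, check=True
    )
    return path, result.stdout


def run_in_parallel(command, root, num_workers=4):
    """Run `command` once per subdirectory of root, num_workers at a time."""
    subdirs = list_subdirs(root)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # Each worker thread blocks on one child process, so at most
        # num_workers extractor processes run concurrently.
        results = executor.map(lambda d: run_on_dir(command, d), subdirs)
        return dict(results)
```

Setting `num_workers` close to the number of available cores is a reasonable starting point, though the best value depends on how much memory each extractor process needs.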

Thanks, Uri

urialon commented 5 years ago

Hi Ben, I uploaded the three datasets of code2seq in a preprocessed format. See: https://github.com/tech-srl/code2vec/blob/master/README.md#additional-datasets

Let me know how it goes. Uri

DungNguyen83 commented 5 years ago

Hello @urialon, I have my own dataset. How can I convert it into the c2v format so that I can feed it into the code2vec system?

Thanks a lot

urialon commented 5 years ago

Hi @dungqut, Please open a new issue. Don't worry, everything will be solved.

DungNguyen83 commented 5 years ago

@urialon I have opened a new issue