tech-srl / code2vec

TensorFlow code for the neural network presented in the paper: "code2vec: Learning Distributed Representations of Code"
https://code2vec.org
MIT License

update extract.py #182

Closed lidiancracy closed 1 year ago

lidiancracy commented 1 year ago

When I used process.sh to extract data from a project, the project was too large to be extracted in one pass, so I modified extract.py to read the dataset paths in batches. In addition, some datasets contain inputs that cannot be parsed: the extractor does not throw an error, it just hangs, which was quite perplexing. I therefore added a time limit, so that any extraction that runs past a certain duration is skipped. I hope this helps users dealing with large volumes of data.
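The diff itself is not shown in this thread, so the following is only a minimal sketch of the two ideas described above (batching the paths and skipping work that hangs), not the actual change to extract.py. The names `chunked`, `run_with_timeout`, `extract_in_batches`, and `build_cmd` are hypothetical, and the sketch assumes the extractor is launched as an external command for each batch.

```python
import subprocess
import sys
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[str]:
    """Run `cmd` and return its stdout, or None if it hangs or fails."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return None
    except subprocess.CalledProcessError:
        # Non-zero exit status: treat the batch as unprocessable.
        return None
    return result.stdout


def extract_in_batches(
    paths: Sequence[str],
    batch_size: int,
    timeout_s: float,
    build_cmd: Callable[[List[str]], List[str]],
) -> Tuple[List[str], List[str]]:
    """Process `paths` batch by batch; return (outputs, skipped_paths).

    `build_cmd` is a hypothetical hook that turns a batch of paths into
    the command line of whatever extractor is being used.
    """
    outputs: List[str] = []
    skipped: List[str] = []
    for batch in chunked(paths, batch_size):
        out = run_with_timeout(build_cmd(batch), timeout_s)
        if out is None:
            skipped.extend(batch)  # hung or failed: skip and move on
        else:
            outputs.append(out)
    return outputs, skipped
```

Running each batch in a child process is one way to make a hang recoverable: the parent can kill the child once the time limit passes and continue with the next batch, and collecting the skipped paths makes it possible to inspect or retry them later.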

urialon commented 1 year ago

Great, thank you @lidiancracy!