Open liuhuigmail opened 4 years ago
Great work on source code generation. The details of the preprocessing of the text (natural language) and source code are missing from the paper. Would you kindly let me know what kind of preprocessing has been conducted, e.g., unifying identifiers?
Thanks.
Have you solved this question? I am also curious about it.
No, but I built a new dataset (https://github.com/ds4an/CoDas4CG) and will conduct a sequence of preprocessing steps myself :)

Hui
liuhui08@bit.edu.cn
datasets/conala/dataset.py may do the preprocessing, I think.
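One preprocessing step the original question asks about is unifying identifiers, i.e., replacing quoted literals and variable names in the natural-language intent with placeholder slots. This is a minimal sketch of that idea only; I have not verified that dataset.py does exactly this, and the function name and slot format (`var_0`, `var_1`, …) are my own choices for illustration:

```python
import itertools
import re


def canonicalize_intent(intent):
    """Replace quoted literals in an NL intent with placeholder slots.

    Returns the rewritten intent plus a map from each slot name back to
    the original literal, so generated code can be restored afterwards.
    """
    slot_map = {}
    counter = itertools.count()

    def _replace(match):
        slot = f"var_{next(counter)}"
        slot_map[slot] = match.group(1)
        return slot

    # Match 'single', "double", or `backtick` quoted spans in the intent.
    rewritten = re.sub(r"[`'\"]([^`'\"]+)[`'\"]", _replace, intent)
    return rewritten, slot_map


rewritten, slot_map = canonicalize_intent("open the file 'f.txt' and print `x`")
print(rewritten)   # open the file var_0 and print var_1
print(slot_map)    # {'var_0': 'f.txt', 'var_1': 'x'}
```

The slot map makes the step reversible: after the model generates code containing `var_0`, the original literal can be substituted back in.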
@jason-hanling @liuhuigmail
Hi, Professor Liu,
I am also interested in how the data is pre-processed. I note that the pre-processing is done by the dataset.py script (you are right). I'd like to know what the files look like before pre-processing (conala-train.json). However, I found that the official webpage of CoNaLa (https://conala-corpus.github.io/) no longer supports downloading. I wonder if you have a copy to share. Thanks!
I'm not sure why, but some people have been having trouble downloading the dataset on the chrome browser. Here's a direct link that should work. The dataset is still available: http://www.phontron.com/download/conala-corpus-v1.1.zip
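For reference, here is a minimal sketch of what an entry in conala-train.json might look like before any preprocessing. The field names (`question_id`, `intent`, `rewritten_intent`, `snippet`) are my recollection of the released corpus format rather than something confirmed in this thread, and the sample record itself is invented for illustration:

```python
import json

# A tiny inline sample mimicking the CoNaLa release format; the field
# names and values here are illustrative assumptions, not real corpus data.
raw = json.dumps([
    {
        "question_id": 0,
        "intent": "copy one file's contents to another file",
        "rewritten_intent": "copy the content of file 'a.txt' to file 'b.txt'",
        "snippet": "shutil.copy('a.txt', 'b.txt')",
    }
])

examples = json.loads(raw)
for ex in examples:
    # Each example pairs a natural-language intent with a code snippet.
    print(ex["intent"], "->", ex["snippet"])
```

Inspecting a few entries this way makes it easier to see which fields dataset.py actually consumes when building the training examples.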
Oh great, @neubig, thanks a lot!