Open liuhuigmail opened 4 years ago
Great work on source code generation. The details of the preprocessing of the text (natural language) and source code are missing from the paper. Would you kindly let me know what kind of preprocessing has been conducted, e.g., unifying identifiers?
Thanks.
Have you solved this question? I am also curious about it.
No, but I built a new dataset (https://github.com/ds4an/CoDas4CG) and will conduct a sequence of preprocessing steps myself :)

Hui
liuhui08@bit.edu.cn
datasets/conala/dataset.py may do the preprocessing, I think.
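One preprocessing step the original question asks about is unifying identifiers, i.e., replacing quoted literals and variable names in the natural-language intent with placeholder slots. This is a minimal sketch of that idea only; I have not verified that dataset.py does exactly this, and the function name and slot format (`var_0`, `var_1`, …) are my own choices for illustration:

```python
import itertools
import re


def canonicalize_intent(intent):
    """Replace quoted literals in an NL intent with placeholder slots.

    Returns the rewritten intent plus a map from each slot name back to
    the original literal, so generated code can be restored afterwards.
    """
    slot_map = {}
    counter = itertools.count()

    def _replace(match):
        slot = f"var_{next(counter)}"
        slot_map[slot] = match.group(1)
        return slot

    # Match 'single', "double", or `backtick` quoted spans in the intent.
    rewritten = re.sub(r"[`'\"]([^`'\"]+)[`'\"]", _replace, intent)
    return rewritten, slot_map


rewritten, slot_map = canonicalize_intent("open the file 'f.txt' and print `x`")
print(rewritten)   # open the file var_0 and print var_1
print(slot_map)    # {'var_0': 'f.txt', 'var_1': 'x'}
```

The slot map makes the step reversible: after the model generates code containing `var_0`, the original literal can be substituted back in.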
@jason-hanling @liuhuigmail
Hi, Professor Liu,
I am also interested in how the data is pre-processed. I note that the pre-processing is done by the dataset.py script (you are right). I'd like to know what the files look like before pre-processing (conala-train.json). However, I found that the official webpage of CoNaLa (https://conala-corpus.github.io/) no longer supports downloading. I wonder if you have a copy to share. Thanks!
I'm not sure why, but some people have been having trouble downloading the dataset on the chrome browser. Here's a direct link that should work. The dataset is still available: http://www.phontron.com/download/conala-corpus-v1.1.zip
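For reference, here is a minimal sketch of what an entry in conala-train.json might look like before any preprocessing. The field names (`question_id`, `intent`, `rewritten_intent`, `snippet`) are my recollection of the released corpus format rather than something confirmed in this thread, and the sample record itself is invented for illustration:

```python
import json

# A tiny inline sample mimicking the CoNaLa release format; the field
# names and values here are illustrative assumptions, not real corpus data.
raw = json.dumps([
    {
        "question_id": 0,
        "intent": "copy one file's contents to another file",
        "rewritten_intent": "copy the content of file 'a.txt' to file 'b.txt'",
        "snippet": "shutil.copy('a.txt', 'b.txt')",
    }
])

examples = json.loads(raw)
for ex in examples:
    # Each example pairs a natural-language intent with a code snippet.
    print(ex["intent"], "->", ex["snippet"])
```

Inspecting a few entries this way makes it easier to see which fields dataset.py actually consumes when building the training examples.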
Oh great, @neubig, thanks a lot!