wasiahmad / PLBART

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].
https://arxiv.org/abs/2103.06333
MIT License

Dataset not found when executing `bash run.sh` on code-to-text task #21

Closed. ztw33 closed this issue 3 years ago.

ztw33 commented 3 years ago

Hi, I want to train a code-to-text model starting from the pretrained model. I downloaded code-to-text.zip and ran `bash prepare.sh`. Afterwards, the files in data/codeXglue/code-to-text/java are as follows:

.
├── data-bin
│   ├── dict.en_XX.txt
│   ├── dict.java.txt
│   └── preprocess.log
├── test.jsonl
├── test.spm.en_XX
├── test.spm.java
├── train.jsonl
├── train.spm.en_XX
├── train.spm.java
├── valid.jsonl
├── valid.spm.en_XX
└── valid.spm.java

When I run `bash run.sh 0 java`, I get two errors: FileNotFoundError: Dataset not found: valid (/home/.../PLBART/data/codeXglue/code-to-text/java/data-bin) and FileNotFoundError: Dataset not found: test (/home/.../PLBART/data/codeXglue/code-to-text/java/data-bin). Did I miss a step? Which files are supposed to be in data/codeXglue/code-to-text/java? Sorry for the bother, and thank you for your help!

wasiahmad commented 3 years ago

You need to run the prepare.sh script. Since binarization has not been done, there is no data-bin folder.

wasiahmad commented 3 years ago

The java directory should look as follows.

.
  |-train.spm.java
  |-valid.jsonl
  |-train.spm.en_XX
  |-test.spm.java
  |-test.spm.en_XX
  |-valid.spm.java
  |-valid.spm.en_XX
  |-test.jsonl
  |-train.jsonl
  |-data-bin
  |  |-train.java-en_XX.java.idx
  |  |-valid.java-en_XX.en_XX.bin
  |  |-preprocess.log
  |  |-train.java-en_XX.en_XX.bin
  |  |-valid.java-en_XX.java.idx
  |  |-valid.java-en_XX.en_XX.idx
  |  |-dict.java.txt
  |  |-train.java-en_XX.en_XX.idx
  |  |-test.java-en_XX.en_XX.idx
  |  |-train.java-en_XX.java.bin
  |  |-test.java-en_XX.en_XX.bin
  |  |-dict.en_XX.txt
  |  |-test.java-en_XX.java.idx
  |  |-valid.java-en_XX.java.bin
  |  |-test.java-en_XX.java.bin

In the prepare.sh script, this function should be executed. Please delete the data-bin directory and re-run the prepare.sh script.
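To see which binarized files are absent before re-running prepare.sh, a small check along these lines may help. It is only a sketch: the file-name pattern is read off the directory listing above, and the helper names `expected_data_bin_files` and `missing_data_bin_files` are made up for illustration and are not part of PLBART.

```python
from pathlib import Path

# Pattern assumed from the listing above: <split>.<src>-<tgt>.<lang>.<ext>
SPLITS = ("train", "valid", "test")
EXTS = ("bin", "idx")


def expected_data_bin_files(src="java", tgt="en_XX"):
    """Return the dictionary and binarized file names shown in the listing."""
    names = [f"dict.{src}.txt", f"dict.{tgt}.txt"]
    for split in SPLITS:
        for lang in (src, tgt):
            for ext in EXTS:
                names.append(f"{split}.{src}-{tgt}.{lang}.{ext}")
    return names


def missing_data_bin_files(data_bin, src="java", tgt="en_XX"):
    """List the expected files that are not present in the data-bin directory."""
    root = Path(data_bin)
    return [
        name
        for name in expected_data_bin_files(src, tgt)
        if not (root / name).is_file()
    ]
```

Calling `missing_data_bin_files("data/codeXglue/code-to-text/java/data-bin")` on a directory that holds only the two dict files and preprocess.log should return the twelve `.bin` and `.idx` names, which points to the binarization step not having completed.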

ztw33 commented 3 years ago

Thanks a lot! I have successfully run the training scripts. Thanks again for your patience!