chao276951044 closed 4 years ago
Hi, thank you for your interest in code2seq!
For the format see a description here: https://github.com/tech-srl/code2seq/blob/master/README.md#extending-to-other-languages
Alternatively, you can simply run preprocessing on a small directory (e.g., you can run preprocessing on the JavaExtractor code itself) and see the format.
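As a rough illustration of the format the README describes (one example per line: a target label followed by space-separated contexts, each context being three comma-separated components, with "|" separating subtokens and path nodes), here is a minimal parsing sketch. The sample line and its path node names are made up for illustration and are not real extractor output; `parse_example` is a hypothetical helper, not part of code2seq.

```python
from typing import List, Tuple

# One context: (left-token subtokens, path nodes, right-token subtokens)
Context = Tuple[List[str], List[str], List[str]]


def parse_example(line: str) -> Tuple[List[str], List[Context]]:
    """Split one raw line into its target label and its contexts."""
    fields = line.strip().split(" ")
    # First field is the target label, with subtokens separated by "|".
    target = fields[0].split("|")
    contexts: List[Context] = []
    for field in fields[1:]:
        if not field:  # tolerate repeated spaces (e.g. padding)
            continue
        # Each context has exactly three comma-separated components.
        left, path, right = field.split(",")
        contexts.append((left.split("|"), path.split("|"), right.split("|")))
    return target, contexts


# Hypothetical line: the node names (Nm0, Fld1, ...) are invented for this sketch.
sample = "get|name this,Nm0|Fld1|Ret2,name string,Cls0|Ret1,name"
target, contexts = parse_example(sample)
```

Running the preprocessing on a small directory, as suggested above, remains the reliable way to see what the extractor actually emits.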
TRAIN_DIR is the source of the data, not the target. The processed data is generated in the "data/${dataset_name}/" directory. See also the comment at the top of "preprocess.sh".
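To make the direction of the data flow concrete, here is a small sketch that only computes names; it does not run any preprocessing. The output directory follows the "data/${dataset_name}/" pattern mentioned above, and the raw file names follow the pattern quoted from preprocess.sh in this thread. The helper names and the dataset name "my_dataset" are hypothetical.

```python
from pathlib import Path
from typing import Dict


def output_dir(dataset_name: str) -> Path:
    """Directory where the processed data ends up (not TRAIN_DIR)."""
    return Path("data") / dataset_name


def raw_file_names(dataset_name: str) -> Dict[str, str]:
    """Names of the intermediate raw files, one per split."""
    return {
        split: f"{dataset_name}.{split}.raw.txt"
        for split in ("train", "val", "test")
    }


out = output_dir("my_dataset")         # hypothetical dataset name
raws = raw_file_names("my_dataset")
```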
Best, Uri
As we can see in preprocess.sh, at line 36:
TRAIN_DATA_FILE=${DATASET_NAME}.train.raw.txt
VAL_DATA_FILE=${DATASET_NAME}.val.raw.txt
TEST_DATA_FILE=${DATASET_NAME}.test.raw.txt
Can you tell me the format of train.raw.txt?
And after processing, is the dataset generated in TRAIN_DIR=my_training_dir?