cronopioelectronico closed this issue 3 years ago
Thank you for asking! For convenience, we downloaded the CNN/DailyMail dataset using tensorflow/tensor2tensor. Once you have the text files, try the commands below to prepare the binary dataset.
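#!/bin/bash
# Directory holding the CNN/DailyMail text files obtained via tensor2tensor.
TEXT=data/cnn_daily_t2t
# Suffix of the input files: cnndm.{train,dev,test}.$TRUNC.{source,target}
TRUNC=1000
# Binarize all three splits, sharing one dictionary between source and target.
fairseq-preprocess --source-lang source --target-lang target \
--trainpref $TEXT/cnndm.train.$TRUNC --validpref $TEXT/cnndm.dev.$TRUNC --testpref $TEXT/cnndm.test.$TRUNC \
--destdir data/binary/cnndm_t2t_30k_$TRUNC \
--workers 20 --joined-dictionary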
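If the command fails with a missing-file error, it may help to check the inputs first. fairseq-preprocess appends the language names to each prefix, so with the settings above it looks for files such as `cnndm.train.1000.source` and `cnndm.train.1000.target`. The sketch below is only an illustration: `check_inputs` is a hypothetical helper, not part of fairseq, and it assumes the same file naming as the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of fairseq): report which of the six
# expected input files are missing for a given directory and suffix.
check_inputs() {
  local text_dir=$1 trunc=$2 missing=0
  local split lang f
  for split in train dev test; do
    for lang in source target; do
      f="$text_dir/cnndm.$split.$trunc.$lang"
      if [ ! -f "$f" ]; then
        echo "missing: $f" >&2
        missing=1
      fi
    done
  done
  # Returns 0 when every file exists, 1 otherwise.
  return "$missing"
}
```

For example, running `check_inputs data/cnn_daily_t2t 1000` before the preprocessing command prints one line per missing file and returns a non-zero status if any are absent.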
Hi, the README has instructions for preparing the other datasets, but they are missing for the CNN / DailyMail dataset. Since you are providing the checkpoint for this case, it would be great if you could include the data preparation instructions too. Thanks.