microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

MASS-supNMT Index file doesn't match expected format. What could cause this? #99

Open jtynkkynen opened 4 years ago

jtynkkynen commented 4 years ago

I am following the MASS-supNMT instructions. Preparing the data seems to succeed, but when I try to pretrain I get this assertion error: AssertionError: Index file doesn't match expected format. Make sure that --dataset-impl is configured properly. I did not set --dataset-impl, so I would expect the index to be in the default format. Any ideas?

StillKeepTry commented 4 years ago

This seems to be caused by an update to fairseq: the default dataset format in fairseq is now mmap. I will update the data loader later to match the latest fairseq.

StillKeepTry commented 4 years ago

Which version of fairseq are you using?

jtynkkynen commented 4 years ago

I am using version 0.8.0.

StillKeepTry commented 4 years ago

@jtynkkynen Could you try adding --dataset-impl lazy during data generation?

jtynkkynen commented 4 years ago

I can do that. Will that solve the issue?

StillKeepTry commented 4 years ago

@jtynkkynen You can give it a try. I hit the same error with the latest version of fairseq, and changing --dataset-impl to lazy solved the problem for me.
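
For anyone who wants to confirm which format an existing .idx file was written in before regenerating the data, here is a minimal sketch. It assumes the magic headers that fairseq's indexed_dataset module writes at the start of index files (TNTIDX for the older lazy/cached format, MMIDIDX for mmap); these values are not stated in this thread and may differ between fairseq versions. The function name guess_index_format is hypothetical.

```python
import os
import tempfile

# Assumed magic headers, taken from fairseq's indexed_dataset module.
# They are an assumption here, not something confirmed in this thread.
LEGACY_MAGIC = b"TNTIDX\x00\x00"   # older format read by the lazy/cached loaders
MMAP_MAGIC = b"MMIDIDX\x00\x00"    # format written when --dataset-impl is mmap


def guess_index_format(idx_path):
    """Return 'mmap', 'legacy', or 'unknown' from the first bytes of an .idx file."""
    with open(idx_path, "rb") as f:
        header = f.read(len(MMAP_MAGIC))
    if header.startswith(MMAP_MAGIC):
        return "mmap"
    if header.startswith(LEGACY_MAGIC):
        return "legacy"
    return "unknown"


if __name__ == "__main__":
    # Self-check on a synthetic file; replace the path with a real .idx file
    # from your own preprocessed data directory to inspect it.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.idx")
        with open(sample, "wb") as f:
            f.write(MMAP_MAGIC + b"\x00" * 16)
        print(guess_index_format(sample))
```

If this reports mmap for data that the MASS-supNMT loader then fails to read, that is consistent with the format mismatch described above, and regenerating the data with --dataset-impl lazy is the workaround suggested in this thread.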