microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

summarization - pretraining crash #81

Open rabeehk opened 5 years ago

rabeehk commented 5 years ago

Hi, I ran the code for summarization pre-training and got the error below. I am working toward a deadline and would greatly appreciate a prompt response.

| model transformer_mass_base, criterion MaskedLmLoss
| num. model params: 123469824 (num. trained: 123469824)
| training on 1 GPUs
| max tokens per GPU = None and max sentences per GPU = 8
| no existing checkpoint found checkpoints/checkpoint_last.pt
| loading train data for epoch 0
Traceback (most recent call last):
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/bin/fairseq-train", line 10, in <module>
    sys.exit(cli_main())
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq_cli/train.py", line 321, in cli_main
    main(args)
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq_cli/train.py", line 68, in main
    extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/checkpoint_utils.py", line 126, in load_checkpoint
    epoch_itr = trainer.get_train_iterator(epoch=0)
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/trainer.py", line 201, in get_train_iterator
    self.task.load_dataset(self.args.train_subset, epoch=epoch, combine=combine)
  File "/remote/idiap.svm/user.active/rkarimi/dev/MASS/MASS-summarization/mass/masked_s2s.py", line 121, in load_dataset
    combine=combine,
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/data/data_utils.py", line 75, in load_indexed_dataset
    dictionary=dictionary,
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/data/indexed_dataset.py", line 60, in make_dataset
    return MMapIndexedDataset(path)
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/data/indexed_dataset.py", line 448, in __init__
    self._do_init(path)
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/data/indexed_dataset.py", line 461, in _do_init
    self._bin_buffer_mmap = np.memmap(data_file_path(self._path), mode='r', order='C')
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/numpy/core/memmap.py", line 264, in __new__
    mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
ValueError: cannot mmap an empty file
Exception ignored in: <bound method MMapIndexedDataset.__del__ of <fairseq.data.indexed_dataset.MMapIndexedDataset object at 0x7f3a1010e320>>
Traceback (most recent call last):
  File "/idiap/user/rkarimi/libs/anaconda3/envs/transformer36/lib/python3.6/site-packages/fairseq/data/indexed_dataset.py", line 465, in __del__
    self._bin_buffer_mmap._mmap.close()
AttributeError: 'MMapIndexedDataset' object has no attribute '_bin_buffer_mmap'

StillKeepTry commented 5 years ago

It looks like your data contains an empty line, but I cannot give a definite answer because these details are not enough to judge.
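
Since the traceback ends in `ValueError: cannot mmap an empty file`, one quick check is whether any binarized dataset file is zero bytes, and whether the raw text has blank lines. Below is a minimal sketch, assuming the preprocessed data uses the `.bin`/`.idx` suffixes of fairseq's indexed datasets. `find_empty_files` and `count_blank_lines` are hypothetical helper names, not part of MASS or fairseq.

```python
import os


def find_empty_files(data_dir, suffixes=(".bin", ".idx")):
    """Return sorted paths of zero-byte files under data_dir with the given suffixes."""
    empty = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # A zero-byte .bin file is what makes np.memmap raise
            # "cannot mmap an empty file".
            if name.endswith(suffixes) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


def count_blank_lines(text_path):
    """Count empty or whitespace-only lines in a raw text file."""
    with open(text_path, encoding="utf-8") as f:
        return sum(1 for line in f if not line.strip())
```

Call `find_empty_files` on the data directory passed to `fairseq-train`. If it reports a file for the train split, that split was probably binarized from an empty or missing input and would need to be preprocessed again.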