Closed balag59 closed 4 years ago
Update: I've looked into the .it files and they are empty, which would explain the error above. I'm not sure why they are empty, but I'll try the tokenization again.
It was a mistake in the tokenization so everything is fine now.
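For anyone hitting the same error, a quick check for empty tokenized files before training can save a run. This is a minimal sketch, not part of FBK-Fairseq-ST: the helper names `count_nonempty_lines` and `find_empty_files` are hypothetical, and the file names in the usage note are only examples.

```python
def count_nonempty_lines(path):
    """Return the number of lines in `path` that contain non-whitespace text."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def find_empty_files(paths):
    """Return the subset of `paths` that contain no non-empty lines."""
    return [str(p) for p in paths if count_nonempty_lines(p) == 0]
```

Calling something like `find_empty_files(["train.en", "train.it"])` on the tokenized output (assumed file names) would list any file that came out empty, which in this thread would have pointed at the .it side.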
Hi, I'm trying to recreate the EN-IT experiment on the MustC corpus and ran into this issue while training:

Traceback (most recent call last):
  File "train.py", line 367, in <module>
    main(args)
  File "train.py", line 73, in main
    shard_id=args.distributed_rank,
  File "FBK-Fairseq-ST/fairseq/tasks/fairseq_task.py", line 96, in get_batch_iterator
    indices = dataset.ordered_indices()
  File "FBK-Fairseq-ST/fairseq/data/language_pair_dataset.py", line 250, in ordered_indices
    indices = indices[np.argsort(self.tgt_sizes[indices], kind='mergesort')]
IndexError: index 216490 is out of bounds for axis 0 with size 0
It seems that the tgt_sizes NumPy array has shape (0,), which is what causes the error. Could you please guide me on resolving this issue? Thanks!
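The failure can be reproduced outside of training with a few lines of NumPy. This is only a sketch under assumptions: `sort_by_target_length` is a hypothetical stand-in that copies the indexing expression from the traceback, not the actual fairseq method, and the arrays are made-up inputs.

```python
import numpy as np


def sort_by_target_length(indices, tgt_sizes):
    """Mimic the indexing expression from the traceback (hypothetical helper)."""
    # tgt_sizes[indices] raises IndexError if any index is >= len(tgt_sizes).
    return indices[np.argsort(tgt_sizes[indices], kind='mergesort')]


# An empty target-size array, as produced when the target files contain no data.
empty_tgt_sizes = np.array([], dtype=np.int64)
indices = np.array([216490])

try:
    sort_by_target_length(indices, empty_tgt_sizes)
except IndexError as err:
    print("IndexError:", err)
```

With a non-empty `tgt_sizes` of matching length the same call simply returns the indices ordered by target length, so the error points at the data rather than at the sorting code.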