microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

SpeechT5 Pretrain ERROR #28

Closed benyang0506 closed 1 year ago

benyang0506 commented 1 year ago

When pretraining reached 95400 num_updates, it failed with the following error:

  File "/SpeechT5/SpeechT5/SpeechT5/speecht5/data/multitask_dataset.py", line 58, in __getitem__
    sample = self.datasets[dataset_idx][sample_idx]
  File "/SpeechT5/SpeechT5/SpeechT5/speecht5/data/text_dataset.py", line 218, in __getitem__
    assert (source[1:-1] >= 1).all()
IndexError: slice() cannot be applied to a 0-dim tensor

Is this caused by the text data preparation?

Ajyy commented 1 year ago

Hi,

This error may be related to text data preparation. It means that the length of a text sentence is 0, so the sample is a 0-dim tensor and cannot be sliced. I suggest checking the text data and deleting any empty lines you find.
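
A minimal sketch of that check, assuming the raw text data is a plain UTF-8 file with one sentence per line. The function names `find_blank_lines` and `drop_blank_lines` are hypothetical helpers written for illustration, not part of SpeechT5 or fairseq:

```python
def find_blank_lines(path):
    """Return 1-based line numbers of lines that are empty or whitespace-only."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, start=1) if not line.strip()]


def drop_blank_lines(src, dst):
    """Copy src to dst, skipping blank lines; return how many lines were dropped."""
    dropped = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if line.strip():
                fout.write(line)
            else:
                dropped += 1
    return dropped
```

Running `find_blank_lines` first shows where the empty sentences are. If the text was converted to another format before pretraining, the cleaned file would presumably need to go through that preparation step again.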