microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

Problems with batch_sampler in multitask_dataset when pretraining SpeechT5: should each wav get its own idx and bin files, or should all data go into a single idx/bin pair? #53

Open Lemonaddeee opened 1 year ago

Lemonaddeee commented 1 year ago

Hi, I want to pretrain a model using the SpeechT5 architecture. I followed the data preparation scripts given here: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5#data-preparation. I wonder whether fairseq-preprocess imposes some restriction on how the data is prepared, because I ran into this error:

image

I found that the error is raised while batching samples from the .idx and .bin data produced by fairseq-preprocess. Here is what my batch_sampler looks like. It contains 455 items, and every item holds 6 elements except the last one:

image image

So, to get it to run, I tried dropping the last row:

batch_sampler = batch_sampler[:-2]

But then I got this:

image
  1. I think the error comes from np.random.choice(). From that I infer that batch_sampler should be a list containing only one array. Is that right?
  2. I don't understand how that would come about. Should the idx and bin files contain all of the training data, or just one row of it?
  3. What exactly is batch_sampler supposed to sample from?
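
To make point 1 concrete, here is a minimal sketch with toy data. It assumes, without confirmation from the screenshots, that the failure comes from passing a nested list of batches directly to np.random.choice, which only accepts 1-dimensional input. The helper `sample_batches` is hypothetical and not part of SpeechT5 or fairseq. It samples batch positions instead of the batches themselves, so a shorter final batch causes no problem.

```python
import numpy as np

def sample_batches(batch_sampler, num_samples, seed=0):
    """Hypothetical helper: draw whole batches (with replacement) by index."""
    rng = np.random.default_rng(seed)
    # Sample positions into the list rather than the nested list itself.
    picks = rng.integers(0, len(batch_sampler), size=num_samples)
    return [batch_sampler[int(i)] for i in picks]

# Toy stand-in for batch_sampler: four batches of 6 indices plus a short last one.
batches = [list(range(i, i + 6)) for i in range(0, 24, 6)] + [[24, 25]]

# Equal-length batches form a 2-D array, which np.random.choice rejects.
try:
    np.random.choice(batches[:-1], 3)
    choice_failed = False
except ValueError:
    choice_failed = True

sampled = sample_batches(batches, num_samples=3)
```

Each element of `sampled` is one complete batch of sample indices, which is what I assume the downstream code expects.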

Here is what my directory looks like:

image

I would really appreciate it if you could explain this. Thank you!