Pretraining SpeechT5: problems with batch_sampler in multitask_dataset. Should I generate idx and bin files for each sample (wav) one by one, or put all of them into just two files (one idx and one bin)?

Hi, I want to pretrain a model using the SpeechT5 architecture. I followed the scripts you provided here: https://github.com/microsoft/SpeechT5/tree/main/SpeechT5#data-preparation. However, I wonder whether fairseq-preprocess imposes any restriction on how the data is prepared, because I ran into this error.

I found that the error is raised while batching the samples from the .idx and .bin data produced by fairseq-preprocess. Here is what my batch_sampler looks like: it has 455 items, and each item contains 6 elements except the last one.
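A batch sampler of this shape is what you get when the dataset indices are split into fixed-size mini-batches: each item is an array of sample indices (not the samples themselves), and the last batch just holds the remainder. The sketch below is illustrative only; it assumes a plain batch size of 6 and a made-up dataset size, and fairseq's own batching (for example by a token budget) can group samples differently. `fixed_size_batches` is a hypothetical helper, not SpeechT5 or fairseq code.

```python
import numpy as np

def fixed_size_batches(indices, batch_size):
    """Split a 1-D array of dataset indices into consecutive batches.

    Every batch holds `batch_size` indices except possibly the last one,
    which gets whatever is left over. (Hypothetical helper for illustration.)
    """
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]

# Hypothetical dataset size: 454 full batches of 6 plus 4 leftover samples.
indices = np.arange(454 * 6 + 4)
batch_sampler = fixed_size_batches(indices, batch_size=6)

print(len(batch_sampler))      # 455
print(len(batch_sampler[0]))   # 6
print(len(batch_sampler[-1]))  # 4
```

In this picture, the whole split lives in one indexed dataset and the sampler only decides which indices go together into each mini-batch.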

To get it to run, I tried dropping the last row:

batch_sampler = batch_sampler[:-2]

But then I got this:

I think it is caused by np.random.choice(). From this I infer that batch_sampler should be a list containing only one array, is that right?
But I have no idea how that would come about. Should the idx and bin files contain all of the training data, or just one row of it?
And what exactly does batch_sampler sample?
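For context on the second error: the legacy np.random.choice() only samples from a 1-D array, or from range(n) when given an int. A list of equal-length batches is stacked into a 2-D array and rejected with a ValueError; a ragged list (like the original one with a shorter last batch) is also rejected on recent NumPy versions (1.24+), which refuse to build an array from inhomogeneous sequences. A minimal sketch, assuming the intent is to pick one whole batch at random, is to sample a batch index instead. `sample_batch` and the batch contents are hypothetical; this is not the SpeechT5 implementation.

```python
import numpy as np

def sample_batch(batches, rng=None):
    """Pick one batch uniformly at random from a list of index arrays.

    Sampling an *index* works whether the batches have equal or unequal
    lengths; passing the list itself to np.random.choice does not.
    (Hypothetical helper for illustration.)
    """
    rng = rng if rng is not None else np.random.RandomState(0)
    i = rng.choice(len(batches))  # an int argument samples from range(len(batches))
    return batches[i]

# Hypothetical stand-in for batch_sampler: 455 batches of sample indices,
# 6 per batch except a shorter last one (sizes from the issue, values made up).
batches = [np.arange(k * 6, k * 6 + 6) for k in range(454)]
batches.append(np.arange(454 * 6, 454 * 6 + 3))

batch = sample_batch(batches)

# Passing equal-length batches directly to np.random.choice fails:
# they stack into a 2-D array, and choice() only accepts 1-D input.
try:
    np.random.choice(batches[:-1])
except ValueError as err:
    print("ValueError:", err)
```

Newer code can also use np.random.default_rng().choice(..., axis=0), which accepts multi-dimensional arrays, but that still needs equal-length batches.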

Here is what my directory looks like:

I would really appreciate it if you could explain this. Thank you!

microsoft/SpeechT5, issue #53