Closed tianchaolangzi closed 3 months ago
Lhotse Shar CutSet is supposed to be iterated over and not indexed. You'll need to use iterable datasets. See this tutorial for an end to end example of usage: https://colab.research.google.com/github/lhotse-speech/lhotse/blob/master/examples/04-lhotse-shar.ipynb
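The iterated-not-indexed point can be sketched without lhotse: a Shar archive behaves like a one-way stream of tar entries, so random access by index means re-reading the stream from the start, while iteration is a single pass. Below, `make_stream` and `get_by_index` are hypothetical stand-ins for illustration, not lhotse API:

```python
import itertools

# A generator standing in for a Shar-backed CutSet: like the tar shards,
# it can only be read front to back.
def make_stream(n):
    for i in range(n):
        yield i  # in reality: decode the next cut from the shard

def get_by_index(n, idx):
    # A map-style Dataset's __getitem__ forces random access, so every
    # lookup has to re-read the stream from the beginning: O(idx) work.
    return next(itertools.islice(make_stream(n), idx, None))

# Cost of reading items 0..4 by index vs. by iteration:
indexed_reads = sum(idx + 1 for idx in range(5))  # 1+2+3+4+5 = 15 underlying reads
iterated_reads = 5                                # one pass touches each item once
print(indexed_reads, iterated_reads)  # → 15 5
```

This is why a `torch.utils.data.IterableDataset` (which pulls items in order) fits Shar, while a map-style `Dataset` (which calls `__getitem__` with arbitrary indices) degrades badly.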
Thank you so much, the problem is solved. However, I think that during training the 1000 samples within a single shar cannot be shuffled; shuffling seems to work only at the shar level.
You can shuffle, just call `cuts.shuffle(buffer_size=10000)`
(unless you're using a lhotse Sampler, which will do it for you). It performs approximate streaming shuffling.
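For intuition, an approximate streaming shuffle works like a reservoir: fill a fixed-size buffer, then for each incoming element emit a random buffered one and put the newcomer in its place. A minimal pure-Python sketch of the idea (not lhotse's actual implementation):

```python
import random

def streaming_shuffle(items, buffer_size=10000, rng=None):
    """Approximate shuffle of a stream using a fixed-size buffer:
    memory is bounded by buffer_size, and elements move a limited
    distance from their original position, hence 'approximate'."""
    rng = rng or random.Random()
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)   # warm-up: fill the buffer first
        else:
            j = rng.randrange(buffer_size)
            yield buffer[j]       # emit a random buffered element...
            buffer[j] = item      # ...and replace it with the new one
    rng.shuffle(buffer)           # drain the remainder in random order
    yield from buffer

out = list(streaming_shuffle(range(100), buffer_size=16, rng=random.Random(0)))
print(sorted(out) == list(range(100)))  # → True: every element appears exactly once
```

A larger `buffer_size` gives a better approximation of a true global shuffle at the cost of memory; combined with shuffling the shard order itself, this is usually sufficient randomization for training.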
Here is the dataset I implemented:

```python
class XXDataset(torch.utils.data.Dataset):
    def __init__(self, shar_dir, voice_types, sample_rate=16000, ir_file=None, ir_portion=0.5):
        self.sample_rate = sample_rate
        self.cuts = self.get_cuts(shar_dir)
        self.voices = voice_types
        self.voice2index = {}
        for idx, voice in enumerate(voice_types):
            self.voice2index[voice] = idx
        self.ir_file = ir_file
        self.ir_portion = ir_portion
        assert 0 < self.ir_portion < 1, "ir_portion set wrong"
```

But it is very slow, and the time to load data increases batch by batch.
Here is the (truncated) line profiler output:

```
Total time: 48.4381 s
File: xx
Function: __getitem__ at line 35

Line #      Hits         Time  Per Hit   % Time  Line Contents
```