Problem with loading rawwav as feature, it costs too much time

liuzz13 commented 4 years ago

Hi, I recently ran some experiments with pytorch-kaldi. I found that loading data takes too much time when using raw wav as the feature, even more than the training itself. Is there a better dataloader? Or how can I reduce the data-loading time?

mravanelli commented 4 years ago

Yes, we are aware of it. That's because the toolkit is designed to read features in Kaldi format rather than waveforms. For the SpeechBrain project we are developing an entirely new dataloader that can read any kind of feature (including waveforms) very quickly.

Best,

Mirco

On Sun, 9 Feb 2020 at 08:37, liuzz13 wrote:

> Hi, I recently ran some experiments with pytorch-kaldi. I found that loading data takes too much time when using raw wav as the feature, even more than the training itself. Is there a better dataloader? Or how can I reduce the data-loading time?


lezasantaizi commented 4 years ago

> Hi, I recently ran some experiments with pytorch-kaldi. I found that loading data takes too much time when using raw wav as the feature, even more than the training itself. Is there a better dataloader? Or how can I reduce the data-loading time?

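Convert the wav files to features ahead of time, so that the dataloader reads precomputed features instead of raw waveforms.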
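One way to read this suggestion, as a rough sketch only: decode each wav once, compute features, and cache them on disk so later epochs skip the wav decoding. Nothing below is part of pytorch-kaldi. The function names (`read_wav_mono16`, `log_energy_frames`, `load_features`), the pickle cache layout, and the toy log-energy feature are all assumptions made for illustration. In practice you would probably extract real features (e.g. FBANK or MFCC) with Kaldi, since that is the format the toolkit is designed to read.

```python
import math
import pickle
import struct
import wave
from pathlib import Path


def read_wav_mono16(path):
    """Read a 16-bit mono PCM wav file into a list of floats in [-1, 1)."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in samples]


def log_energy_frames(samples, frame_len=400, hop=160):
    """Toy feature (hypothetical stand-in for FBANK/MFCC): log energy per frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        feats.append(math.log(energy + 1e-10))
    return feats


def load_features(wav_path, cache_dir):
    """Return features for wav_path, computing them once and caching them on disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / (Path(wav_path).stem + ".pkl")
    if cache_file.exists():
        # Cache hit: skip wav decoding and feature extraction entirely.
        with open(cache_file, "rb") as f:
            return pickle.load(f)
    feats = log_energy_frames(read_wav_mono16(wav_path))
    with open(cache_file, "wb") as f:
        pickle.dump(feats, f)
    return feats
```

The first call to `load_features` pays the full decoding cost; subsequent calls for the same utterance only read the cached file. Note that the cache is keyed on the file stem, so two wav files with the same name in different directories would collide in this sketch.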
