This PR removes all mentions of arrow/pyarrow and switches the manifests to lazy JSONL loading. It also adjusts the arguments passed to K2SpeechRecognitionDataset. As a bonus, there's a TorchScript conversion in GigaSpeech decoding, which I find handy. I verified that I can start and run the GigaSpeech L training with both the Lhotse and Snowfall PRs applied. I also adjusted the data loading code for LibriSpeech/Aishell in the same way, but haven't run those recipes.
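For reviewers who want the gist of lazy JSONL loading, here is a rough stdlib-only sketch. It is an illustration of the idea, not the Lhotse implementation; `iter_jsonl` is a hypothetical helper name, and the assumption is one JSON object per line, optionally gzip-compressed.

```python
import gzip
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line (hypothetical helper).

    The file is read line by line, so the full manifest is never held
    in memory at once.
    """
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Because this is a generator, a consumer can start using the first records right away, which is the property that matters for a manifest too large to load eagerly.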
Sorry for the reformatting of asr_datamodule.py; hopefully the diff is still readable.
The corresponding Lhotse PR is https://github.com/lhotse-speech/lhotse/pull/345. These two PRs should be merged together; I'll wait for either @danpovey or @csukuangfj to ack before merging both.