Open hagenw opened 5 years ago
Hello @hagenw,
Thank you for opening this issue. I can't quite give you a good reply to this just yet, because dataset abstractions are a topic we might need to revisit at a grand scale. I'll read through this in detail soon, but I want to signal you're heard via this reply.
Thanks, Christian
Thanks for replying. There is no need to rush. The topic is indeed not trivial and it might be a good idea to make the right decisions at the beginning.
For reference: pytorch/pytorch#24915
Recently, we released audtorch, an audio package for PyTorch that we started some time ago. It contains a few audio data sets that might be worth integrating here: Mozilla Common Voice, AudioSet, VoxCeleb1, LibriSpeech.
But before doing a pull request, there are a few things that I would like to discuss as I'm not completely happy with our current implementation:
- Inherit from an Audio Base class or not? Compare `torchaudio` and `torchvision`: `torchvision` has a vision base class, but it handles more or less only transforms.
- What is the best way to handle the sampling rate? One option is a `sampling_rate()` property; `torchaudio` just returns data and the user has to know the sampling rate.
- Should we handle failures during data loading?
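To make the three questions concrete, here is a minimal sketch of what a shared base class could look like. It is not the audtorch or torchaudio implementation: the class name `AudioDatasetSketch`, the `loader` and `skip_failures` parameters, and the fake loader are all hypothetical. To stay runnable without dependencies it uses plain Python lists and a plain class with `__len__`/`__getitem__`; in practice it would presumably subclass `torch.utils.data.Dataset` and return arrays or tensors.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Signal = List[float]  # stand-in for a numpy array or torch.Tensor


class AudioDatasetSketch:
    """Hypothetical base class holding files, sampling rate, and a transform."""

    def __init__(
        self,
        files: Sequence[str],
        sampling_rate: int,
        loader: Callable[[str], Signal],
        transform: Optional[Callable[[Signal], Signal]] = None,
        skip_failures: bool = False,
    ) -> None:
        self.files = list(files)
        self._sampling_rate = sampling_rate
        self.loader = loader
        self.transform = transform
        self.skip_failures = skip_failures

    @property
    def sampling_rate(self) -> int:
        """Sampling rate in Hz, assumed identical for all items."""
        return self._sampling_rate

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> Optional[Tuple[Signal, int]]:
        try:
            signal = self.loader(self.files[index])
        except (OSError, ValueError):
            if self.skip_failures:
                return None  # the caller has to drop None entries
            raise
        if self.transform is not None:
            signal = self.transform(signal)
        # Returning the rate with the data spares the user from knowing it.
        return signal, self.sampling_rate


def fake_loader(path: str) -> Signal:
    """Hypothetical loader that fails for one path to simulate a broken file."""
    if path == "broken.wav":
        raise OSError(f"cannot read {path}")
    return [0.0, 0.5, -0.5]


dataset = AudioDatasetSketch(
    ["a.wav", "broken.wav"], 16000, fake_loader, skip_failures=True
)
items = [dataset[i] for i in range(len(dataset))]
valid = [item for item in items if item is not None]
```

With `skip_failures=True` a broken file yields `None`, so a caller (for example a custom collate function) would need to filter those entries out; with the default `False` the error propagates. Which of the two should be the default is exactly the open question.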
Note, our data sets currently return the data as numpy arrays as we use a lot of numpy transforms. But this can easily be changed.