Data set discussion

hagenw commented 5 years ago

We recently released audtorch, an audio package for PyTorch that we started some time ago. It contains a few audio data sets that might be worth integrating here: Mozilla Common Voice, AudioSet, VoxCeleb1, and LibriSpeech.

Before opening a pull request, there are a few things I would like to discuss, as I'm not completely happy with our current implementation:

Inherit from an Audio Base class or not?

To avoid repeating a lot of code, we use an audio data set base class from which the other data sets inherit. It handles data loading and the sampling rate, checks that the data exists, and applies transforms.
This approach differs from the two data sets currently part of torchaudio, VCTK and YesNo.
torchvision has a vision base class, but it handles little more than transforms.
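To make the base class idea concrete, here is a minimal sketch of the pattern. All names (`AudioDatasetBase`, `ToyDataset`, `_load`) are hypothetical and are not the audtorch or torchaudio API. It uses only the standard library so it runs anywhere; a real version would presumably subclass `torch.utils.data.Dataset` and decode actual audio files.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Signal = List[float]


class AudioDatasetBase:
    """Hypothetical base class that holds the logic shared by all data sets."""

    def __init__(
        self,
        files: Sequence[str],
        targets: Sequence[int],
        sampling_rate: int,
        transform: Optional[Callable[[Signal], Signal]] = None,
    ) -> None:
        if len(files) != len(targets):
            raise ValueError("files and targets must have the same length")
        self.files = list(files)
        self.targets = list(targets)
        self.sampling_rate = sampling_rate
        self.transform = transform

    def _load(self, path: str) -> Signal:
        # Subclasses decide how a single file is decoded.
        raise NotImplementedError

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, index: int) -> Tuple[Signal, int]:
        signal = self._load(self.files[index])
        if self.transform is not None:
            signal = self.transform(signal)
        return signal, self.targets[index]


class ToyDataset(AudioDatasetBase):
    """Assumed example subclass; only the loading step is data set specific."""

    def _load(self, path: str) -> Signal:
        # Stand-in for real decoding: a constant signal derived from the path.
        return [float(len(path))] * 4


dataset = ToyDataset(
    ["a.wav", "bb.wav"],
    [0, 1],
    sampling_rate=16000,
    transform=lambda signal: [2 * x for x in signal],
)
signal, target = dataset[1]
```

With this layout, a new data set only has to provide its file list, targets, and a `_load` method; indexing and transforms come from the base class.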

What is the best way to handle the sampling rate?

Functions and classes that use the data set have to be aware of the underlying sampling rate. At the moment we solve this with a sampling_rate() property.
Transforms can change the sampling rate of a data set. We added a check for that, but it is not ideal.
Data sets can contain files with different sampling rates. At the moment we force the sampling rate to be the same for the whole data set.
The two data sets currently part of torchaudio just return the data, and the user has to know the sampling rate.
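One possible way to keep the reported sampling rate in sync with transforms is sketched below. This is an assumption for illustration, not how audtorch implements its check: the `output_rate` method on the transform, and the names `Downsample` and `RateAwareDataset`, are invented here.

```python
from typing import Callable, List, Optional, Sequence

Signal = List[float]


class Downsample:
    """Hypothetical transform: keeps every `factor`-th sample (no filtering)."""

    def __init__(self, factor: int) -> None:
        self.factor = factor

    def __call__(self, signal: Signal) -> Signal:
        return signal[:: self.factor]

    def output_rate(self, input_rate: int) -> int:
        # Lets the data set ask which rate results from this transform.
        return input_rate // self.factor


class RateAwareDataset:
    """Hypothetical data set that reports the rate of the data it returns."""

    def __init__(
        self,
        signals: Sequence[Signal],
        original_sampling_rate: int,
        transform: Optional[Callable[[Signal], Signal]] = None,
    ) -> None:
        self.signals = list(signals)
        self.original_sampling_rate = original_sampling_rate
        self.transform = transform

    @property
    def sampling_rate(self) -> int:
        # Transforms without an `output_rate` method are assumed to
        # leave the sampling rate unchanged.
        rate_fn = getattr(self.transform, "output_rate", None)
        if rate_fn is None:
            return self.original_sampling_rate
        return rate_fn(self.original_sampling_rate)

    def __len__(self) -> int:
        return len(self.signals)

    def __getitem__(self, index: int) -> Signal:
        signal = self.signals[index]
        if self.transform is not None:
            signal = self.transform(signal)
        return signal


plain = RateAwareDataset([[0.0] * 8], 16000)
downsampled = RateAwareDataset([[0.0] * 8], 16000, transform=Downsample(2))
```

The weakness of this sketch is the same one mentioned above: a transform that changes the rate without declaring it goes unnoticed, so the check is only as good as the convention.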

Should we handle failures during data loading?

To avoid crashing the training process, we implemented error handling directly in our load function.
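A minimal sketch of this kind of error handling follows. The helper names `safe_load` and `drop_failed` are hypothetical, and the choice of which exceptions to catch is an assumption; the actual audtorch load function may behave differently.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)

Signal = List[float]


def safe_load(path: str, loader: Callable[[str], Signal]) -> Optional[Signal]:
    """Return the loaded signal, or None if loading fails."""
    try:
        return loader(path)
    except (OSError, ValueError) as error:
        # Log and continue instead of aborting a long training run.
        logger.warning("Skipping %s: %s", path, error)
        return None


def drop_failed(batch: List[Optional[Signal]]) -> List[Signal]:
    """Remove the entries that failed to load from a batch."""
    return [item for item in batch if item is not None]


def fake_loader(path: str) -> Signal:
    # Stand-in for a real decoder; files ending in ".bad" simulate corruption.
    if path.endswith(".bad"):
        raise OSError("corrupt file")
    return [0.0, 1.0]


batch = [safe_load(path, fake_loader) for path in ["a.wav", "b.bad", "c.wav"]]
clean_batch = drop_failed(batch)
```

In a PyTorch setup, a filter like `drop_failed` could run inside a custom `collate_fn` passed to the `DataLoader`, so failed items are removed before batching. Whether to skip silently, log, or raise after too many failures is a design choice this sketch leaves open.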

Note that our data sets currently return the data as numpy arrays, because we use a lot of numpy transforms. This can easily be changed.

cpuhrsch commented 5 years ago

Hello @hagenw,

Thank you for opening this issue. I can't quite give you a good reply to this just yet, because dataset abstractions are a topic we might need to revisit at a grand scale. I'll read through this in detail soon, but I want to signal you're heard via this reply.

Thanks, Christian

hagenw commented 5 years ago

Thanks for replying. There is no need to rush. The topic is indeed not trivial, and it is probably a good idea to get these decisions right from the beginning.

vincentqb commented 5 years ago

For reference: pytorch/pytorch#24915

pytorch / audio

Data set discussion #116