vadimkantorov opened 2 years ago
Hi
I am not aware of a GPU codec library for audio. Do you know one?
I'm also not aware, but maybe Lyra / SoundStream / LPCNet could be implemented on GPU (except maybe the entropy coding). Also, some recommended codecs / settings optimized for fast decoding with DataLoader in the DDP regime would benefit the community (e.g. how to configure things so that there is no thread oversubscription, etc.)
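As a concrete illustration of the oversubscription point: a minimal sketch of capping per-worker intra-op threads in a DataLoader, assuming each worker decodes one file at a time (the dataset here returns a dummy tensor in place of real decoding).

```python
# Sketch: limit per-worker threading so N DataLoader workers times
# M OpenMP threads does not oversubscribe the machine's cores.
import os
import torch
from torch.utils.data import DataLoader, Dataset

class AudioDataset(Dataset):
    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # a real implementation would decode self.paths[idx] here
        # (e.g. with torchaudio.load); we return a dummy waveform
        return torch.zeros(16000)

def worker_init_fn(worker_id):
    # each worker handles one file at a time, so cap intra-op
    # parallelism at a single thread per worker
    torch.set_num_threads(1)
    os.environ["OMP_NUM_THREADS"] = "1"

loader = DataLoader(
    AudioDataset(["a.wav", "b.wav"]),
    batch_size=2,
    num_workers=2,
    worker_init_fn=worker_init_fn,
)
```

The exact thread budget depends on the codec and machine; the point is only that decoder parallelism and DataLoader parallelism should be configured together.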
We do not have a good recommendation for fast loading at the moment. Part of the reason is that the primary focus of the library has been providing domain-specific features that PyTorch does not provide natively, so we have been focusing on components rather than the pipeline. However, this is changing as the library matures and the user base grows; we do acknowledge the demand for complex yet efficient data loading, and we are aware that we lack an integral view of these components.
Having said that, there are a couple of things I am considering for efficient data loading (some of them are fairly random thoughts). tl;dr: I think re-designing the whole data-loading experience will open up more choices of solutions. It seems to me that libffcv, which you mentioned in #1994, takes a similar approach.
Regarding parallelized (intra-file, for large files) audio decoding:
It might be possible to have parallelized GPU decoding of some lightly-compressed files (e.g. FLAC, or some other relatively simple audio codec designed for fast, branchless, parallelizable decoding)
It might also be good to integrate some of Facebook's neural codec code into torchaudio to widen exposure and usage, as neural codecs are the most amenable to fast GPU-based decoding :)
Also, batched parallelized audio reading/decoding could speed up simple high-level methods like Whisper's model.transcribe(['audio1.opus', 'audio2.opus', ...]): https://github.com/openai/whisper/discussions/662#discussioncomment-7524821. Probably the right approach would be to always return a NestedTensor as output (and allow finer control when out= is provided). It might also be interesting to support some sort of background-processing mode that returns a LazyTensor output immediately.
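The NestedTensor idea can be sketched as follows; `decode_batch` is an illustrative name, not an existing torchaudio function, and the "decoding" is faked with fixed lengths so the example is self-contained.

```python
# Sketch: a batched decode API returning a nested tensor, since
# decoded clips generally have different lengths and a nested
# tensor holds variable-length rows without padding.
import torch

def decode_batch(paths):
    # stand-in for real decoding: pretend each file yields a
    # waveform of a different length
    fake_lengths = {"audio1.opus": 12000, "audio2.opus": 8000}
    waveforms = [torch.zeros(fake_lengths[p]) for p in paths]
    return torch.nested.nested_tensor(waveforms)

batch = decode_batch(["audio1.opus", "audio2.opus"])
```

Downstream code can then iterate the rows, or pad explicitly only where a rectangular tensor is actually required.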
Regarding WAV loading, I don't think it can go beyond simply reading from disk in a single large chunk, as Python's wave package or scipy's scipy.io.wavfile.read do - including, by the way, mmap, which in some cases may allow amortizing the memory-access or disk-reading cost. I think PyTorch needs a similar built-in simple function for dealing with such simple file formats (WAV/PPM/OBJ/CSV, etc.).
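To make the comparison concrete, here is a self-contained sketch of both approaches mentioned above: single-chunk reading with the stdlib `wave` module, and scipy's `mmap=True` option that memory-maps the data section instead of copying it. A tiny WAV file of silence is written first so the example runs on its own.

```python
# Sketch: the two "simple format" reading styles discussed above.
import wave
import numpy as np
from scipy.io import wavfile

# create a 1-second, 16 kHz, mono, 16-bit WAV of silence
with wave.open("silence.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit samples
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

# stdlib wave: read all frames back in one chunk, then reinterpret
with wave.open("silence.wav", "rb") as w:
    data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

# scipy: memory-map the sample data rather than copying it eagerly
rate, mapped = wavfile.read("silence.wav", mmap=True)
```

With `mmap=True`, pages are faulted in lazily on access, which is where the amortization of disk-reading cost comes from.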
I think one needs a C/C++ thread-pool library to implement true batch decoding. I have some ideas, but I feel they are not a good fit for torchaudio or the other domain libraries.
Maybe some standard openmp threading would cut it?
Might be a better fit for this new i/o package :)
> Maybe some standard openmp threading would cut it?
PyTorch uses OpenMP as well, so I think it is better to have separate parallelism so that the two can be configured independently. I also feel it would be better to have different parallelism for the I/O-bound part (file access and networking) and the CPU-bound part (decoding).
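The split between I/O-bound and CPU-bound parallelism can be sketched with two pools of different sizes; `read_bytes` and `decode` are trivial stand-ins here, not real torchaudio functions.

```python
# Sketch: separate pools for I/O-bound reads and CPU-bound decoding,
# so each stage's parallelism can be tuned independently.
from concurrent.futures import ThreadPoolExecutor

def read_bytes(path):
    # I/O-bound stage: stand-in returning fake "compressed" bytes
    return b"\x00" * 64

def decode(raw):
    # CPU-bound stage: stand-in "decoder"
    return list(raw)

io_pool = ThreadPoolExecutor(max_workers=8)   # many threads: mostly waiting on disk/network
cpu_pool = ThreadPoolExecutor(max_workers=2)  # ~core count: does the actual work

paths = ["a.flac", "b.flac"]
raw_futures = [io_pool.submit(read_bytes, p) for p in paths]
decoded = [cpu_pool.submit(decode, f.result()) for f in raw_futures]
results = [d.result() for d in decoded]
```

A real implementation would do this in C/C++ to escape the GIL for the decode stage, but the shape of the pipeline is the same.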
> Might be a better fit for this new i/o package :)
The new i/o package under discussion is upstream of the existing domain libraries. I have a feeling that such a serious project would be better started outside of the existing context.
🚀 The feature
GPU audio decoding, at least for some codecs, would enable wider use of compressed audio when training ASR models.
Maybe some neural codecs (I think Google has open-sourced some) would be more amenable to batched GPU decoding and direct integration into models.
Motivation, pitch
N/A
Alternatives
No response
Additional context
No response