refactor backend - Githubissues

skeydan commented 1 year ago

Notes on choices re renamings / naming convention / objects / fields etc. (continually updated)

With torchaudio_load gone, renaming torchaudio_loader to torchaudio_load (PT: torchaudio.load)
Also renaming info to torchaudio_info (PT: torchaudio.info)
With av::av_media_info delivering this kind of information, could think about extending AudioMetaData with codec and bitrate (PT has sample_rate, num_frames, num_channels, bits_per_sample, encoding; see https://pytorch.org/audio/stable/backend.html#common-data-structure). On the other hand, av_read_bin has to be called anyway, since av_media_info does not report the number of samples, and av_read_bin by itself is sufficient to provide the information torchaudio currently offers. So should probably leave everything as-is.
Removing specializations of info to file types, since unneeded.
Think about file type restrictions. Probably no strict confinement to wav & mp3 wanted anymore. (PT/sox reports as tested WAV, AMB, MP3, FLAC, OGG/VORBIS, OPUS, SPHERE, AMR-NB. PT/Soundfile supports WAV, FLAC, OGG/VORBIS, SPHERE.)
Thinking of not throwing out tuneR completely this time, but keeping it as an option with its functionality intact. Also for ease of quick comparisons, if needed.
But will note that av is the default, and as of this time the preferred, backend.
Right now, both av and tuneR are suggested, but not imported. Still, the package (in its current state) needs tuneR, and should probably import it. To avoid such a situation, will make av mandatory now.
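
To illustrate the suggested-vs-imported point: a minimal sketch of how an optional backend could be guarded at call time, assuming it stays a suggested package. The helper names backend_available() and require_backend() are hypothetical and not part of torchaudio; only requireNamespace() is base R.

```r
# Hypothetical helpers, for illustration only (not torchaudio functions).

# TRUE if the package can be loaded, without attaching it.
backend_available <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# Fail early, with a readable message, if an optional backend is missing.
require_backend <- function(pkg) {
  if (!backend_available(pkg)) {
    stop("Backend package '", pkg, "' is not installed.", call. = FALSE)
  }
  invisible(TRUE)
}

# A mandatory dependency needs no such check; an optional one does.
require_backend("stats")
```

With av mandatory, a check like this would only be needed for the alternative backend.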

skeydan commented 1 year ago

Notes on specifics of loading (common/`av`/`tuneR`) - continually updated

(1) Interface (torchaudio_load):

In contrast to PT (https://pytorch.org/audio/stable/backend.html#load), we offer an alternative unit of seconds (in addition to samples) when specifying offsets/lengths. Providing both options is nice and should be kept, and in general means that any functionality not implemented "natively" in a backend has to be coded up in the backend-specific wrapper.

With tuneR, wav files are handled automatically in this respect, while readMP3() needs a follow-up call to extractWave(). With av, independent of file type, the time unit is automatic, while handling of samples requires calculation.
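
The calculation mentioned above can be sketched as follows. This is only an illustration of the unit conversion a backend wrapper would need, assuming a known sample rate; the function names are hypothetical and not taken from the package.

```r
# Hypothetical conversion helpers (not torchaudio functions).

# Samples -> seconds, for a backend whose native unit is time.
samples_to_seconds <- function(n_samples, sample_rate) {
  n_samples / sample_rate
}

# Seconds -> samples, for a backend whose native unit is samples.
seconds_to_samples <- function(seconds, sample_rate) {
  as.integer(round(seconds * sample_rate))
}

# A window given in samples, translated into start/end times.
window_in_seconds <- function(offset, length, sample_rate) {
  list(
    start_time = samples_to_seconds(offset, sample_rate),
    end_time   = samples_to_seconds(offset + length, sample_rate)
  )
}

w <- window_in_seconds(offset = 8000, length = 16000, sample_rate = 16000)
```

Here w holds the start and end of the requested window in seconds, which is what a time-based reader would expect.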

(2) Validation

move all generic input validation to torchaudio_load
put all loader-specific validation at the beginning of their entry-level functions (e.g., re file types)
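
A rough sketch of that split, under assumptions: the argument names, the accepted value ranges and the wav/mp3 restriction shown for a tuneR-like loader are illustrative only, and both function names are hypothetical.

```r
# Generic checks, as they might live in the user-facing load function.
validate_generic <- function(filepath, offset, duration) {
  if (!is.character(filepath) || length(filepath) != 1) {
    stop("filepath must be a single string", call. = FALSE)
  }
  if (!is.numeric(offset) || offset < 0) {
    stop("offset must be a non-negative number", call. = FALSE)
  }
  if (!is.numeric(duration) || (duration < 0 && duration != -1)) {
    stop("duration must be non-negative, or -1 for 'until the end'", call. = FALSE)
  }
  invisible(TRUE)
}

# Loader-specific check at the top of a backend's entry-level function.
# Assumption: this backend is restricted to wav and mp3 files.
validate_tuner_like <- function(filepath) {
  ext <- tolower(tools::file_ext(filepath))
  if (!ext %in% c("wav", "mp3")) {
    stop("unsupported file type for this backend: ", ext, call. = FALSE)
  }
  invisible(TRUE)
}

validate_generic("speech.wav", offset = 0, duration = -1)
validate_tuner_like("speech.wav")
```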

(3) Defaults

Make -1 the default for duration in torchaudio_load(), as done in PT. (Right now both 0 and -1 effectively work as such.)
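
For illustration, one way the -1 sentinel could be resolved into an end position, assuming sample units and a known total length. resolve_end() is a hypothetical helper; treating only -1 (and not 0) as "everything" is an assumption of this sketch.

```r
# Hypothetical helper: index of the last sample to read.
# duration = -1 means "read from offset until the end of the file".
resolve_end <- function(offset, duration = -1, total_samples) {
  if (duration == -1) {
    total_samples
  } else {
    # Clamp so a window never runs past the end of the file.
    min(offset + duration, total_samples)
  }
}

end_default <- resolve_end(offset = 100, total_samples = 1000)
end_window  <- resolve_end(offset = 100, duration = 200, total_samples = 1000)
```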

skeydan commented 1 year ago

The NEWS.md summarizes the main outcome of this PR. Copying here:

Thanks to superior performance as well as versatility, the default backend for loading audio files is now av. av is an efficient wrapper for FFmpeg.

The refactorings involved in this update contain breaking changes as to naming and scope. Most importantly:

av is now the default backend, and it is a mandatory dependency. Linux users please consider the av installation instructions.
The only user-visible function to load audio is torchaudio_load(). It will delegate to the default backend, or one you set with set_audio_backend().
As of this time, a supported alternative backend is tuneR.
The only user-visible function to obtain audio file information is now torchaudio_info().
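
The delegation described above could look roughly like this. It is a sketch only: how torchaudio actually stores and dispatches on the backend is not stated here, so the registry, the stub loaders and the names set_backend_sketch() and load_sketch() are all assumptions made for illustration.

```r
# Stub loaders standing in for the real backends (no audio is read here).
loaders <- list(
  av    = function(path) paste("av loader would read", path),
  tuneR = function(path) paste("tuneR loader would read", path)
)

# Package-level state: the currently selected backend, av by default.
backend_state <- new.env()
backend_state$current <- "av"

# Switch backend, rejecting unknown names.
set_backend_sketch <- function(name) {
  if (!name %in% names(loaders)) {
    stop("unknown backend: ", name, call. = FALSE)
  }
  backend_state$current <- name
  invisible(name)
}

# Single user-facing entry point that delegates to the selected loader.
load_sketch <- function(path) {
  loaders[[backend_state$current]](path)
}

first  <- load_sketch("speech.wav")
set_backend_sketch("tuneR")
second <- load_sketch("speech.wav")
```

Keeping one entry point means callers do not change when the backend does.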

mlverse / torchaudio

refactor backend #62
