Closed skeydan closed 1 year ago
`av`/`tuneR`) - continually updated

(1) Interface (`torchaudio_load`):
In contrast to PT (https://pytorch.org/audio/stable/backend.html#load), we offer an alternative unit of seconds (in addition to samples) when specifying offsets/lengths. Providing both options is nice and should be kept; in general, this means that any functionality not implemented "natively" in a backend has to be coded up in the backend-specific wrapper.
With `tuneR`, wav files are handled automatically in this respect, while `readMP3()` needs a follow-up call to `extractWave()`. With `av`, independent of file type, the time unit is automatic, while handling of samples requires calculation.
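The seconds/samples calculation that a backend wrapper has to do can be sketched in base R. This is only an illustration, not the package's code: `to_samples()` and `to_seconds()` are hypothetical helper names, and the sample rate is assumed to be known up front.

```r
# Hypothetical helpers (not part of torchaudio): convert an offset or a
# duration between seconds and samples for a known sample rate.
to_samples <- function(x, sample_rate, unit = c("samples", "seconds")) {
  unit <- match.arg(unit)
  # Values given in seconds are scaled by the sample rate.
  if (unit == "seconds") x <- x * sample_rate
  as.integer(round(x))
}

to_seconds <- function(x, sample_rate, unit = c("samples", "seconds")) {
  unit <- match.arg(unit)
  # Values given in samples are divided by the sample rate.
  if (unit == "samples") x <- x / sample_rate
  as.numeric(x)
}

# A wrapper around a backend that works in seconds would call to_seconds()
# for sample-based requests; one that works in samples would call to_samples().
to_samples(0.5, sample_rate = 16000, unit = "seconds")   # 8000
to_seconds(8000, sample_rate = 16000, unit = "samples")  # 0.5
```

Keeping the conversion in one place per wrapper means the user-facing function can accept either unit regardless of what the backend supports natively.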
(2) Validation
`torchaudio_load`

(3) Defaults
`torchaudio_load()`, as done in PT. (Right now both 0 and -1 effectively work as such.)

The NEWS.md summarizes the main outcome of this PR. Copying here:
Thanks to superior performance as well as versatility, the default backend for loading audio files is now `av`. `av` is an efficient wrapper for FFmpeg.
The refactorings involved in this update contain breaking changes as to naming and scope. Most importantly:
- `av` is now the default backend, and it is a mandatory dependency. Linux users please consider the av installation instructions.
- The only user-visible function to load audio is `torchaudio_load()`. It will delegate to the default backend, or one you set with `set_audio_backend()`.
- As of this time, a supported alternative backend is `tuneR`.
- The only user-visible function to obtain audio file information is now `torchaudio_info()`.
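The delegation described above (one user-facing loader that forwards to whichever backend is currently selected) can be sketched in base R. This is not the package's implementation: every `demo_*` name is hypothetical, and the stand-in loaders only report which backend handled the call instead of reading audio.

```r
# Hypothetical sketch of backend delegation; none of the demo_* names
# exist in torchaudio.
demo_backends <- new.env()   # registry: backend name -> loader function
demo_state <- new.env()
demo_state$current <- "av"   # default backend, as in the notes above

demo_register_backend <- function(name, loader) {
  assign(name, loader, envir = demo_backends)
  invisible(name)
}

demo_set_backend <- function(name) {
  if (!exists(name, envir = demo_backends, inherits = FALSE)) {
    stop("unknown backend: ", name)
  }
  demo_state$current <- name
  invisible(name)
}

# Single user-facing entry point: look up the selected backend and forward.
demo_load <- function(path, ...) {
  loader <- get(demo_state$current, envir = demo_backends, inherits = FALSE)
  loader(path, ...)
}

# Stand-in loaders that only name the backend that was used.
demo_register_backend("av", function(path, ...) paste("av:", path))
demo_register_backend("tuneR", function(path, ...) paste("tuneR:", path))

demo_load("a.wav")         # "av: a.wav"
demo_set_backend("tuneR")
demo_load("a.wav")         # "tuneR: a.wav"
```

With this shape, callers never reference a backend-specific function directly, which is what allows the default to change without touching user code.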
Notes on choices re renamings / naming convention / objects / fields etc. (continually updated)
- `torchaudio_load` gone, renaming
  - `torchaudio_loader` to `torchaudio_load` (PT: `torchaudio.load`)
  - `info` to `torchaudio_info` (PT: `torchaudio.info`)
- `av::av_media_info` delivering this kind of information, could think about enlarging `AudioMetaData` by codec and bitrate (PT has sample_rate, num_frames, num_channels, bits_per_sample, encoding; see https://pytorch.org/audio/stable/backend.html#common-data-structure). On the other hand, need to call `av_read_bin` anyway, since `av_media_info` does not report number of samples. And `av_read_bin` itself is sufficient to provide the information offered by `torchaudio` currently. So should probably leave all as-is.
- `wav` & `mp3` wanted anymore. (PT/sox reports as tested WAV, AMB, MP3, FLAC, OGG/VORBIS, OPUS, SPHERE, AMR-NB. PT/Soundfile supports WAV, FLAC, OGG/VORBIS, SPHERE.)
- `tuneR` completely this time, but leaving option and functionality. Also for ease of quick comparisons, if needed.
- `av` is the default, and preferred as of this time, backend.
- `av` and `tuneR` are suggested, but not imported. Still, the package (in current state) needs `tuneR`, and should probably import it. To avoid such a situation, will make `av` mandatory now.