mlverse / torchaudio

R interface to torchaudio
https://mlverse.github.io/torchaudio/

refactor backend #62

Closed skeydan closed 1 year ago

skeydan commented 1 year ago

Notes on choices re renamings / naming convention / objects / fields etc. (continually updated)

  1. With torchaudio_load gone, renaming torchaudio_loader to torchaudio_load (PT: torchaudio.load)
  2. Also renaming info to torchaudio_info (PT: torchaudio.info)
  3. With av::av_media_info delivering this kind of information, could think about extending AudioMetaData with codec and bitrate (PT has sample_rate, num_frames, num_channels, bits_per_sample, encoding; see https://pytorch.org/audio/stable/backend.html#common-data-structure). On the other hand, need to call av_read_bin anyway, since av_media_info does not report the number of samples. And av_read_bin by itself is sufficient to provide the information currently offered by torchaudio. So should probably leave everything as-is.
  4. Removing specializations of info to file types, since unneeded.
  5. Think about file type restrictions. Probably no strict confinement to wav & mp3 wanted anymore. (PT/sox reports as tested WAV, AMB, MP3, FLAC, OGG/VORBIS, OPUS, SPHERE, AMR-NB. PT/Soundfile supports WAV, FLAC, OGG/VORBIS, SPHERE.)
  6. Thinking of not throwing out tuneR completely this time, but keeping it as an option, with its functionality intact. Also for ease of quick comparisons, if needed.
  7. But will note that av is the default and, as of this time, preferred backend.
  8. Right now, both av and tuneR are suggested, but not imported. Still, the package (in current state) needs tuneR, and should probably import it. To avoid such a situation, will make av mandatory now.
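
The trade-off in point 8 (hard dependency vs. optional backend) could be handled with a guard like the one sketched below. This is only an illustration: `resolve_backend()` and its `is_installed` argument are hypothetical names, not functions in torchaudio; the only real call relied on is base R's `requireNamespace()`.

```r
# Hypothetical helper (not part of torchaudio): choose a backend and fail
# early, with a clear message, if an optional package is not installed.
# `is_installed` is injectable so the check can be stubbed out.
resolve_backend <- function(backend = c("av", "tuneR"),
                            is_installed = function(pkg) {
                              requireNamespace(pkg, quietly = TRUE)
                            }) {
  backend <- match.arg(backend)  # defaults to "av", the preferred backend
  if (!is_installed(backend)) {
    stop(
      "Backend '", backend, "' needs the '", backend,
      "' package, which is not installed.",
      call. = FALSE
    )
  }
  backend
}
```

With av listed under Imports, the check would only ever fail for tuneR, which would then stay under Suggests.
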

skeydan commented 1 year ago

Notes on specifics of loading (common/av/tuneR) - continually updated

(1) Interface (torchaudio_load):

In contrast to PT (https://pytorch.org/audio/stable/backend.html#load), we offer seconds as an alternative unit (in addition to samples) when specifying offsets/lengths. Providing both options is nice and should be kept. In general, it means that any functionality a backend does not implement "natively" has to be coded up in the backend-specific wrapper.

With tuneR, wav files are handled automatically in this respect, while readMP3() needs a follow-up call to extractWave(). With av, independent of file type, offsets in time units are handled automatically, while offsets in samples require a conversion.
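
The conversion a wrapper has to do when the backend only understands one unit can be sketched as follows. All names here (`seconds_to_samples()`, `samples_to_seconds()`, `resolve_window()`) are hypothetical helpers written for illustration, not torchaudio functions, and the rounding rule is an assumption.

```r
# Hypothetical helpers (not part of torchaudio) for switching between units.
seconds_to_samples <- function(seconds, sample_rate) {
  # Assumption: round to the nearest whole sample.
  as.integer(round(seconds * sample_rate))
}

samples_to_seconds <- function(samples, sample_rate) {
  samples / sample_rate
}

# Express an offset/length pair in both units, so that a sample-based
# backend and a time-based backend can each pick what they need.
resolve_window <- function(offset, duration, sample_rate,
                           unit = c("samples", "seconds")) {
  unit <- match.arg(unit)
  if (unit == "seconds") {
    list(
      start_sample = seconds_to_samples(offset, sample_rate),
      n_samples    = seconds_to_samples(duration, sample_rate),
      start_time   = offset,
      end_time     = offset + duration
    )
  } else {
    list(
      start_sample = as.integer(offset),
      n_samples    = as.integer(duration),
      start_time   = samples_to_seconds(offset, sample_rate),
      end_time     = samples_to_seconds(offset + duration, sample_rate)
    )
  }
}

# Skip 8000 samples, then read 16000 samples of a 16 kHz file.
w <- resolve_window(offset = 8000, duration = 16000,
                    sample_rate = 16000, unit = "samples")
```

Here `w$start_time` and `w$end_time` are what a time-based backend would receive, while `w$start_sample` and `w$n_samples` suit a sample-based one.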

(2) Validation

(3) Defaults

skeydan commented 1 year ago

The NEWS.md summarizes the main outcome of this PR. Copying here:

Thanks to superior performance as well as versatility, the default backend for loading audio files is now av. av is an efficient wrapper for FFmpeg.

The refactorings involved in this update contain breaking changes as to naming and scope. Most importantly: