pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Windows Support #425

Closed by vincentqb 1 year ago

vincentqb commented 4 years ago

To bring Windows support with mp3 support, we need

If and only if no backend supports mp3 on Windows after the above:

Closes #50, closes #219, closes #258.

cc @peterjc123, @chauhang, pytorch/pytorch#24344

peterjc123 commented 4 years ago

The kaldi_io test is passing on Windows now. BTW, I think it's hard to compile Sox on Windows. Other things sound reasonable to me.

vincentqb commented 4 years ago

Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least. soundfile unfortunately doesn't support mp3. See e.g. comparison.

peterjc123 commented 4 years ago

Thanks for the input. Can you share the output of CircleCI where the kaldi_io tests are passing?

Sure. It was posted here: https://github.com/pytorch/audio/pull/419#issuecomment-582231987.

If SoX is not possible to compile on Windows, we'll need to identify an alternative backend that offers similar file support on Windows: mp3, flac, wav, at least.

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

vincentqb commented 4 years ago

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

aubio seems to perform better than librosa, according to this, and supports more formats than audioread. Thoughts?

peterjc123 commented 4 years ago

Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

vincentqb commented 4 years ago

Well, it looks good to me except its package on pypi is a source package. However, if we use the C/C++ part then we should be okay.

What is the implication of a source package?

peterjc123 commented 4 years ago

As you can see from https://pypi.org/project/aubio/#files, only a file ending in .tar.gz (a source distribution) is available.

vincentqb commented 4 years ago

What about this one? https://github.com/beetbox/audioread or https://github.com/librosa/librosa?

Both seem good then. Let's go for audioread, since it appears to be faster than librosa. I've updated the description above to reflect the choice of audioread over sox for Windows.

dachosen1 commented 4 years ago

Have you looked into pydub? https://github.com/jiaaro/pydub

I've been using it on Windows, and it works great for mp3 and wav files. The installation is a bit involved, since it requires the user to add ffmpeg to the environment path.

faroit commented 4 years ago

@vincentqb Just some small remarks...

For the various use cases of audio i/o there are two scenarios where loading is used within torchaudio:

  1. Training

Here, loading and decoding performance is crucial and easily becomes the bottleneck of dataloaders that deal with raw audio. Typically, expensive compression formats should be avoided and simple formats such as wav, flac and mp3 should be used instead. Furthermore, seeking support is crucial for loading chunked audio from the original (larger) tracks. For this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface libsndfile directly to avoid numpy). Regarding MP3 support (plus Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

  2. Inference

Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)

To sum up, I don't think it makes sense to add another python package for audio i/o; instead we should focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?
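The point about seeking under "Training" can be illustrated with a minimal sketch. It uses Python's standard-library `wave` module as a stand-in, not libsndfile or minimp3, and the helper names `make_wav` and `read_chunk` are hypothetical. The idea is only to show what "load a chunk from a larger track" means: jump to a frame offset and decode just the frames needed.

```python
import io
import struct
import wave


def make_wav(num_frames, sample_rate=8000):
    """Build a mono 16-bit WAV in memory whose i-th sample has the value i."""
    if num_frames > 32767:
        raise ValueError("this toy generator keeps sample values within int16")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(struct.pack(f"<{num_frames}h", *range(num_frames)))
    return buffer.getvalue()


def read_chunk(wav_bytes, start_frame, num_frames):
    """Seek to start_frame and decode only num_frames frames."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        reader.setpos(start_frame)  # seek without decoding earlier frames
        raw = reader.readframes(num_frames)
    return list(struct.unpack(f"<{len(raw) // 2}h", raw))


track = make_wav(1000)
print(read_chunk(track, 500, 4))
```

For uncompressed PCM the seek is a simple byte offset. For compressed formats the decoder has to support seeking itself, which is why it matters when choosing a backend.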

vincentqb commented 4 years ago

For this use case we already have libsndfile, interfaced through pysoundfile, which covers wav and flac (at some point it would make sense to interface libsndfile directly to avoid numpy). Regarding MP3 support (plus Windows), I just discovered minimp3, which ticks all the boxes. It is also ridiculously fast and could therefore easily be the best tradeoff between loading and decoding speed.

@faroit -- Have you run your benchmark with minimp3? I'd love to see how it compares.

Are you suggesting a mix of backends for different formats? That could be an option, yes. However, the context of this particular pull request is to make torchaudio available on Windows with the same features as on the other supported OSs, and so this particular pull request doesn't push the boundaries of speed :)

  2. Inference

Here, performance is not that crucial but support for various formats such as m4a/mp4/aac would be beneficial. As we often discussed in torchaudio-contrib, I still don't see any way around ffmpeg. ;-)

To sum up, I don't think it makes sense to add another python package for audio i/o; instead we should focus on lower-level and faster alternatives such as minimp3 that also come with fewer dependencies. What do you think?

I agree that there are already many python libraries that load audio files. In particular, those that load into numpy can then be used to load into pytorch, since pytorch can convert tensors from/to numpy at no cost. This means most users who want some very specific audio file format can already get it.

It is still convenient for users to get support for some common audio file formats directly in torchaudio. But we can focus on the most critical formats (wav, flac, mp3), and support them well and fast.

In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

peterjc123 commented 4 years ago

@vincentqb Actually both audioread and aubio rely on ffmpeg.

vincentqb commented 4 years ago

Ah, good point. Have any of you faced challenges such as this when installing audioread? If not, I'd say we move forward anyway.

By the way, torchvision is also moving toward ffmpeg for video.

@cpuhrsch -- You voiced not being in favor of ffmpeg in the past. Any comments?

peterjc123 commented 4 years ago

@vincentqb It will be easy for conda users because they can simply do conda install -c conda-forge ffmpeg. To make it convenient for other users, we may just distribute the DLLs for them.

peterjc123 commented 4 years ago

@vincentqb BTW, users can only read a file using audioread, but not write. If we want to create a new backend like sndfile and sox, we'd better choose something else.

vincentqb commented 4 years ago

Let's list the requirements for a backend:

@peterjc123 -- Please do let me know if I forget anything in this list. Do you know any other backend that would work well with those criteria?

faroit commented 4 years ago

@vincentqb

Have you run your benchmark with minimp3? I'd love to see how it compares.

There is no functional python/numpy interface yet – see status of pyminimp3, so I used the implementation recently added to tf.io. The performance looks incredible:

[Image: benchmark_tf]

(ar_ffmpeg is audioread's ffmpeg interface)

faroit commented 4 years ago

@vincentqb @peterjc123

Sorry for hijacking this thread.

In that context, since ffmpeg is a heavy dependency, I would avoid depending on it for as long as I can. :)

I totally agree with you. FFmpeg is going to be painful. But I don't think there is any other alternative for supporting a large number of formats.

That's why I think we should have some fast decoder-only alternatives for a limited number of formats (useful for training). I am still in favor of removing sox and just going with sndfile/minimp3 for this scenario. Then ffmpeg for writing and for everything else where loading speed is not an issue.

cpuhrsch commented 4 years ago

On ffmpeg, I'd like to add the idea that, in general, we want backends to be opt-in.

By default we should pick a light library that works for most common formats and then allow the user to switch to different backends (such as ffmpeg) for either performance or features.

Figuring out how to set up this backend dispatch mechanism could probably resolve many of the discussions here. Essentially, we want load and save to dispatch to a different backend depending on the file format and the user's settings.

The simplest approach is to make a choice at compile-time. We're already beyond that with our global run-time backend mechanism.

A more granular approach is to then allow users to set a different backend for each file format.

Beyond that, we could even introduce a preferred order per file format based on what is available (e.g. use specialized library X over Y when available, but transparently default to Y otherwise).
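As a rough illustration of the per-format dispatch with a preferred order described above, here is a minimal sketch. The `BackendRegistry` class, its method names and the stub loaders are all hypothetical. This is not torchaudio's actual backend API, only one way the idea could be structured.

```python
from pathlib import Path
from typing import Callable, Dict, List


class BackendRegistry:
    """Toy per-format backend dispatch; names and API are illustrative only."""

    def __init__(self) -> None:
        # Backend name -> loader callable. Only available backends appear here.
        self._loaders: Dict[str, Callable[[str], str]] = {}
        # File extension -> backend names in order of preference.
        self._preferred: Dict[str, List[str]] = {}

    def register(self, name: str, loader: Callable[[str], str]) -> None:
        """Make a backend available, e.g. after its optional import succeeded."""
        self._loaders[name] = loader

    def set_preference(self, extension: str, names: List[str]) -> None:
        """Set the ordered list of backends to try for one file format."""
        self._preferred[extension.lower()] = list(names)

    def resolve(self, path: str) -> str:
        """Return the first preferred backend that is actually registered."""
        extension = Path(path).suffix.lower()
        for name in self._preferred.get(extension, []):
            if name in self._loaders:
                return name
        raise RuntimeError(f"no available backend for {extension!r} files")

    def load(self, path: str) -> str:
        """Dispatch to whichever backend resolve() picks for this file."""
        return self._loaders[self.resolve(path)](path)


registry = BackendRegistry()
registry.set_preference(".wav", ["sndfile", "ffmpeg"])
registry.set_preference(".mp3", ["minimp3", "ffmpeg"])

# Stub loaders that only report which backend handled the file.
registry.register("sndfile", lambda path: f"sndfile:{path}")
registry.register("ffmpeg", lambda path: f"ffmpeg:{path}")

print(registry.load("speech.wav"))  # handled by the sndfile stub
print(registry.load("music.mp3"))   # minimp3 is missing, so ffmpeg is used

registry.register("minimp3", lambda path: f"minimp3:{path}")
print(registry.load("music.mp3"))   # now handled by the minimp3 stub
```

The user-facing setting is the preference list, while availability is decided by what was registered, so a missing optional library falls back transparently to the next entry.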

vincentqb commented 4 years ago

Right, although with the current global run-time backend dispatch we do not support mp3 on Windows. One option is to switch the default global backend to something that also supports mp3 on Windows. Another is to add a file-format-dependent dispatch.

The former would favor going all-in with ffmpeg. The latter favors minimp3.

Based on feedback above from @faroit and @cpuhrsch, the latter is preferred as the next step. I'm good with that conclusion, so I'll update the todo/description above to reflect that.

peterjc123 commented 4 years ago

@vincentqb I saw a post that describes how to compile torchaudio with Sox. Will try that later.

peterjc123 commented 4 years ago

Torchaudio with Sox: https://github.com/pytorch/audio/pull/648

vincentqb commented 3 years ago

mp3 for windows without sox in #1000

adefossez commented 3 years ago

@vincentqb if you also want to support writing MP3s on Windows, I would recommend https://github.com/chrisstaite/lameenc

I have been using it for a while inside demucs, and it is amazing (in the sense that it is small, no extra dependencies, and works perfectly with just a pip install on all OSes). At the moment though it seems their build for python3.9 is broken...

vincentqb commented 3 years ago

thanks for the input :)

zackees commented 1 year ago

Hi there, I see that ffmpeg and sox are issues for this library. I want to let you know that I've solved these exact problems for tools like this so that these binaries can be easily deployed for Mac/Win/Linux.

Please see:

https://github.com/zackees/static-ffmpeg https://github.com/zackees/static-sox

Using tools like ffmpeg will allow you to write mp3's with minimal code and have it work everywhere. I recommend using static_ffmpeg.add_paths(weak=True) and static_sox.add_paths(weak=True).

These python packages are available through pip as well, so they can be included in your dependency management. The binaries are only downloaded when they are first used. With weak=True, the libraries will only download ffmpeg/sox if the binaries don't already exist on the system.
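A minimal sketch of how such an opt-in fallback might look, assuming the `static_ffmpeg.add_paths(weak=True)` call behaves as described in the comment above (that call is taken from the comment and not verified here). The helper name `ensure_ffmpeg_on_path` is hypothetical. The package is treated as optional, so the code still runs when it is not installed.

```python
import shutil


def ensure_ffmpeg_on_path() -> bool:
    """Return True if an ffmpeg executable can be found, False otherwise."""
    # Prefer an ffmpeg that is already installed on the system.
    if shutil.which("ffmpeg"):
        return True
    try:
        # Optional dependency; assumed to be installed via pip if wanted.
        import static_ffmpeg
    except ImportError:
        return False
    # Call quoted from the comment above; it may download binaries on first use.
    static_ffmpeg.add_paths(weak=True)
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("ffmpeg available:", ensure_ffmpeg_on_path())
```

Checking `shutil.which` first keeps the common case (ffmpeg already installed, for example through conda) free of any download.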