huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Add Vocos model #25123

Open ylacombe opened 1 year ago

ylacombe commented 1 year ago

Model description

Vocos is a Fourier-based neural vocoder for audio synthesis.

According to its paper, Vocos consistently outperforms HiFi-GAN, has 13.5M parameters, and is significantly faster than any competing vocoder!

Moreover, it is also compatible with Bark and significantly improves audio quality, as shown here.

Vocos is composed of a backbone (ConvNeXt) and an inverse Fourier transform head (either STFT or MDCT).
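To make the backbone + head split concrete, here is a rough NumPy sketch of what an STFT-style head could look like. It is not the Vocos code: the names `istft` and `istft_head`, the random stand-in for the ConvNeXt backbone output, the linear projection, and the tiny `n_fft`/`hop` values are all assumptions for illustration. The idea it shows is that the head predicts a magnitude and a phase per frequency bin, then returns to the waveform with an inverse STFT (windowed overlap-add).

```python
import numpy as np


def istft(spec, n_fft, hop):
    """Inverse STFT via windowed overlap-add.

    spec: complex array of shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    # Back to the time domain, one frame per column, then apply the synthesis window.
    frames = np.fft.irfft(spec, n=n_fft, axis=0) * window[:, None]
    n_frames = spec.shape[1]
    length = n_fft + hop * (n_frames - 1)
    audio = np.zeros(length)
    norm = np.zeros(length)
    for t in range(n_frames):
        start = t * hop
        audio[start:start + n_fft] += frames[:, t]
        norm[start:start + n_fft] += window ** 2
    # Divide by the summed squared window; guard samples where it is zero.
    return audio / np.maximum(norm, 1e-8)


def istft_head(features, weight, bias, n_fft=16, hop=4):
    """Hypothetical head: project features to log-magnitude and phase, then ISTFT.

    features: (n_frames, dim), standing in for the backbone output.
    weight:   (dim, n_fft + 2), bias: (n_fft + 2,)
    """
    n_bins = n_fft // 2 + 1
    out = features @ weight + bias              # (n_frames, 2 * n_bins)
    log_mag, phase = out[:, :n_bins], out[:, n_bins:]
    spec = np.exp(log_mag) * np.exp(1j * phase)  # complex spectrogram
    return istft(spec.T, n_fft, hop)


# Random stand-in values, only to exercise the shapes.
rng = np.random.default_rng(0)
features = rng.standard_normal((10, 8))
weight = rng.standard_normal((8, 18)) * 0.1
bias = np.zeros(18)
audio = istft_head(features, weight, bias)
print(audio.shape)  # (52,)
```

Because the head emits whole spectrogram frames instead of upsampling sample by sample, the only step that runs at audio rate is the inverse transform.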

Open source status

Provide useful links for the implementation

Vocos code is available here and was mainly contributed by @hubertsiuzdak.

Its weights are available on HF hub here and here.

adi-kmt commented 1 year ago

Hey @amyeroberts, could I implement this?

amyeroberts commented 1 year ago

@kamathis4 Sure :)

cc @sanchit-gandhi

sanchit-gandhi commented 1 year ago

Feel free to open a PR if you're interested @kamathis4 - I think @ylacombe is also interested in adding this quite quickly, so the two of you could work together if desired!

adi-kmt commented 1 year ago

Hey @sanchit-gandhi, I'll be a bit busy for the next 4-5 days. I think you can assign it to @ylacombe.

adi-kmt commented 1 year ago

Is this currently being implemented, @sanchit-gandhi, or could I take it up?

sanchit-gandhi commented 1 year ago

I think we should wait for a stable release of Vocos - the publicly available version is v0.0.3, so it is subject to change as more performant checkpoints are released.

Once we get a v1, we can commit to an integration! How does this sound @kamathis4?