Open wilke0818 opened 3 months ago
@wilke0818 do you mind writing 2 lines here on why this was paused, or why we (at least temporarily) gave up on this? thanks!
cc:ing @900miles as he is playing with styletts2 currently.
Looks like we can remove the dependency on espeak with 2 forked packages, but they report lower quality than the one using espeak. This would also remove GPL-licensed code if that is relevant.
https://pypi.org/project/styletts2/ https://github.com/NeuralVox/StyleTTS2
any chance we can listen to some audio clips and evaluate the quality ourselves?
Yeah working on that right now
I've also used their Colabs with my own uploaded audio clips and found issues similar to other TTS models: high-pitched screeching and output that generally doesn't match the target audio: https://colab.research.google.com/github/yl4579/StyleTTS2/blob/main/Colab/StyleTTS2_Demo_LibriTTS.ipynb
Weird, I've had really good results with that demo. What target voices are you using?
Fabio's
With the pip package (the second voice is Bob Ross). audio_tests.zip
these sound pretty good. are they generated with or without espeak?
Without
wow. i would vote for @900miles's idea to integrate styletts2 without espeak. objections?
Check your environment to make sure that espeak isn't there/being used. I thought I didn't have it when I first developed the StyleTTS API but it was there/I only found out through Colab I think.
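A quick way to run that check is sketched below. It is only a minimal sketch: it assumes the binary would be named `espeak` or `espeak-ng` and only looks on the `PATH`, so a shared library installed elsewhere would not be detected. The helper name `find_espeak` is made up for illustration.

```python
import shutil


def find_espeak() -> dict:
    """Return the resolved path (or None) for each assumed espeak executable name."""
    # shutil.which returns the full path of the executable, or None if it is not on PATH.
    return {name: shutil.which(name) for name in ("espeak", "espeak-ng")}


def report() -> None:
    """Print whether each candidate executable was found."""
    for name, path in find_espeak().items():
        status = f"found at {path}" if path else "not found on PATH"
        print(f"{name}: {status}")
```

Calling `report()` in the same environment that generated the audio clips would show whether espeak could have been picked up without anyone noticing.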
From the pypi page:
Currently using MIT-licensed gruut as the IPA phoneme converter. Found it to be the best alternative to phoneme converters based on espeak
It sounds like one motivation for this fork of the original package is to not use espeak because it is GPL licensed. So that shouldn't be an issue.
Two things I will mention before committing to this package: it has a lot of dependencies (although many probably overlap with other senselab dependencies), and there is an open pull request to "fix: high-severity vulnerability in nltk 3.8.1" that hasn't had any activity since September, so I don't think it is being actively developed. To clarify, it is also not by the same authors as StyleTTS2, which, if I remember correctly, was a consideration when integrating the whisperx PyPI package into senselab, as that had the same issue.
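To put a number on the dependency overlap, something like the following could be run in an environment where both packages are installed. This is a rough sketch using only the standard library; the helper names are hypothetical, and it assumes the installed distributions are called `styletts2` and `senselab`. It compares declared direct requirements only, including those behind extras or environment markers, and does not look at transitive dependencies.

```python
import re
from importlib import metadata


def declared_dependencies(dist_name: str) -> set:
    """Return normalized names of the requirements a distribution declares."""
    try:
        # metadata.requires returns None when the distribution declares no requirements.
        requirements = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        # Distribution is not installed in this environment.
        return set()
    names = set()
    for requirement in requirements:
        # A requirement string starts with the project name, e.g. "nltk>=3.8; extra == 'x'".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement)
        if match:
            # Normalize so that "Foo_Bar" and "foo-bar" compare equal.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def new_dependencies(candidate: str, existing: str) -> set:
    """Requirements declared by `candidate` that `existing` does not already declare."""
    return declared_dependencies(candidate) - declared_dependencies(existing)
```

For example, `sorted(new_dependencies("styletts2", "senselab"))` would list the direct requirements that senselab does not already declare.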
Indeed, we should be careful with the packages we integrate. It might be best to get the best of both solutions. We can start by testing @wilke0818 's solution with the same audio clips you tried ( @900miles ) to check if it provides similar quality results. If it does, we can proceed with that implementation and avoid using espeak.
Description
Currently, StyleTTS2 cannot be supported through TorchHub as originally thought because of its dependency on espeak, which is not a Python dependency but a system dependency. Code currently exists that tries to integrate StyleTTS with TorchHub, but the tests fail if the local environment doesn't have this dependency (this wasn't originally noticed because @wilke0818 had espeak installed locally). This task/issue also raises the idea of creating a more generalizable approach to incorporating functionality that can't be directly integrated with Python, namely by using Pydra and Docker containers.
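As a stopgap until a container-based approach exists, tests that need the system dependency could be skipped when it is missing rather than failing. The sketch below uses the standard library's `unittest` only for illustration; it is not the existing senselab test code, the class and test names are placeholders, and it assumes the executable is named `espeak` or `espeak-ng`.

```python
import shutil
import unittest

# Assumption: the system dependency is discoverable as one of these executables on PATH.
ESPEAK_AVAILABLE = any(shutil.which(name) for name in ("espeak", "espeak-ng"))


@unittest.skipUnless(ESPEAK_AVAILABLE, "espeak is not installed on this system")
class TestStyleTTS2WithEspeak(unittest.TestCase):
    """Placeholder test case; real tests would exercise the StyleTTS2 integration."""

    def test_espeak_is_discoverable(self):
        # Only runs when the guard above found an executable.
        self.assertTrue(ESPEAK_AVAILABLE)
```

Run with `python -m unittest`, this reports the test case as skipped on machines without espeak, which makes the hidden dependency visible instead of silently relying on a local install.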