Closed ronnyskog closed 3 weeks ago
Hi, thanks for sharing a reproducible example. In my experience, Microsoft Edge TTS is free but has the lowest quality among the available TTS models. The best is ElevenLabs, which was the TTS model used for the French and Brazilian Portuguese examples in the repo. You can also try OpenAI's, but ElevenLabs is the recommended option for multilingual output. http://linkedin.com/in/tharsissouza
On Thu, Oct 31, 2024 at 8:39 AM ronnyskog @.***> wrote:
My settings for Edge voices do not seem to take effect. I specify Norwegian voices, but they clearly speak with an American accent, apparently using the default US voices despite my settings. Am I doing this wrong? Or is it a bug?
podcast_config = {
    "word_count": 100,
    "conversation_style": ["Engaging", "Fast-paced", "Enthusiastic", "Educational"],
    "roles_person1": "Interviewer",
    "roles_person2": "Subject matter expert",
    "dialogue_structure": [
        "Topic Introduction",
        "Summary of Key Points",
        "Discussions",
        "Q&A Session",
        "Farewell Messages",
    ],
    "podcast_name": "Printaga Podcast",
    "podcast_tagline": "Hvor kommer Halloween fra?",
    "output_language": "Norwegian",
    "user_instructions": "The podcast is in Norwegian. Use the Norwegian language. Make if fun and engaging",
    "engagement_techniques": [
        "Rhetorical Questions",
        "Personal Testimonials",
        "Quotes",
        "Anecdotes",
        "Analogies",
        "Humor",
    ],
    "creativity": 0.7,
    "text_to_speech": {
        "temp_audio_dir": "./data/audio/tmp/",
        "ending_message": "Takk for at du lytter til min Podcast!",
        "default_tts_model": "edge",
        "edge_tts": {
            "default_voices": {
                "question": "nb-NO-FinnNeural",
                "answer": "nb-NO-PernilleNeural",
            },
        },
        "audio_format": "mp3",
    },
}
— Reply to this email directly, view it on GitHub https://github.com/souzatharsis/podcastfy/issues/132, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADTMY3IB5NVS26NIEQX57PLZ6IJILAVCNFSM6AAAAABQ6DHJHOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGYZDMNRUGA2TSOA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi, there is a mistake in your config dictionary. You need to set a voice name or voice ID under default_voices for both question and answer. You are sending empty strings, so podcastfy falls back to the default voices, which are US-based.
"default_voices": {
    "question": "",  # add voice name or voice id here
    "answer": "",  # add voice name or voice id here
},
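The check described above (empty strings fall back to the defaults) can be done before calling the library. The following is a minimal sketch, not part of podcastfy: find_empty_voices is a hypothetical helper, and it only assumes the dictionary layout shown in this thread, where each provider section under "text_to_speech" holds a "default_voices" mapping.

```python
def find_empty_voices(conversation_config):
    """Return (provider, role) pairs whose voice is missing or blank.

    Hypothetical helper for illustration; it inspects the dictionary only
    and does not call podcastfy.
    """
    tts = conversation_config.get("text_to_speech", {})
    problems = []
    for provider, settings in tts.items():
        # Provider sections are dicts; skip plain values such as "audio_format".
        if not isinstance(settings, dict) or "default_voices" not in settings:
            continue
        voices = settings["default_voices"]
        for role in ("question", "answer"):
            value = voices.get(role) or ""
            if not str(value).strip():
                problems.append((provider, role))
    return problems


config = {
    "text_to_speech": {
        "default_tts_model": "ElevenLabs",
        "ElevenLabs_tts": {
            "default_voices": {"question": "", "answer": "Mia Starset"},
            "model": "eleven_multilingual_v2",
        },
        "audio_format": "mp3",
    }
}

# Lists every role that would silently fall back to a default voice.
print(find_empty_voices(config))
```

The helper is agnostic to the provider key name, so it reports blank voices but cannot tell whether a key such as "ElevenLabs_tts" is one the library actually reads.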
On Thu, Oct 31, 2024 at 9:27 AM ronnyskog @.***> wrote:
I don't get good results with ElevenLabs or OpenAI either. Every voice speaks Norwegian with a heavy American accent, and they sound totally different from the Norwegian sample voices. I suspect that the voices actually used are not the ones I specify in the Python script.
Is this correct? It does not seem to work at all:
"text_to_speech": {
    "temp_audio_dir": "./data/audio/tmp/",
    "ending_message": "Takk for at du lytter til Printaga Podcast!",
    "default_tts_model": "ElevenLabs",
    "ElevenLabs_tts": {
        "default_voices": {
            "question": "",
            "answer": "",
        },
        "model": "eleven_multilingual_v2",
    },
I attempted to use ElevenLabs, specifying the female Norwegian voice for both roles. However, the output was still one male American voice and one female American voice. Therefore, it seems that specifying the voices is not functioning at all.
"text_to_speech": {
    "temp_audio_dir": "./data/audio/tmp/",
    "ending_message": "Takk for at du lytter til Printaga Podcast!",
    "default_tts_model": "ElevenLabs",
    "ElevenLabs_tts": {
        "default_voices": {
            "question": "Mia Starset",
            "answer": "Mia Starset",
        },
        "model": "eleven_multilingual_v2",
    },
Could you please share a fully reproducible example, including the code (CLI or Python package) and conversation_config (either yaml or dictionary)? I'd like to understand how you are passing the config to the API.
Here is the full code. Python 3.12.5
from podcastfy.client import generate_podcast
urls = [
    "https://www.jw.org/no/hva-bibelen-laerer/sporsmal/hva-er-opprinnelsen-til-halloween/",
    "https://www.jw.org/no/bibliotek/blad/g201309/sannheten-om-halloween/",
]
podcast_config = {
    "word_count": 100,
    "conversation_style": ["Engaging", "Fast-paced", "Enthusiastic", "Educational"],
    "roles_person1": "Interviewer",
    "roles_person2": "Subject matter expert",
    "dialogue_structure": [
        "Topic Introduction",
        "Summary of Key Points",
        "Discussions",
        "Q&A Session",
        "Farewell Messages",
    ],
    "podcast_name": "Print Podcast",
    "podcast_tagline": "Hvor kommer Halloween fra?",
    "output_language": "Norwegian",
    "user_instructions": "The podcast is in Norwegian. Use the Norwegian language. Make if fun and engaging",
    "engagement_techniques": [
        "Rhetorical Questions",
        "Personal Testimonials",
        "Quotes",
        "Anecdotes",
        "Analogies",
        "Humor",
    ],
    "creativity": 0.7,
    "text_to_speech": {
        "temp_audio_dir": "./data/audio/tmp/",
        "ending_message": "Takk for at du lytter til Print Podcast!",
        "default_tts_model": "ElevenLabs",
        "ElevenLabs_tts": {
            "default_voices": {
                "question": "Mia Starset",
                "answer": "Mia Starset",
            },
            "model": "eleven_multilingual_v2",
        },
        "audio_format": "mp3",
    },
}
my_podcast = generate_podcast(urls=urls, conversation_config=podcast_config)
print("Podcast generated:", my_podcast)
Thanks for sharing a reproducible example.
Try:
my_podcast = generate_podcast(urls=urls, conversation_config=podcast_config, tts_model='elevenlabs')
There is a bug where "default_tts_model" from the config is ignored, but if you pass the TTS model through the tts_model parameter it should work.
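To avoid keeping the model name in two places while this workaround is needed, the value can be read from the config and forwarded explicitly. This is a sketch under assumptions: resolve_tts_model and generate_with_configured_tts are hypothetical helpers, and lowercasing the name is a guess based only on the 'elevenlabs' spelling used in the suggestion above. The generate function is passed in as an argument so the snippet runs without podcastfy installed.

```python
def resolve_tts_model(conversation_config):
    """Return the configured TTS model name in lowercase, or None if unset.

    Assumption: the tts_model parameter expects a lowercase name such as
    'elevenlabs', as in the workaround above.
    """
    tts = conversation_config.get("text_to_speech", {})
    name = tts.get("default_tts_model")
    if not name:
        return None
    return str(name).strip().lower()


def generate_with_configured_tts(generate, urls, conversation_config):
    """Call `generate`, forwarding the configured model as tts_model.

    `generate` is expected to accept the keyword arguments used in this
    thread: urls, conversation_config and tts_model.
    """
    kwargs = {"urls": urls, "conversation_config": conversation_config}
    tts_model = resolve_tts_model(conversation_config)
    if tts_model is not None:
        # Only pass tts_model when the config actually names one.
        kwargs["tts_model"] = tts_model
    return generate(**kwargs)
```

With podcastfy installed, the call would look like generate_with_configured_tts(generate_podcast, urls, podcast_config), where generate_podcast is imported from podcastfy.client as in the script above.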
Please let me know!
Bug solved in latest release
And the bug is back in v0.2.17.
The "default_tts_model" from the config is still ignored; however, passing the TTS model through the tts_model parameter, as you suggested, does work.
However, setting custom voices still doesn't work. To test this, specify two male or two female voices in the config: the audio still uses the default voices, one male and one female. To actually get the voices I need, I have to open .venv\Lib\site-packages\podcastfy\conversation_config.yaml and change the voices manually, which is not a good solution.
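The behaviour expected here is that user-supplied voices override the packaged defaults key by key, so nothing in site-packages needs editing. The sketch below illustrates that expectation with a generic recursive merge. It is not podcastfy's implementation: deep_merge is a hypothetical helper, and the defaults dictionary is a stand-in that reuses the "alloy" and "echo" voice names from the OpenAI fragment in this thread.

```python
import copy


def deep_merge(defaults, overrides):
    """Return a new dict where values in `overrides` replace `defaults`.

    Nested dicts are merged recursively; neither input is modified.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Stand-in for the packaged defaults (illustrative values only).
packaged_defaults = {
    "text_to_speech": {
        "openai": {"default_voices": {"question": "alloy", "answer": "echo"}},
        "audio_format": "mp3",
    }
}

# User config that asks for the same voice in both roles.
user_config = {
    "text_to_speech": {
        "openai": {"default_voices": {"question": "echo"}},
    }
}

effective = deep_merge(packaged_defaults, user_config)
print(effective["text_to_speech"]["openai"]["default_voices"])
```

If the audio still uses one male and one female voice after a merge like this, the override is being lost somewhere between the config and the TTS call, which matches the symptom reported above.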
Thanks for the follow-up.
This should be fixed in the latest release now.
"openai": {
"default_voices": {
"question": "alloy",
"answer": "echo",
},
},
Console error:
Traceback (most recent call last):
File "d:\Data\Software\Python\Podcastify\app.py", line 55, in