souzatharsis / podcastfy

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
https://www.podcastfy.ai
Apache License 2.0
1.13k stars 122 forks

Setting default Edge voices #132

Closed ronnyskog closed 3 weeks ago

ronnyskog commented 3 weeks ago

My settings for Edge voices do not seem to take effect. I specify Norwegian voices, but they clearly speak with an American accent, apparently using the default US voices despite my settings. Am I doing this wrong? Or is it a bug?

podcast_config = {
    "word_count": 100,
    "conversation_style": ["Engaging", "Fast-paced", "Enthusiastic", "Educational"],
    "roles_person1": "Interviewer",
    "roles_person2": "Subject matter expert",
    "dialogue_structure": [
        "Topic Introduction",
        "Summary of Key Points",
        "Discussions",
        "Q&A Session",
        "Farewell Messages",
    ],
    "podcast_name": "Printaga Podcast",
    "podcast_tagline": "Hvor kommer Halloween fra?",
    "output_language": "Norwegian",
    "user_instructions": "The podcast is in Norwegian. Use the Norwegian language. Make if fun and engaging",
    "engagement_techniques": [
        "Rhetorical Questions",
        "Personal Testimonials",
        "Quotes",
        "Anecdotes",
        "Analogies",
        "Humor",
    ],
    "creativity": 0.7,
    "text_to_speech": {
        "temp_audio_dir": "./data/audio/tmp/",
        "ending_message": "Takk for at du lytter til min Podcast!",
        "default_tts_model": "edge",
        "edge_tts": {
            "default_voices": {
                "question": "nb-NO-FinnNeural",
                "answer": "nb-NO-PernilleNeural",
            },
        },
        "audio_format": "mp3",
    },
}

souzatharsis commented 3 weeks ago

Hi, thanks for sharing a reproducible example. In my experience, Microsoft Edge is free but has the lowest quality among the available TTS models. The best is ElevenLabs, which was the TTS model used for the French and Brazilian Portuguese examples in the repo. You can also try OpenAI's, but ElevenLabs is the recommended option for multilingual output.

souzatharsis commented 3 weeks ago

Hi, there is a mistake in your config dictionary. You should add the voice name or ID to the question and answer entries under default_voices. You are sending empty strings, so podcastfy falls back to the default values, which are US-based.

"default_voices": {
    "question": "",  # add voice name or voice id here
    "answer": "",  # add voice name or voice id here
},
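A quick pre-flight check along these lines can catch blank voice entries before any audio is generated. This is only a minimal sketch: `find_empty_voices` is a hypothetical helper, not part of podcastfy, and the provider key `"ElevenLabs_tts"` is copied from the snippet in this thread and may not be the key podcastfy actually reads.

```python
def find_empty_voices(conversation_config: dict, provider_key: str) -> list[str]:
    """Return the roles ("question"/"answer") whose voice is missing or blank.

    Hypothetical helper for illustration; it only inspects the dictionary
    and does not call podcastfy.
    """
    tts = conversation_config.get("text_to_speech", {})
    voices = tts.get(provider_key, {}).get("default_voices", {})
    return [
        role
        for role in ("question", "answer")
        if not str(voices.get(role) or "").strip()
    ]


# Example config with one blank voice (provider key taken from this thread).
config = {
    "text_to_speech": {
        "ElevenLabs_tts": {
            "default_voices": {"question": "", "answer": "Mia Starset"},
            "model": "eleven_multilingual_v2",
        },
    },
}

missing = find_empty_voices(config, "ElevenLabs_tts")
if missing:
    print("No voice set for:", ", ".join(missing))
```

If the list is non-empty, the corresponding roles would presumably fall back to the library defaults, as described above.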

On Thu, Oct 31, 2024 at 9:27 AM ronnyskog wrote:

I don't get good results with ElevenLabs or OpenAI either. Everyone speaks Norwegian with a heavy American accent. They sound totally different from the Norwegian sample voices. I suspect that the voices being used are not the ones I specify in the Python script.

Is this correct? It does not seem to work at all:

"text_to_speech": {
    "temp_audio_dir": "./data/audio/tmp/",
    "ending_message": "Takk for at du lytter til Printaga Podcast!",
    "default_tts_model": "ElevenLabs",
    "ElevenLabs_tts": {
        "default_voices": {
            "question": "",
            "answer": "",
        },
        "model": "eleven_multilingual_v2",
    },

ronnyskog commented 3 weeks ago

I attempted to use ElevenLabs, specifying the female Norwegian voice for both roles. However, the output was still one male American voice and one female American voice. Therefore, it seems that specifying the voices is not functioning at all.

"text_to_speech": {
    "temp_audio_dir": "./data/audio/tmp/",
    "ending_message": "Takk for at du lytter til Printaga Podcast!",
    "default_tts_model": "ElevenLabs",
    "ElevenLabs_tts": {
        "default_voices": {
            "question": "Mia Starset",
            "answer": "Mia Starset",
        },
        "model": "eleven_multilingual_v2",
    },

souzatharsis commented 3 weeks ago

Could you please share a fully reproducible example, including the code (CLI or Python package) and conversation_config (either yaml or dictionary)? I'd like to understand how you are passing the config to the API.

ronnyskog commented 3 weeks ago

Here is the full code. Python 3.12.5

from podcastfy.client import generate_podcast

# List of URLs to convert into podcast audio
urls = [
    "https://www.jw.org/no/hva-bibelen-laerer/sporsmal/hva-er-opprinnelsen-til-halloween/",
    "https://www.jw.org/no/bibliotek/blad/g201309/sannheten-om-halloween/",
]

# Define a custom conversation config for the podcast
podcast_config = {
    "word_count": 100,
    "conversation_style": ["Engaging", "Fast-paced", "Enthusiastic", "Educational"],
    "roles_person1": "Interviewer",
    "roles_person2": "Subject matter expert",
    "dialogue_structure": [
        "Topic Introduction",
        "Summary of Key Points",
        "Discussions",
        "Q&A Session",
        "Farewell Messages",
    ],
    "podcast_name": "Print Podcast",
    "podcast_tagline": "Hvor kommer Halloween fra?",
    "output_language": "Norwegian",
    "user_instructions": "The podcast is in Norwegian. Use the Norwegian language. Make if fun and engaging",
    "engagement_techniques": [
        "Rhetorical Questions",
        "Personal Testimonials",
        "Quotes",
        "Anecdotes",
        "Analogies",
        "Humor",
    ],
    "creativity": 0.7,
    "text_to_speech": {
        "temp_audio_dir": "./data/audio/tmp/",
        "ending_message": "Takk for at du lytter til Print Podcast!",
        "default_tts_model": "ElevenLabs",
        "ElevenLabs_tts": {
            "default_voices": {
                "question": "Mia Starset",
                "answer": "Mia Starset",
            },
            "model": "eleven_multilingual_v2",
        },
        "audio_format": "mp3",
    },
}

# Generate a podcast audio file using the specified conversation config
my_podcast = generate_podcast(urls=urls, conversation_config=podcast_config)

# Output the path of the generated audio file
print("Podcast generated:", my_podcast)

souzatharsis commented 3 weeks ago

Thanks for sharing a reproducible example.

Try:

my_podcast = generate_podcast(urls=urls, conversation_config=podcast_config, tts_model='elevenlabs')

There is a bug where the "default_tts_model" value from the config is not being considered, but if you pass the TTS model via the tts_model parameter it should work.
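Until the config value is honored, the workaround can be kept in sync with the config by reading the model name out of the dictionary and passing it explicitly. The sketch below assumes that `tts_model` expects the lowercase spelling (as in the `'elevenlabs'` call above); `tts_model_from_config` is a hypothetical helper, not a podcastfy function.

```python
def tts_model_from_config(conversation_config: dict) -> str:
    """Read text_to_speech.default_tts_model and lowercase it for tts_model.

    Hypothetical helper; lowercasing is an assumption based on the
    tts_model='elevenlabs' example in this thread.
    """
    tts = conversation_config.get("text_to_speech", {})
    name = tts.get("default_tts_model")
    if not name:
        raise ValueError("text_to_speech.default_tts_model is not set")
    return str(name).strip().lower()


podcast_config = {"text_to_speech": {"default_tts_model": "ElevenLabs"}}
tts_model = tts_model_from_config(podcast_config)
print(tts_model)  # elevenlabs

# With podcastfy installed and API keys configured, the call would then be:
# from podcastfy.client import generate_podcast
# my_podcast = generate_podcast(
#     urls=urls, conversation_config=podcast_config, tts_model=tts_model
# )
```

This keeps a single source of truth in the config while still using the explicit parameter that is known to work.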

Please let me know!

souzatharsis commented 3 weeks ago

Bug solved in latest release

ronnyskog commented 3 weeks ago

And the bug is back in v0.2.17.

The "default_tts_model" from the config is not being considered. However, passing the TTS model using tts_model, as per your suggestion, does indeed work.

Setting custom voices still does not work, though. You can test this by specifying two male or two female voices in the config: the audio will still use the default voices, one male and one female. To actually get the voices I need, I have to go into .venv\Lib\site-packages\podcastfy\conversation_config.yaml and change the voices manually, which is not a good solution.
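To make the expected behavior concrete, here is a minimal sketch of a recursive merge in which user-supplied nested values override packaged defaults. It is not podcastfy's actual config loader; `deep_merge`, the placeholder default voice names, and the reuse of the `"ElevenLabs_tts"` key are all assumptions made for illustration.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where nested values in `overrides` win over `defaults`."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in the defaults (e.g. "model") survive.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder stand-in for the packaged defaults (values are hypothetical).
packaged_defaults = {
    "text_to_speech": {
        "audio_format": "mp3",
        "ElevenLabs_tts": {
            "default_voices": {"question": "DefaultVoiceA", "answer": "DefaultVoiceB"},
            "model": "eleven_multilingual_v2",
        },
    },
}

user_config = {
    "text_to_speech": {
        "ElevenLabs_tts": {
            "default_voices": {"question": "Mia Starset", "answer": "Mia Starset"},
        },
    },
}

effective = deep_merge(packaged_defaults, user_config)
print(effective["text_to_speech"]["ElevenLabs_tts"]["default_voices"])
```

With a merge like this, both roles would resolve to the user's voice while untouched defaults such as the model and audio format are preserved, which is the outcome the comment above is asking for without editing the installed YAML file.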

souzatharsis commented 3 weeks ago

Thanks for the follow-up.

This should be fixed in the latest release now.

ronnyskog commented 3 weeks ago

    "openai": {
        "default_voices": {
            "question": "alloy",
            "answer": "echo",
        },
    },

Console error:

Traceback (most recent call last):
  File "d:\Data\Software\Python\Podcastify\app.py", line 55, in <module>
    my_podcast = generate_podcast(urls=urls, conversation_config=podcast_config, tts_model='openai')
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Data\Software\Python\Podcastify.venv\Lib\site-packages\podcastfy\client.py", line 307, in generate_podcast
    return process_content(
           ^^^^^^^^^^^^^^^^
  File "D:\Data\Software\Python\Podcastify.venv\Lib\site-packages\podcastfy\client.py", line 110, in process_content
    text_to_speech.convert_to_speech(qa_content, audio_file)
  File "D:\Data\Software\Python\Podcastify.venv\Lib\site-packages\podcastfy\text_to_speech.py", line 95, in convert_to_speech
    audio_segments = self._generate_audio_segments(cleaned_text, temp_dir)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Data\Software\Python\Podcastify.venv\Lib\site-packages\podcastfy\text_to_speech.py", line 117, in _generate_audio_segments
    audio_data = self.provider.generate_audio(content, voice, model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Data\Software\Python\Podcastify.venv\Lib\site-packages\podcastfy\tts\providers\openai.py", line 41, in generate_audio
    raise RuntimeError(f"Failed to generate audio: {str(e)}") from e
RuntimeError: Failed to generate audio: Error code: 400 - {'error': {'message': '[{\'type\': \'enum\', \'loc\': (\'body\', \'voice\'), \'msg\': "Input should be \'nova\', \'shimmer\', \'echo\', \'onyx\', \'fable\' or \'alloy\'", \'input\': \'Alloy\', \'ctx\': {\'expected\': "\'nova\', \'shimmer\', \'echo\', \'onyx\', \'fable\' or \'alloy\'"}}]', 'type': 'invalid_request_error', 'param': None, 'code': None}}
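The 400 response shows that the API received 'Alloy' with a capital A, while it only accepts the lowercase names listed in the message. A small guard like the one below could normalize and validate the voice before the request is sent. It is a sketch only: `normalize_openai_voice` is a hypothetical helper, and the allowed set is copied from the error text above, so it may be incomplete for other API versions.

```python
# Voice names copied from the error message in this thread.
ALLOWED_OPENAI_VOICES = {"nova", "shimmer", "echo", "onyx", "fable", "alloy"}


def normalize_openai_voice(name: str) -> str:
    """Lowercase a voice name and check it against the accepted values."""
    voice = name.strip().lower()
    if voice not in ALLOWED_OPENAI_VOICES:
        allowed = ", ".join(sorted(ALLOWED_OPENAI_VOICES))
        raise ValueError(f"Unknown OpenAI voice {name!r}; expected one of: {allowed}")
    return voice


print(normalize_openai_voice("Alloy"))  # alloy
```

Note that the config in this comment already uses lowercase "alloy", so the capitalized value in the error suggests the voice actually sent did not come from that dictionary.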