puntorigen / podcast_tts

A class for generating realistic audio (TTS) for podcasts and dialogues.
MIT License

Podcast TTS

podcast_tts is a Python library for generating podcasts and dialogues using text-to-speech (TTS). It supports multiple speakers, background music, and per-channel audio mixing (left, right, or both).

Example Podcast

You can listen to the example podcast below:

https://github.com/user-attachments/assets/baf6aa80-2d8f-4a2c-8159-efa9d9596693

Features

Installation

pip install podcast_tts

Usage

Generating Audio for a Single Speaker

import asyncio
from podcast_tts import PodcastTTS

async def main():
    tts = PodcastTTS(speed=5)
    await tts.generate_tts(
        text="Hello! Welcome to our podcast.",
        speaker="male1",
        filename="output_audio.wav",
        channel="both"
    )

if __name__ == "__main__":
    asyncio.run(main())

Example: Generating a Podcast with Music

The generate_podcast method mixes multi-speaker dialogue with background music into a single audio file. Each entry in texts maps a speaker name to a [text, channel] pair.

import asyncio
from podcast_tts import PodcastTTS

async def main():
    tts = PodcastTTS(speed=5)

    # Define speakers and text
    texts = [
        {"male1": ["Welcome to the podcast!", "both"]},
        {"female2": ["Today, we discuss AI advancements.", "left"]},
        {"male2": ["Don't miss our exciting updates.", "right"]},
    ]

    # Define background music (local file or URL)
    music_config = ["https://example.com/background_music.mp3", 10, 3, 0.3]

    # Generate the podcast
    output_file = await tts.generate_podcast(
        texts=texts,
        music=music_config,
        filename="podcast_with_music.mp3",
        pause_duration=0.5,
        normalize=True
    )

    print(f"Podcast saved to: {output_file}")

if __name__ == "__main__":
    asyncio.run(main())
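Because texts is a nested structure, a malformed entry is easy to miss until generation is already running. The following is a minimal sketch of a pre-flight check, not part of podcast_tts: validate_texts is a hypothetical helper, and the set of accepted channels is an assumption taken only from the values used in the examples above ("both", "left", "right"); the library may accept others.

```python
# Hypothetical helper, not part of podcast_tts.
# Assumption: only the channel values seen in the README examples are valid.
VALID_CHANNELS = {"both", "left", "right"}


def validate_texts(texts):
    """Return a list of (speaker, text, channel) tuples, or raise ValueError."""
    turns = []
    for i, entry in enumerate(texts):
        # Each turn is a dict with exactly one key: the speaker name.
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"entry {i}: expected a dict with exactly one speaker key")
        (speaker, value), = entry.items()
        # The value is a two-item sequence: [text, channel].
        if not isinstance(value, (list, tuple)) or len(value) != 2:
            raise ValueError(f"entry {i}: expected [text, channel] for speaker {speaker!r}")
        text, channel = value
        if not isinstance(text, str) or not text.strip():
            raise ValueError(f"entry {i}: text must be a non-empty string")
        if channel not in VALID_CHANNELS:
            raise ValueError(f"entry {i}: unknown channel {channel!r}")
        turns.append((speaker, text, channel))
    return turns
```

Calling validate_texts(texts) before generate_podcast surfaces structural mistakes early, with the index of the offending entry in the error message.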

Music Configuration:

Premade Voices

PodcastTTS ships with premade speaker profiles. They are included in the package's default_voices directory and can be used without additional setup.

Dynamic Speaker Generation

When a speaker profile is specified but does not exist, the library will automatically generate a new speaker profile and save it in the voices subfolder. This ensures consistent voice roles across different turns in a dialogue. For example:

texts = [
    {"Narrator": ["Welcome to this exciting episode.", "left"]},
    {"Expert": ["Today, we'll explore AI's impact on healthcare.", "right"]},
]
# If "Narrator" or "Expert" profiles do not exist, they will be generated dynamically.

Generated profiles are saved in the script's voices directory and reused automatically whenever the same speaker name appears again, so each voice stays consistent across runs.
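The lookup-or-create behavior described above can be illustrated with a short sketch. This is not the library's implementation: get_or_create_profile and make_profile are hypothetical names, the profile content is a placeholder string, and the only details taken from this README are the voices folder and the .txt extension.

```python
from pathlib import Path


def get_or_create_profile(name, voices_dir="voices", make_profile=None):
    """Return (profile_text, created) for `name`, creating the file on first use.

    Hypothetical sketch: the real library generates an actual voice profile;
    here `make_profile` defaults to a placeholder string.
    """
    make_profile = make_profile or (lambda n: f"placeholder-profile-for-{n}")
    directory = Path(voices_dir)
    path = directory / f"{name}.txt"

    # Reuse the saved profile so the speaker sounds the same on every run.
    if path.exists():
        return path.read_text(encoding="utf-8"), False

    # First use of this speaker name: create and persist a new profile.
    directory.mkdir(parents=True, exist_ok=True)
    profile = make_profile(name)
    path.write_text(profile, encoding="utf-8")
    return profile, True
```

The second call with the same name returns the stored text unchanged, which is the property that keeps a role such as "Narrator" consistent between episodes.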

Loading Existing Speaker Profiles

You can load any speaker profile by specifying its filename (without the .txt extension). Profiles are stored in the voices subfolder, so you don't need to specify the path explicitly.

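# Assuming a speaker profile "Host.txt" exists in the voices subfolder.
# Run this inside an async function, with tts = PodcastTTS(...) already created.
await tts.generate_tts(text="This is a test for an existing speaker.", speaker="Host", filename="existing_speaker.wav")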
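To check which names can be passed as speaker, one option is to list the profile files directly. A minimal sketch, assuming profiles are .txt files in a voices folder as described above; list_speaker_profiles is a hypothetical helper, not a podcast_tts function.

```python
from pathlib import Path


def list_speaker_profiles(voices_dir="voices"):
    """Return sorted speaker names (file names without .txt) found in voices_dir."""
    directory = Path(voices_dir)
    # No folder yet means no saved profiles.
    if not directory.is_dir():
        return []
    return sorted(p.stem for p in directory.glob("*.txt"))
```

Each returned name is the filename without its extension, which is the form the speaker argument expects.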

Additional Notes

Contributing

Contributions are welcome! Feel free to submit issues or pull requests on the GitHub repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.