souzatharsis / podcastfy

An Open Source Python alternative to NotebookLM's podcast feature: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with GenAI
https://www.podcastfy.ai
Apache License 2.0
959 stars · 100 forks

audio files not working with local model #84

Open tkanngiesser opened 2 weeks ago

tkanngiesser commented 2 weeks ago

Hi, everything's working fine when using Gemini.

The issue is with using a local model.

I'm running ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser (which works), and ffmpeg is properly installed and can be found on the PATH (with all needed permissions).

from podcastfy.client import generate_podcast

from IPython.display import Audio, display

def embed_audio(audio_file):
    """
    Embeds an audio file in the notebook, making it playable.

    Args:
        audio_file (str): Path to the audio file.
    """
    try:
        display(Audio(audio_file))
        print(f"Audio player embedded for: {audio_file}")
    except Exception as e:
        print(f"Error embedding audio: {str(e)}")

audio_file = generate_podcast(
    urls=["https://en.wikipedia.org/wiki/Podcast"],
    tts_model="edge", 
    is_local=True
)

However, when running it, it only ever produces a 260-byte MP3 (regardless of the text source), which is not playable.
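A file that small can be caught before trying to embed or play it. As a minimal sketch (the 1 KiB threshold is an assumption for illustration, not anything podcastfy exposes):

```python
import os

# A real multi-minute MP3 is far larger than a few hundred bytes;
# anything this small is almost certainly a failed generation.
MIN_AUDIO_BYTES = 1024

def looks_like_valid_audio(audio_file):
    """Cheap sanity check before embedding/playing the file."""
    return os.path.isfile(audio_file) and os.path.getsize(audio_file) >= MIN_AUDIO_BYTES
```

Calling this before embed_audio would surface the failed generation as a warning instead of an unplayable player widget.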

Any suggestions would be much appreciated. Thank you!

souzatharsis commented 2 weeks ago

Thanks for sharing a reproducible example.

The local LLM might be generating a malformed transcript. Podcastfy assumes there are two speakers, with their lines enclosed in tags in alternating fashion:

<Person1> ... </Person1>

<Person2> ... </Person2>

Gemini, the base model, returns that format reliably, but that may not be the case for other, less capable local models.

Please try generating with transcript_only=True to verify the transcript without audio generation, and report back please.
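For inspecting the transcript in code rather than by eye, a rough validator like the one below can help. It assumes the default <Person1>/<Person2> tag names (adjust the pattern if your configuration uses different speaker tags):

```python
import re

def transcript_alternates(transcript):
    """Return True if speaker tags are present and strictly alternate."""
    speakers = re.findall(r"<(Person[12])>", transcript)
    if not speakers:
        return False  # e.g. a transcript containing only '</s>'
    return all(a != b for a, b in zip(speakers, speakers[1:]))
```

A transcript that fails this check is a likely cause of the empty MP3 output.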


tkanngiesser commented 2 weeks ago

thanks for the quick reply

audio_file_local = generate_podcast(
    urls=["https://allendowney.blogspot.com/2011/05/there-is-only-one-test.html"],
    tts_model="edge", is_local=True,
    transcript_only=True
)

returns None

Are there any local models you've had a chance to test that work, and which would you suggest starting with? Many thanks!

souzatharsis commented 2 weeks ago

Please check your data/transcripts folder.

Yes, I tested it with exactly the same local model you are using.


tkanngiesser commented 2 weeks ago

the produced transcript unfortunately only contains </s>

souzatharsis commented 2 weeks ago

What podcastfy version are you using? Please make sure it's the latest, v0.2.6.


tkanngiesser commented 2 weeks ago

Yes, it's v0.2.6.

I'm wondering: does it have to be a chat model, or could it be an instruct one as well?

There are a few more listed here, for example: https://github.com/Mozilla-Ocho/llamafile

souzatharsis commented 2 weeks ago

I tested with the aforementioned model and it worked. There is a sample transcript from it in the repo under data/transcripts (it's the most recent one).

I'm on a vacation trip now but will check this out once I'm back on Thursday.


westpole05 commented 2 weeks ago

Hello there,

Same issue here. After reading this conversation between @tkanngiesser and @souzatharsis, I tried to use this transcript: https://github.com/souzatharsis/podcastfy/blob/main/data/transcripts/transcript_local_model.txt

python.exe -m podcastfy.client --transcript transcript_local_model.txt --tts-model edge

This also produces a 260-byte MP3 that is not playable. Maybe there is another issue at play here?

souzatharsis commented 2 weeks ago

That transcript was generated from a tiny LLM and is malformed, hence it cannot be used to generate audio.

Gemini works well.

Other local models are not guaranteed to work.
