martinopiaggi / summarize

Video transcript summarization from multiple sources (YouTube, Dropbox, Google Drive, local files) using multiple LLM endpoints (OpenAI, Groq, custom model).
https://colab.research.google.com/drive/16sLs1fJ7inP1wKw90zgk7Q_88N4sFU1v

Groq API exact steps for google colab notebook #2

Closed timtensor closed 5 months ago

timtensor commented 5 months ago

Hi, what exactly are the steps for using the notebook for summarization? I have a valid Groq API key, but it showed an error message saying the API key was invalid.

Also, if we use transcription and the video is in a non-English language, what should be modified?

martinopiaggi commented 5 months ago

Hello, firstly, please be aware that it may take up to 10 minutes for a newly created Groq API key to become active. Anyway, make sure to select "groq" as the API endpoint. If you're using a YouTube video, the process involves:
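
Independently of those steps, one quick way to sanity-check a Groq key outside the notebook is to hit Groq's OpenAI-compatible endpoint directly. This is just a minimal sketch, not the notebook's own code, and the model name is only an example of one available on Groq:

```python
# Minimal key check against Groq's OpenAI-compatible endpoint (sketch, not the notebook's code)
from openai import OpenAI

client = OpenAI(
    api_key="gsk_...",  # your Groq API key
    base_url="https://api.groq.com/openai/v1",
)

resp = client.chat.completions.create(
    model="llama3-8b-8192",  # example model id; use any model currently offered by Groq
    messages=[{"role": "user", "content": "ping"}],
    max_tokens=5,
)
print(resp.choices[0].message.content)
```

If this call succeeds, the key itself is fine and the problem is in the notebook configuration.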

Regarding your second question about non-English language videos:
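
For what it's worth, openai-whisper can translate non-English audio into English while transcribing. A minimal sketch, assuming openai-whisper is used directly (the model size and file path are placeholders, not necessarily how the notebook wires it up):

```python
# Sketch: Whisper transcription with translation to English
import whisper

model = whisper.load_model("medium")  # placeholder model size
# task="translate" makes Whisper output English regardless of the source language
result = model.transcribe("audio.mp3", task="translate")
print(result["text"])
```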

I hope this clarifies your doubts.

timtensor commented 5 months ago

Thank you for the information. I will give it a try. Are Google's transcriptions better, or is Whisper better? I guess you're not sure about that, right?

But thanks a lot for explaining. I will try it out to get a gist of the summaries. Edit: it seems to work well on transcription. Even though the transcription was in German, it was automatically translated into English.

About the Whisper model for transcribing, could this be a good alternative? https://github.com/huggingface/distil-whisper

martinopiaggi commented 5 months ago

No problem! You're welcome :)

Regarding Whisper vs. YouTube auto-generated captions: Whisper is actually better for transcription. If you look at the Whisper documentation, you'll see that you can use a more accurate variant like "medium" or "large" by simply changing a word in this Python notebook. Just keep in mind that more accuracy means slower transcription.
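
Concretely, assuming the notebook calls openai-whisper directly, the change amounts to swapping the model name passed to `load_model`:

```python
import whisper

# More accurate variants are slower: tiny < base < small < medium < large
model = whisper.load_model("medium")  # e.g. swap "base" for "medium" or "large"
```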

The alternative you're considering seems valid, and I think you could use it without changing too much code. At the start of this project, I was using "faster-whisper," which gives a big performance boost. However, I had to drop it because it's not compatible with the most recent CUDA API, which I have installed locally on my desktop.
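
If you experiment with distil-whisper, the model card suggests it drops in through the Hugging Face transformers ASR pipeline. A rough sketch under those assumptions (model id, file path, and batching parameters are just examples, not code from this notebook):

```python
# Sketch: distil-whisper via the transformers automatic-speech-recognition pipeline
import torch
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-large-v2",  # example distil-whisper checkpoint
    torch_dtype=torch.float16,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# chunk_length_s and batch_size control long-form transcription throughput
result = pipe("audio.mp3", chunk_length_s=30, batch_size=8)
print(result["text"])
```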

timtensor commented 5 months ago

@martinopiaggi - sorry to ask again, but I had another question. There are podcasts on YouTube for which this summarization is a very good use case. Some podcasts have show notes or chapters listed out. Wouldn't it be possible to get chapter-like summaries as well? I'm wondering whether that is already handled, or whether it currently just chunks with size 4096.

martinopiaggi commented 5 months ago

Currently, it's chunking at 4096, yes. Honestly, I'm satisfied with this approach because it's general: it works with any kind of video. Your use case is specific (if I understood correctly, you want the final summary to follow the structure of the chapters in the video's description) and can present multiple challenges.
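
To illustrate the general approach, here is a simplified, character-based sketch of fixed-size chunking (the notebook itself may count tokens and add overlap between chunks):

```python
def chunk_transcript(text: str, chunk_size: int = 4096) -> list[str]:
    """Split a transcript into fixed-size chunks; each chunk is summarized separately."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

A chapter-aware variant would instead split the transcript at the timestamps listed in the video description, which is exactly where the extra parsing and error handling come in.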

timtensor commented 4 months ago

Yes, in a nutshell. I haven't experimented much, so I'm not sure if it's better. But considering that chapters are a logical break between contexts, it could be helpful.

I am curious about this chunking: are you basically building a small RAG system? If so, could this concept or your implementation be used for Q&A systems as well?