timtensor closed this issue 5 months ago
Hello. First, please be aware that it may take up to 10 minutes for a newly created Groq API key to become active. Also, make sure to select "groq" as the API endpoint. If you're using a YouTube video, the process involves:
Regarding your second question about non-English videos:
I hope this clarifies your doubts.
Thank you for the information. I will give it a try. Are Google's transcriptions better, or is Whisper better? I guess you are not sure about that, right?
But thanks a lot for explaining. I will try it out to get a gist of the summaries. Edit: it seems to work well on transcription. Even though the transcription was in German, it was automatically translated into English.
About the Whisper model for transcribing: could this be a good alternative? https://github.com/huggingface/distil-whisper
No problem! You're welcome :)
Regarding Whisper vs. YouTube auto-generated captions: Whisper is actually better for transcription. If you look at the Whisper documentation, you'll see that you can use a more accurate variant like "large" or "medium" by simply changing a word in this Python notebook. Just keep in mind that more accuracy means slower transcription.
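To make the "changing a word" point concrete, here is a minimal sketch, assuming the open-source `openai-whisper` package. The helper name `transcribe_audio` and the file name in the usage line are hypothetical, and the notebook's actual code may look different.

```python
# Sketch only: assumes the `openai-whisper` package (pip install openai-whisper).
# `transcribe_audio` is a hypothetical helper, not the notebook's actual function.

# Model sizes published by openai-whisper, roughly from fastest to most accurate.
WHISPER_SIZES = ("tiny", "base", "small", "medium", "large")


def transcribe_audio(audio_path: str, model_size: str = "medium") -> str:
    """Transcribe an audio file with the chosen Whisper model size."""
    if model_size not in WHISPER_SIZES:
        raise ValueError(
            f"unknown model size {model_size!r}; pick one of {WHISPER_SIZES}"
        )

    # Imported lazily so the size check above works even without the package.
    import whisper

    # Larger models are slower to download, load, and run.
    model = whisper.load_model(model_size)
    result = model.transcribe(audio_path)
    return result["text"]
```

Something like `transcribe_audio("podcast.mp3", model_size="large")` would then trade speed for accuracy just by changing the size string.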
The alternative you're considering seems valid, and I think you can use it without changing too much code. At the start of this project, I was using "faster-whisper," which gives a significant performance boost. However, I had to discard it because it's not compatible with the most recent CUDA API, which I have installed locally on my desktop.
@martinopiaggi, sorry to ask again, but I have another question. There are podcasts on YouTube for which this summarization is a very good use case. Some podcasts have show notes or chapters listed. Wouldn't it be possible to get chapter-by-chapter summaries as well? I am wondering whether this is already handled, or whether it is currently just chunking with size 4096.
Currently, it's chunking at 4096, yes. Honestly, I'm satisfied with this approach, which is general: it works with any kind of video. Your use case is specific (if I correctly understood, you want the final summary to follow the structure of the chapters that are in the video's description) and can present multiple challenges.
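For readers following along, fixed-size chunking can be sketched roughly as below. This is an illustration, not the notebook's code: the function name `chunk_text` is hypothetical, and it assumes the 4096 limit counts characters (the notebook may count tokens instead).

```python
def chunk_text(text: str, chunk_size: int = 4096) -> list[str]:
    """Split text into consecutive pieces of at most `chunk_size` characters.

    Assumption: the limit is measured in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


transcript = "word " * 2000  # stand-in for a real transcript (10,000 characters)
chunks = chunk_text(transcript)
# Each chunk would then be summarized on its own and the summaries concatenated.
```

Because the split ignores content, it works on any video, but a chunk boundary can fall in the middle of a sentence or topic.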
Yes, in a nutshell. I haven't experimented much, so I am not sure if it is better. But since chapters act as logical stops between contexts, it could be helpful.
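The chapter idea could be prototyped roughly like this. It is only a sketch under several assumptions: chapters appear in the description as lines such as `00:00 Intro`, the transcript is available as `(start_seconds, text)` segments, and the names `parse_chapters` and `group_by_chapter` are hypothetical. None of this exists in the notebook.

```python
import re
from bisect import bisect_right

# Matches lines such as "00:00 Intro" or "1:02:30 Q&A" (assumed show-notes format).
CHAPTER_RE = re.compile(r"^\s*(?:(\d+):)?(\d{1,2}):(\d{2})\s+(.+?)\s*$")


def parse_chapters(description: str) -> list[tuple[int, str]]:
    """Return (start_seconds, title) pairs found in a video description."""
    chapters = []
    for line in description.splitlines():
        match = CHAPTER_RE.match(line)
        if match:
            hours, minutes, seconds, title = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
            chapters.append((start, title))
    return sorted(chapters)


def group_by_chapter(segments, chapters):
    """Join transcript segments under the chapter they start in.

    `segments` is a list of (start_seconds, text) pairs. Segments that start
    before the first chapter are dropped; chapter titles are assumed unique.
    """
    starts = [start for start, _ in chapters]
    grouped = {title: [] for _, title in chapters}
    for seg_start, text in segments:
        idx = bisect_right(starts, seg_start) - 1  # last chapter starting at or before the segment
        if idx >= 0:
            grouped[chapters[idx][1]].append(text)
    return {title: " ".join(parts) for title, parts in grouped.items()}
```

Each chapter's text could then be summarized separately, falling back to fixed-size chunking when a chapter is too long or when the description has no timestamps.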
I am curious about this chunking. So basically, are you building a small RAG system? If so, this concept or your implementation could be used for Q&A systems, right?
Hi, what exactly are the steps for using the notebook for summarization? I have a valid Groq API key, but it showed an "invalid API key" error message.
Also, if we use transcription and the video is in a language other than English, what should be modified?