radicalxdev / kai-ai-backend

This is the Kai Teaching Assistant AI repo.
MIT License

Solve Issue #23 - Dynamo #34

Closed. AaronSosaRamos closed this 3 weeks ago.

AaronSosaRamos commented 3 weeks ago

This PR analyzes and implements a new approach for solving Issue #23 related to Dynamo.

When I started using the system, I tested this 9-minute video: https://www.youtube.com/watch?v=rWcG-p1oQe0 and received the Quota Exceeded error after 5 retry attempts: (screenshot)

I then tested a video with a duration of 1 minute: https://www.youtube.com/watch?v=1aA1WGON49E, and it worked with the current approach. For that reason, I refactored the chunk_size and chunk_overlap so they are set based on the length of the video, using a threshold of 3 minutes: (screenshot)

I then tested both videos and they worked correctly. (screenshot)

I also used this 5-minute video: https://www.youtube.com/watch?v=iDbdXTMnOmE, and it worked as well. In fact, I tested it 4 times and the parser didn't fail in any of those attempts. (screenshot)

What I've discovered is that the long latency is caused by the summarize_transcript method, especially when the video is long (9 to 10 minutes). For now we can work this way, but I suggest optimizing the summarize_transcript method, specifically how we use load_summarize_chain, because that is where the latency comes from.
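A minimal sketch of the duration-based chunking described above, assuming the 3-minute threshold from the PR. The function name and the specific chunk_size/chunk_overlap values are illustrative, not the ones in the actual change:

```python
# Hypothetical sketch of choosing chunk parameters from video length.
# The threshold comes from the PR; the numeric values are made up.

DURATION_THRESHOLD_SECONDS = 3 * 60  # 3-minute threshold


def get_chunk_params(video_duration_seconds: int) -> dict:
    """Pick chunk_size/chunk_overlap based on video length."""
    if video_duration_seconds <= DURATION_THRESHOLD_SECONDS:
        # Short video: smaller chunks are fine, few requests either way.
        return {"chunk_size": 1000, "chunk_overlap": 100}
    # Long video: larger chunks mean fewer summarization requests,
    # which helps stay under the quota.
    return {"chunk_size": 4000, "chunk_overlap": 200}


if __name__ == "__main__":
    print(get_chunk_params(60))   # 1-minute video
    print(get_chunk_params(540))  # 9-minute video
```

The returned dict could then be passed to the text splitter before running the summarize chain.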

mikhailocampo commented 3 weeks ago

I was checking this out and found a similar issue with the quota. Investigating this on the STAGING branch, I found the cause was the way the Map Reduce algorithm works.

Essentially, it sends a batch request over the N chunks of the transcript, which is what actually produces the latency. This will require an entirely different approach: I think we can solve it with a single large request rather than iterative requests, due to rate limits.
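A toy model of the request patterns being compared (not LangChain itself): map-reduce sends one request per chunk plus a final combine request, while a single large-context request sends exactly one. The function and strategy names are hypothetical:

```python
# Toy request-count model for the two summarization strategies discussed.

def count_requests(num_chunks: int, strategy: str) -> int:
    """Number of LLM calls for a transcript split into num_chunks pieces."""
    if strategy == "map_reduce":
        # One summary request per chunk, then one combine step.
        return num_chunks + 1
    if strategy == "single_large":
        # The whole transcript goes into one large-context request.
        return 1
    raise ValueError(f"unknown strategy: {strategy}")


# A long transcript split into 12 chunks hits the rate limit much faster:
print(count_requests(12, "map_reduce"))    # 13 requests
print(count_requests(12, "single_large"))  # 1 request
```

This is why a model with a large enough context window can sidestep the quota problem: the request count stops growing with transcript length.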

mikhailocampo commented 3 weeks ago

Because of that, I will not merge this into STAGING for now; instead, I'll make a new branch and try to solve this using Gemini 1.5 Pro with its 1M-token context window.

Since gemini-1.5-flash came out, we can experiment with that, using as few requests as possible while not losing information in a two-layer process:

SUMMARIZE -> EXTRACT
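The two-layer process above could be sketched as two sequential large-context calls; `call_llm` is a stand-in for a real Gemini call, and the prompts and function name are hypothetical:

```python
# Minimal sketch of a SUMMARIZE -> EXTRACT pipeline: two LLM calls total,
# regardless of transcript length (assuming it fits the context window).

from typing import Callable


def summarize_then_extract(transcript: str,
                           call_llm: Callable[[str], str]) -> str:
    # Layer 1: one large-context request condenses the full transcript.
    summary = call_llm(f"Summarize this transcript:\n{transcript}")
    # Layer 2: a second request extracts structured output from the summary.
    return call_llm(f"Extract the key concepts from:\n{summary}")


# Usage with a fake LLM that just echoes the first prompt line:
fake_llm = lambda prompt: f"[processed] {prompt.splitlines()[0]}"
print(summarize_then_extract("full transcript text...", fake_llm))
```

With this shape, latency is bounded by two round trips instead of scaling with the number of chunks.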