distbit0 closed 1 year ago
I don't think that reliance on an LLM is in the cards for this tool. I have a new version I'm fiddling with that uses other heuristics to coalesce the transcript, although I don't know if that will see the light of day.
Perhaps you could feed the transcript, including timestamps, to GPT-4 and ask it which timestamps should be accompanied by a new frame from the video, to save on image storage costs. GPT-4 would decide based on, e.g., whether the narrator refers to something shown in the video. Of course this is not perfect, but it may be enough to significantly reduce image storage costs and make the article text more readable.
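Roughly what I have in mind, as a minimal sketch and not a tested integration. Everything here is hypothetical: the `Segment` shape, the function names, the prompt wording, and `ask_model`, which stands in for whatever call sends a prompt to GPT-4 and returns its text reply. The reply is parsed defensively, so a malformed answer simply yields no frames.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical shape: (start time in seconds, spoken text).
Segment = Tuple[float, str]


def build_prompt(segments: List[Segment]) -> str:
    """Render the timestamped transcript plus the instruction for the model."""
    lines = [f"[{start:.1f}] {text}" for start, text in segments]
    return (
        "Below is a video transcript; each line starts with its timestamp "
        "in seconds.\n"
        "Return a JSON array of the timestamps where the reader would need "
        "to see a frame from the video, for example when the narrator "
        "refers to something on screen. Return only the JSON array.\n\n"
        + "\n".join(lines)
    )


def parse_timestamps(reply: str, segments: List[Segment]) -> List[float]:
    """Keep only timestamps that actually occur in the transcript."""
    try:
        values = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: request no frames
    if not isinstance(values, list):
        return []
    known = {round(start, 1) for start, _ in segments}
    picked = set()
    for value in values:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and round(float(value), 1) in known:
            picked.add(round(float(value), 1))
    return sorted(picked)


def pick_frame_timestamps(
    segments: List[Segment], ask_model: Callable[[str], str]
) -> List[float]:
    """ask_model is an assumed callable: prompt text in, model reply out."""
    return parse_timestamps(ask_model(build_prompt(segments)), segments)
```

Only the timestamps returned by `pick_frame_timestamps` would get a frame extracted and stored; the rest of the transcript would stay text-only. Filtering against the known timestamps guards against the model inventing times that are not in the transcript.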