Additional Zero Shot Models

octimot / StoryToolkitAI

An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models

GNU General Public License v3.0

644 stars, 52 forks

Hey there!

I think Segment Anything and Grounding DINO produce more restrictive embeddings because of their promptable nature (their training data is more focused). In other words, CLIP on its own lets you search using more "obscure" language, while the other models may be limited to more common words (car, sky, bird, face, etc.).
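Here is a generic sketch of how CLIP-style search ranks footage against a text query. A CLIP-like model maps the query and each frame into a shared embedding space, and frames are ranked by cosine similarity to the query. This is not StoryToolkitAI's code. The frame IDs and the three-dimensional vectors are hypothetical stand-ins for real model outputs. Real CLIP embeddings have far more dimensions, for example 512 for ViT-B/32.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_frames(query_embedding: list[float],
                frame_embeddings: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (frame_id, score) pairs sorted from best to worst match."""
    scores = [
        (frame_id, cosine_similarity(query_embedding, emb))
        for frame_id, emb in frame_embeddings.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings standing in for image embeddings of video frames.
frames = {
    "frame_001": [0.9, 0.1, 0.0],
    "frame_002": [0.1, 0.8, 0.3],
    "frame_003": [0.0, 0.2, 0.9],
}
# Hypothetical embedding of a text query, assumed to live in the same space.
query = [0.1, 0.9, 0.2]

ranking = rank_frames(query, frames)
print(ranking[0][0])  # best-matching frame: frame_002
```

Under this framing, a model whose embedding space was trained on narrower, more common concepts simply has fewer regions where an unusual query can land close to a relevant frame. That is consistent with the intuition above, though the sketch does not prove it.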

We're preparing an update that also supports GPT-Vision and LLaVA-like models, which would let you ingest and prompt directly as well.

But I'll take a look at these too, ASAP!

Cheers

Additional Zero Shot Models #171