Open Lanceeeelot opened 4 months ago
If you use the default setup using Cineast as your retrieval engine (assumption based on your previous issue), then I am not aware of such a feature already implemented. Do you have a specific model / your own model in mind?
Generally speaking, I'd suggest to introduce a new textual input field, similar to text-embedding / OCR / ASR: You would add another category name - display label tuple:
would end up in something similar to:
text: {
categories: [['visualtextcoembedding', 'Description (VTE)'], ['ocr', 'OCR'],['text-to-audio-category', 'Sound']]
},
with text-to-audio-category
being a registered category to cineast.
How can you search for a video scene with a specific sound through the interface if this sound is described in a prompt, such as "explosion" or "traffic noise"?