vitrivr / vitrivr-ng

vitrivr NG is a web-based user interface for searching and browsing mixed multimedia collections. It uses Cineast as a backend.
MIT License

Which UI element allows searching for Audio Features? #167

Open Lanceeeelot opened 4 months ago

Lanceeeelot commented 4 months ago

How can you search for a video scene with a specific sound through the interface if this sound is described in a prompt, such as "explosion" or "traffic noise"?

sauterl commented 4 months ago

If you use the default setup with Cineast as your retrieval engine (an assumption based on your previous issue), then I am not aware of such a feature being implemented already. Do you have a specific model, or your own model, in mind?

Generally speaking, I'd suggest introducing a new textual input field, similar to text-embedding / OCR / ASR. You would add another (category name, display label) tuple here:

https://github.com/vitrivr/vitrivr-ng/blob/master/src/app/shared/model/config/config.model.ts#L107-L109

This would end up as something similar to:

text: {
  categories: [['visualtextcoembedding', 'Description (VTE)'], ['ocr', 'OCR'], ['text-to-audio-category', 'Sound']]
},

Here, text-to-audio-category must be a category that is registered in Cineast.
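
To make the shape of that change concrete, here is a minimal, self-contained sketch of adding one more (category name, display label) tuple to a list of text categories. The names TextCategory, TextQueryConfig and withTextCategory are hypothetical and only illustrate the idea; they are not part of the vitrivr-ng code base, where the actual default lives in config.model.ts. The category name text-to-audio-category is likewise a placeholder for whatever category you register in Cineast.

```typescript
// Hypothetical types for illustration only; not the actual vitrivr-ng config model.
// Each entry pairs a Cineast category name with the label shown in the UI.
type TextCategory = [string, string];

interface TextQueryConfig {
  categories: TextCategory[];
}

// Returns a copy of the config with one more text category appended.
// If the category name is already present, the config is returned unchanged.
function withTextCategory(config: TextQueryConfig, name: string, label: string): TextQueryConfig {
  if (config.categories.some(([existing]) => existing === name)) {
    return config;
  }
  const entry: TextCategory = [name, label];
  return { ...config, categories: [...config.categories, entry] };
}

// The two categories from the snippet above, before the sound category is added.
const defaults: TextQueryConfig = {
  categories: [['visualtextcoembedding', 'Description (VTE)'], ['ocr', 'OCR']],
};

// 'text-to-audio-category' is a placeholder; it only works if Cineast knows it.
const withSound = withTextCategory(defaults, 'text-to-audio-category', 'Sound');

console.log(withSound.categories.map(([, label]) => label).join(', '));
// prints "Description (VTE), OCR, Sound"
```

In practice you would most likely edit the categories list directly in your configuration rather than build it in code; the sketch only shows that the UI side amounts to one extra tuple, while the retrieval itself still depends on the category existing on the Cineast side.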