pharmapsychotic / clip-interrogator

Image to prompt with BLIP and CLIP
MIT License
2.71k stars, 431 forks

Feature request: Averaging of multiple image embeddings before processing #118

Open Seedmanc opened 2 months ago

Seedmanc commented 2 months ago

The tool would accept a folder of images (or a multi-file selection in Gradio) and run the same pipeline as it does now, but produce a single output that generalizes the characteristics of the images in the set. It would do this by averaging the outputs of image_to_features() across the images before passing the result on for interrogation.

Use case: when preparing datasets for model training, it would be convenient to have one prompt that describes the whole set, to test how well the trained model reproduces the idea in general. It would also be useful to have a general negative prompt to check how the model handles out-of-scope prompts (and to make sure it hasn't lost the ability to generalize).

Does this idea make sense? Can you simply average the image_to_features() outputs element-wise, like regular numbers (or arrays of them), or would that not work at all?
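To make the question concrete, here is a minimal sketch of the averaging I have in mind. It assumes image_to_features() returns one fixed-length vector per image and that the later ranking step compares vectors by cosine similarity, which is why the mean is rescaled to unit length. Plain Python lists stand in for the real tensors, and `l2_normalize` and `average_embeddings` are hypothetical helper names, not part of clip-interrogator.

```python
import math
from typing import Sequence


def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length; raises if the vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embeddings(embeddings: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of unit-normalized embeddings, rescaled to unit length."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    # Normalize first so that no single image dominates through its magnitude.
    unit = [l2_normalize(e) for e in embeddings]
    dim = len(unit[0])
    if any(len(u) != dim for u in unit):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(u[i] for u in unit) / len(unit) for i in range(dim)]
    # The mean of unit vectors is generally shorter than 1, so rescale it.
    return l2_normalize(mean)


# Toy 2-D vectors standing in for per-image features (hypothetical data).
features = [[1.0, 0.0], [0.0, 1.0]]
averaged = average_embeddings(features)
```

With the two toy vectors above, the result points halfway between them. Whether the averaged vector of real image features still leads to a meaningful prompt is exactly what I am unsure about.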