valentinfrlch / ha-llmvision

Let Home Assistant see!

Apache License 2.0

Way to differentiate images #37

Closed Someguitarist closed 1 month ago

Someguitarist commented 1 month ago

Sorry, one more feature request, but I made it a separate issue. It's awesome that you can pipe multiple entities into the service at once; however, it's impossible to tell which image the model is talking about.

For example, if I feed it the Backyard Camera and Living Room camera and ask 'Which picture contains dogs?', it says 'The image on the right' or 'Image number 2'.

Ideally, the entity name would be passed in with the image somehow so the LLM could differentiate between the images more clearly. That way you could just ask 'Where are my car keys?' and have it say 'I see them in the Kitchen image, on the counter.'

I've tried through prompting to get it to mention the entity name but no luck.

valentinfrlch commented 1 month ago

This should actually be possible. There is an option, include_filename, which passes the entity name/filename along with each image. If you don't see this option, you need to upgrade.

You may still need to prompt the model to mention the name.
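
As a rough illustration of what sending a name alongside each image could look like, here is a minimal Python sketch. It assumes an OpenAI-style chat payload in which text parts and image parts are interleaved in one message; the function name `build_labeled_content` and the JPEG data URL are hypothetical and are not taken from the integration's code.

```python
import base64


def build_labeled_content(prompt, images):
    """Build a list of message parts with a name label before each image.

    `images` is a list of (name, raw_bytes) pairs. The payload layout is an
    assumption (OpenAI-style content parts), not the integration's own code.
    """
    content = []
    for name, data in images:
        encoded = base64.b64encode(data).decode("ascii")
        # Text part that names the image that follows it.
        content.append({"type": "text", "text": f"{name}:"})
        # The image itself, embedded as a base64 data URL (JPEG assumed).
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{encoded}"},
        })
    # The user's question goes last, after all labeled images.
    content.append({"type": "text", "text": prompt})
    return content


content = build_labeled_content(
    "Which picture contains dogs? Answer with the camera name.",
    [("Backyard", b"fake-jpeg-1"), ("Living Room", b"fake-jpeg-2")],
)
```

With labels placed directly before each image, the prompt can ask the model to answer with the camera name rather than 'Image number 2'.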

Someguitarist commented 1 month ago

Hmm, so I've tried a few different prompt methods to get it to mention the location but I've only had it work once, and that was likely by accident. Do you think it would be possible to add a 'description' box to the entities that is submitted with the image, or is that not possible?

I'm picturing being able to add like ~4 camera entities with a text box for each where you could add something like 'This is the kitchen of the house' 'This is the back yard of the house' to each. I'm not sure if it would really help though.

valentinfrlch commented 1 month ago

There seems to be a problem with the formatting of the title. It is sent correctly but apparently ignored by the LLM, because the name doesn't make sense. For example, I got: 'text': 'http://10.0.1.201:8123Front Door:'. It should actually be just 'Front Door', which is the camera's entity_name.

I'll push a fix for this soon, thanks for bringing this to my attention!
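
To make the problem concrete, here is a small Python sketch of how a label with a base URL glued onto it could be cleaned up. This is only an illustration and not the fix that was pushed; `clean_label` and its `base_url` parameter are hypothetical names.

```python
def clean_label(raw_label, base_url):
    """Return just the human-readable name from a malformed label.

    Hypothetical helper: drops a leading base URL and a trailing colon.
    """
    label = raw_label
    # Drop the Home Assistant base URL if it was prepended to the name.
    if label.startswith(base_url):
        label = label[len(base_url):]
    # Remove surrounding whitespace and the trailing colon.
    return label.strip().rstrip(":").strip()


label = clean_label("http://10.0.1.201:8123Front Door:", "http://10.0.1.201:8123")
```

For the value quoted above, this yields 'Front Door'.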

Someguitarist commented 1 month ago

Oh awesome, thanks for looking into this! I'll bet that'll be just what I was looking for! Much appreciated!