valentinfrlch / ha-llmvision

Let Home Assistant see!
Apache License 2.0

Export snapshot as variable (workaround to ring-mqtt) #75

Open helicalchris opened 5 days ago

helicalchris commented 5 days ago

Is your feature request related to a problem? Please describe. TL;DR: not really a problem, more a consequence of how ring-mqtt works, but this addon could help with it.

As per the discussions in my recent issue, ring-mqtt doesn't present cameras quite how this addon would prefer. The video is an RTSP stream, but the entity_picture and snapshot images are not real time: they are periodic images taken by the Ring platform and are not linked to real-time viewing or motion events. This means that streaming support in this addon doesn't work with ring-mqtt. C'est la vie.

However, it also means that you can't attach a snapshot from the analysed video to any notifications. The only place that has that video is this plugin and the file it reads from (i.e. it isn't in any HA entity).

Describe the solution you'd like Optionally return one of the analysed frames (probably the first) in the response variable or another variable.

Describe alternatives you've considered Otherwise I need to find a way to take a frame from the mp4 video file itself, but I can't easily see how to do that in an HA-supportable way (a possible ffmpeg-based approach is sketched below).
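
For context, one generic way to pull a single frame from the saved clip would be a shell_command wrapping ffmpeg. This is a standard Home Assistant pattern rather than anything this addon provides, and it assumes ffmpeg is available where Home Assistant runs and that the clip path matches the one used later in this thread:

# configuration.yaml: hypothetical helper that grabs one frame from the saved clip.
# Assumes ffmpeg is installed and the clip is written to /share/front-camera-clip.mp4.
shell_command:
  extract_clip_frame: >-
    ffmpeg -y -i /share/front-camera-clip.mp4 -frames:v 1 /config/www/clip-frame.jpg

The resulting file under /config/www would then be reachable at /local/clip-frame.jpg for use as a notification image.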

Additional context This is of course all down to ring-mqtt and Ring itself and how they work, as Ring is not a streaming CCTV service and doesn't function like Frigate etc. Might be worth a note on this in the docs, as it's not immediately obvious (and you also need to know that streaming support doesn't take a streaming video; it takes snapshots from entity_picture, which also isn't immediately obvious).

valentinfrlch commented 4 days ago

TL;DR: a possible solution for the root cause is attached below (in this thread); the blueprint inputs need to be reworked.

I agree. Judging by people on the forum, it is too complicated. The two modes of the blueprint have different methods of showing the preview, which is not ideal.

Probably the best way to fix this is to store the most recent preview for each camera entity in the www folder. This limits the number of images stored to the number of cameras. The downside, of course, is that only the most recent snapshot for each camera will be available. (For notifications I'm not entirely sure, but it's possible that the image is cached on the phone once the notification is received. That way even older notifications could have a preview...)

This would only show a static snapshot, but I guess that's better than no snapshot at all. Unfortunately, even this way the two modes of the blueprint would still need different methods, as previewing Frigate's event from the notification is very useful.
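
For illustration, if the most recent analysed frame were written to /config/www as described above, a notification could reference it via the /local/ path. This is a generic Home Assistant pattern rather than anything the integration does today; the folder, file name and notify target below are hypothetical:

# Hypothetical notification action; assumes the latest analysed frame for this
# camera is written to /config/www/llmvision/front_door.jpg (an assumption).
action: notify.mobile_app_my_phone
data:
  message: Motion detected at the front door
  data:
    # Files under /config/www are served by Home Assistant at /local/...
    image: /local/llmvision/front_door.jpg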

I also agree that it would be better if stream_analyzer directly tapped into the RTSP feed. This is another compatibility problem, as the RTSP address for most camera entities is not exposed. The entity_picture attribute is present on all camera entities (at least to my knowledge) but may not update as frequently.

Btw: I found this thread about configuring a generic camera for live view of ring-mqtt. Maybe you've already seen that, but in case not: https://community.home-assistant.io/t/configuring-generic-camera-for-live-view-of-ring-mqtt/730270/2

If the solution in that thread works, it would solve this problem. The blueprint will still need to be simplified.

helicalchris commented 4 days ago

Thanks. I've already got real time working via a generic camera (that way), but as per the original post, the real-time stream and the snapshots are not linked for Ring-based implementations. Snapshots are taken every x minutes (configurable, >=1 from memory) regardless of activity.

I have found no way to extract a frame from the live RTSP video, hence it not being possible to get snapshots to use in notifications. All the snapshots (saved using actions, from entity_picture, or from the still frame; they're all the same image) come from the periodic capture.

valentinfrlch commented 4 days ago

I think I finally understand the problem. So, correct me if I'm wrong: your camera (the generic camera) works and you get a preview in the notification, but it's the live view, and you'd prefer a snapshot of the actual event.

helicalchris commented 4 days ago

Yes, spot on. The logic was that, since you are extracting frames anyway, you could return one of those extracted frames for use as a variable.

There is a wider point here in that the AI request only relates to a few images from a stream, so as context it's useful to see what the AI engine has actually seen. The response variable could even include all the images sent (maybe in a debug mode). That would have been useful a while ago: I could have kept them and rerun the prompt with the same images a few times to tune it, rather than getting different images every time.

valentinfrlch commented 4 days ago

Ah, great to hear that it's at least working somewhat! The latest version added a fairly complex function to decide which images are important and which are not (based on how much an image has changed compared to a baseline). Still, I understand you want the images. What could be added is an additional key in the response variable which holds a list of base64-encoded images. The problem with this approach would probably be decoding them back into images. What do you think of a 'debug mode' that stores the analyzed frames in a folder?

helicalchris commented 4 days ago

It's working perfectly in terms of taking the saved video clip and sending that for analysis. So, as far as the addon's purpose goes, it's working perfectly πŸ˜€

I think that writing images to a folder would actually be fine. It's probably best to have an option to choose whether it always appends a date-time or not (I would have it overwrite the same files each time, otherwise that folder will get big fast; others may prefer to be able to look back).

helicalchris commented 4 days ago

Also happy to write some text for the docs explaining how ring-mqtt works and how to use the addon with it, if that's helpful; it could call out this debug feature too.

valentinfrlch commented 3 days ago

An explainer on how to get Ring cameras working with this integration would be greatly appreciated! I will add a boolean parameter, expose_images, to the action call for all analyzers. In the blueprint there will be a dropdown to choose between 'Live Preview' and 'Snapshot' (this will only apply to camera mode). When 'Snapshot' is selected, it will set expose_images in the action call and return the first image from the series.
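
For illustration, modelled on the analyzer calls shown elsewhere in this thread, the action with the planned flag might look roughly like this. The provider value is a placeholder, expose_images is the parameter named above, and any response key that ends up holding the exposed image is deliberately not specified here:

# Sketch only: a video_analyzer call with the planned expose_images flag enabled.
# All parameters except expose_images mirror the calls posted elsewhere in this thread.
action: llmvision.video_analyzer
data:
  provider: YOUR_PROVIDER_ID          # placeholder for your configured provider
  message: tell me what you see
  video_file: /share/front-camera-clip.mp4
  max_frames: 3
  max_tokens: 100
  expose_images: true                 # new boolean parameter described above
response_variable: analysis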

helicalchris commented 3 days ago

I think that's perfect.

If I put a write-up of Ring on the HA community thread, is that easiest or shall I PM it to you?

helicalchris commented 3 days ago

(I assume that by camera mode you mean it's in video analyser or stream analyser mode?)

valentinfrlch commented 3 days ago

> If I put a write-up of Ring on the HA community thread, is that easiest or shall I PM it to you?

The HA community thread would be great! That way more people see it. I'd also put this in the docs if that's ok with you.

Thanks for contributing!

valentinfrlch commented 3 days ago

> (I assume that by camera mode you mean it's in video analyser or stream analyser mode?)

When the blueprint is in Frigate mode, video_analyzer is called; Camera mode uses stream_analyzer.

helicalchris commented 3 days ago

Got you. Not that it affects me, but might people also want to see an example image in Frigate mode, perhaps to work out what is causing the prompt to respond as it does?

(Re Ring: noted. Of course, copy my text into the docs, not a problem.)

valentinfrlch commented 2 days ago

Thank you for writing such a thorough guide. Given the popularity of Ring, I'm sure a lot of people will find this useful.

v1.3 is almost ready; I just need to fix a few small things. If you have some time and want to test the beta, feedback would be very welcome.

helicalchris commented 2 days ago

Thank you πŸ˜ƒ

Very happy to test the beta

valentinfrlch commented 2 days ago

The beta is out! You can find the changelog here: https://github.com/valentinfrlch/ha-llmvision/releases/tag/v1.3-beta.1

To update the blueprint to the v1.3 beta, you can import it using this URL: https://github.com/valentinfrlch/ha-llmvision/blob/semantic-index/blueprints/event_summary.yaml

Thanks for taking the time!

helicalchris commented 1 day ago

Beta installed... However, I am away from home, so I need a postman or courier to walk past to trigger it (otherwise the image is always the same and I can't tell whether I'm getting a frame from the video or an old artefact!)

helicalchris commented 1 day ago

This might be a me problem, but when I run the automation step there is an error:


Stopped because an error was encountered at 29 October 2024 at 19:35:44 (runtime: 11.03 seconds)

No image input provided

I write my video file to /share/front-camera-clip.mp4 (and I can load that file and play it myself and it works), and it was working fine until recently.

I don't use the blueprint, so I'm not sure if that's an issue (but the new options are showing in the automation; I created a new one from scratch to be sure). I also think this is actually an issue from the last version: I have been away and haven't been using it much, but I did see a trace from before I changed to the 1.3 beta with what I think was the same issue (I just ignored it at the time, thinking it was an anomaly... Ring is fickle), and now I can't look back as there seems to be nothing in the logs about it.

Happy to post the full automation YAML but it's a bit long!

valentinfrlch commented 1 day ago

Thanks for the feedback! I'm not sure about your folder structure, but can you try /config/share/front-camera-clip.mp4? I have some test images in /www/tmp, but it only works when I add the path as /config/www/tmp.

helicalchris commented 1 day ago

I will do, but access to folders like /share is permitted via configuration.yaml entries (see my Ring guide for the link). That shouldn't be an issue, and it wasn't in the 1.x versions of the addon, where it worked perfectly.

Note the real path is /share on the box. It's not under /config (like /www is), so I would need to make a new folder called share within /config to do this.
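
For reference, the configuration.yaml entries mentioned here are Home Assistant's standard external-directory allowlist, roughly like this (generic HA configuration, not specific to this integration):

# configuration.yaml: allow Home Assistant to access files outside /config
homeassistant:
  allowlist_external_dirs:
    - /share
    - /media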

valentinfrlch commented 1 day ago

Ah, I see... Do you have any images in the www folder, so you could test whether that works? IIRC there have been no changes to the way local files are handled.

helicalchris commented 1 day ago

I suppose I could put any image in there; it doesn't matter for test purposes, so yes, I'll give that a go.

I can view the video on the share so it shouldn't be permissions

helicalchris commented 1 day ago

It works in image mode if I point it at an image in /share


action: llmvision.image_analyzer
data:
  provider: (redacted not sure if this is secret or a uid!)
  message: tell me what you see
  image_file: /share/front-snapshot.jpg
  include_filename: false
  max_tokens: 86
  temperature: 0.4

It fails in video mode if I point it at a video in /share


action: llmvision.video_analyzer
data:
  max_frames: 3
  include_filename: false
  target_width: 1280
  detail: low
  max_tokens: 100
  temperature: 0.2
  provider: (redacted not sure if this is secret or a uid!)
  message: tell me what you see
  video_file: /share/front-camera-clip.mp4

valentinfrlch commented 20 hours ago

That's pretty strange but thanks a lot for testing. I will look into it! Are there any errors/warnings in the log? Can you enable debug logging? That would help a lot!

You can enable debug logging by adding this to your configuration.yaml:

logger:
  logs:
    custom_components.llmvision: debug

helicalchris commented 19 hours ago

Sorry, I should have posted the logs. When I said earlier there were none, I meant none except the one the automation shows me (as above), but the log does give line numbers etc.


Logger: homeassistant.components.automation.gemini_video_analyse
Source: components/automation/__init__.py:763
integration: Automation (documentation, issues)
First occurred: 20:53:58 (1 occurrences)
Last logged: 20:53:58

Error while executing automation automation.gemini_video_analyse: No image input provided

Remember, as per above, I believe this was also happening with 1.21. Does rollback work properly with HACS? I've never tried; I could revert and test.