Multiple Videos Remembered/Sent with Video Analyzer

KenBrockman33 commented 3 months ago

I am seeing strange behavior specific to video clips analyzed with OpenAI. (I realize it may be an OpenAI issue, but wanted to share.)

I have an automation that creates a short video clip (8 sec) and then uploads the video for analysis with the llmvision.video_analyzer service. The response seems to summarize several past events, as if a ghost of the previous videos were still being included. I get the same issue whether the service is run from an automation or from Dev Tools.

A typical response is:

response_text: >-
  The images show various views of a porch, with some frames featuring a delivery person and others showing the empty porch at different times of day.

This suggests to me that some frames are current and others are "old." There was a delivery person at one point today, but not when the service was last run (response above), and when I open the video file I only see the current 8-second clip. Could OpenAI be holding on to previous videos analyzed in the past 24 hours?

I do not see this same behavior with images, only videos.

YAML Automation

action:
  - metadata: {}
    data:
      filename: /config/www/Front.png
    target:
      entity_id: camera.front_high_resolution_channel
    action: camera.snapshot
  - metadata: {}
    data:
      duration: 8
      lookback: 0
      filename: /config/www/Front.mp4
    target:
      entity_id: camera.front_high_resolution_channel
    enabled: true
    action: camera.record
  - delay:
      hours: 0
      minutes: 0
      seconds: 10
      milliseconds: 0
    enabled: true
  - metadata: {}
    data:
      provider: OpenAI
      model: gpt-4o-mini
      interval: 2
      target_width: 1280
      max_tokens: 100
      temperature: 0.5
      video_file: /config/www/Front.mp4
      message: describe what is happening in 1 simple sentence
    response_variable: response
    action: llmvision.video_analyzer
    enabled: true
  - metadata: {}
    data:
      message: "{{response.response_text}}"
      title: Person Detected
    action: notify.mobile_app
mode: single
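
As a rough sanity check on the symptom, it can help to estimate how many frames a clip like this should produce. The sketch below assumes that `interval` means "seconds between sampled frames"; that reading, and the helper name `expected_frame_count`, are assumptions for illustration, and the real extractor may return one frame more or fewer.

```python
def expected_frame_count(duration_s: float, interval_s: float) -> int:
    """Estimate how many frames are sampled from a clip.

    Assumes one frame is taken every `interval_s` seconds (an assumption,
    not confirmed behavior of the integration).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # At least one frame is expected even for very short clips.
    return max(1, int(duration_s // interval_s))


# The automation above records 8 seconds and uses interval: 2.
frames = expected_frame_count(8, 2)
```

Under that assumption an 8-second clip yields only a handful of frames, so a response describing many views "at different times of day" hints that more frames were sent than the clip alone could supply.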

KenBrockman33 commented 3 months ago

To test further, I reran the action with a different camera and a different filename. The response below suggests that OpenAI must be remembering or analyzing multiple videos, even though the action was performed on a single video file from a single camera.

YAML

action: llmvision.video_analyzer
data:
  provider: OpenAI
  model: gpt-4o-mini
  interval: 2
  include_filename: false
  target_width: 1280
  detail: high
  max_tokens: 100
  temperature: 0.5
  video_file: /config/www/Driveway.mp4
  message: describe what is happening

RESPONSE

response_text: >-
  The frames from the video show two distinct locations and activities:

  Driveway Scene: Several frames depict a driveway with a black pickup truck parked on the concrete. The surrounding area features a well-maintained lawn and a pathway leading to a house. The frames indicate different times, but the truck remains stationary, suggesting it is parked there for a while.

  Porch Scene: Other frames show a porch area where a person, likely a delivery worker, is seen adjusting their

valentinfrlch commented 3 months ago

Thank you very much for bringing this up! There was an error in the cleanup so the tmp folder with the extracted frames would not be deleted. This will be fixed in the next release.
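
The kind of cleanup described here can be sketched in Python as follows. This is a minimal illustration, not the actual ha-llmvision code: `collect_frames` and the `extract` callable are hypothetical names, and the real integration may extract and store frames differently. The idea is that each analysis run gets its own temporary directory, which is deleted when the run finishes, so frames from an earlier video cannot be picked up by a later one.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def collect_frames(video_file: str, extract: Callable[[str, Path], None]) -> List[bytes]:
    """Extract frames into a fresh temporary directory and return their contents.

    `extract` is a hypothetical callable that writes the frame files for
    `video_file` into the given directory (for example by invoking ffmpeg).
    """
    # A new directory per call: nothing left over from earlier runs is visible here.
    with tempfile.TemporaryDirectory(prefix="frames_") as tmp:
        tmp_dir = Path(tmp)
        extract(video_file, tmp_dir)
        frames = [p.read_bytes() for p in sorted(tmp_dir.iterdir())]
    # The directory and its files are removed when the with-block exits,
    # including when extraction raises an exception.
    return frames


def fake_extract(video_file: str, out_dir: Path) -> None:
    """Stand-in extractor for demonstration: writes two dummy frame files."""
    for i in range(2):
        (out_dir / f"frame{i}.jpg").write_bytes(f"{video_file}:{i}".encode())


front = collect_frames("Front.mp4", fake_extract)
driveway = collect_frames("Driveway.mp4", fake_extract)
```

With this structure the second call returns only frames written for `Driveway.mp4`. If both calls instead wrote into one shared folder that was never emptied, the second call would also read the leftover `Front.mp4` frames, which matches the mixed porch and driveway description reported above.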

KenBrockman33 commented 3 months ago

Awesome! Thanks for your initial and continued work on this; it's been a great addition to HA.

valentinfrlch commented 3 months ago

The fix is live. Thanks again for pointing it out!

valentinfrlch / ha-llmvision