home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
71.63k stars 29.93k forks source link

camera.record service causes exception #75373

Closed Gollam closed 2 years ago

Gollam commented 2 years ago

The problem

A lot of my mp4 recordings become corrupt with the latest update. The only error I can see is these exceptions in the log, I guess they are connected to the issue.

What version of Home Assistant Core has the issue?

core-2022.7.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant Container

Integration causing the issue

stream

Link to integration documentation on our website

https://www.home-assistant.io/integrations/stream/

Diagnostics information

No response

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2022-07-17 20:32:16 ERROR (MainThread) [homeassistant.core] Error executing service: <ServiceCall camera.record (c:01G86PWSWGBYSS12PPADZ5JMMZ): device_id=['c98d33e1ba0bce82a35d9da14c45e42e'], duration=15, lookback=2, filename=Template("/media/record1.mp4")>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/core.py", line 1731, in catch_exceptions
    await coro_or_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 1750, in _execute_service
    await cast(Callable[[ServiceCall], Awaitable[None]], handler.job.target)(
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 204, in handle_service
    await service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 680, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 930, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 717, in _handle_entity_call
    await result
  File "/usr/src/homeassistant/homeassistant/components/camera/__init__.py", line 1017, in async_handle_record_service
    await stream.async_record(
  File "/usr/src/homeassistant/homeassistant/components/stream/__init__.py", line 514, in async_record
    await recorder.async_record()
  File "/usr/src/homeassistant/homeassistant/components/stream/recorder.py", line 166, in async_record
    os.rename(self.video_path + ".tmp", self.video_path)
FileNotFoundError: [Errno 2] No such file or directory: '/media/record1.mp4.tmp' -> '/media/record1.mp4'

Additional information

No response

probot-home-assistant[bot] commented 2 years ago

stream documentation stream source (message by IssueLinks)

probot-home-assistant[bot] commented 2 years ago

Hey there @hunterjm, @uvjustin, @allenporter, mind taking a look at this issue as it has been labeled with an integration (stream) you are listed as a code owner for? Thanks! (message by CodeOwnersMention)

uvjustin commented 2 years ago

Can you share more details about your setup? What type of cameras do you have? Was this working in 2022.6.x? Do the cameras work when you are viewing them live?

Gollam commented 2 years ago

Sure. I have 3 DS-2CD1043G0-I cameras. I haven't upgraded HA in a while, I skipped 2022.6.x, last version I used was around 2022.5.x I think, which was fine. Shall I try downgrading, or is that not a good idea? Live view is working perfectly.

uvjustin commented 2 years ago

No need to downgrade. Are there any special/odd permissions on that folder? Is the directory executable? The record service changed behind the scenes, and it now writes to a .tmp file while in progress before moving it to the .mp4 file. For some reason the .tmp is either not being written or not being found.

Gollam commented 2 years ago

No special permissions. And permissions shouldn't be the problem because 80% of the files get renamed successfully from tmp to mp4. The other ~20% is when this error happens, no tmp file remains in the folder, and no mp4 gets created either.

And about half of the files are now corrupted. VLC can play them with some error correction, but many other players cannot. The image is falling apart, green lines towards the bottom, etc. I don't know if this is related to the record service refactoring, but I guess it is. I changed nothing else in my system, I only upgraded HA and this started happening.

Also, it seems like now it does not matter that preload stream is enabled for my camera streams, it does not get preloaded. As you may see in the logs, I record with the following parameters: duration=15, lookback=2. Previously, when it was working "well", I got around 30 secs of lookback, previous to the event happening, and 15 sec after the event, so in total, a 45 second video. Now, I get only the 15 sec video, which starts after the event has happened, so it seems lookback has no effect anymore, or preload is not working correctly.

I don't know if I should open separate issues for all these or not, I just wrote all my observations in this comment.

uvjustin commented 2 years ago

Hmm, I think I might have an idea what might be causing this. I'll prepare a PR to see if it fixes the issue.

uvjustin commented 2 years ago

Actually, I had a look, and what I thought might have been the issue is not an issue. This is not necessarily due to the record service refactoring. Many other changes happened in 2022.7. Stream moved to ffmpeg 5, and the default HA container moved to alpine 3.16 and python 3.10. As for lookback, a lookback of 2 should not generally give 30 seconds of lookback. Previously there was a bug that miscalculated lookback by 1 segment, but even then I'm not sure how it was giving 30 seconds. If you want 30 seconds of lookback, you should use a lookback value of 30.

Gollam commented 2 years ago

Sure, but I'm not receiving even 2 seconds of lookback. I'll of course try with other values. And are you saying with the ffmpeg and other changes, that we cannot really debug further the corrupted video files issue?

uvjustin commented 2 years ago

As noted in the documentation, the duration and lookback are not guaranteed to be exact and are just suggestions. I saw that previously you were using AppDaemon. This is the first time I've seen AppDaemon, and I don't know anything about it. Does this problem occur without using AppDaemon?

Gollam commented 2 years ago

Thanks for that info, I may have missed it in the docs.

AppDaemon uses some kind of HA API via a long-lived access token. It allows you to write automations in pure Python. Basically it can access sensor states, and all the other stuff which a normal HA automation would allow you to see. It is much easier for me, and also more powerful to write automations in 6 lines of code, than in the GUI interface of HA, or in the yaml format.

This recording automation has been working for a few months now via AppDaemon, without issues. But of course, I will test it using the build in HA automations to see if it makes any difference.

Gollam commented 2 years ago

I tried invoking the service directly from the HA Developer tools menu, on the Services tab.

2022-07-19 10:16:49 ERROR (MainThread) [homeassistant.core] Error executing service: <ServiceCall camera.record (c:01G8ARFBD5EMZKADXE5F9T5X12): filename=Template("/media/asdf.mp4"), duration=15, lookback=10, device_id=['b1fe32f93c7d04242a8deab9e1b811ee']>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/core.py", line 1731, in catch_exceptions
    await coro_or_task
  File "/usr/src/homeassistant/homeassistant/core.py", line 1750, in _execute_service
    await cast(Callable[[ServiceCall], Awaitable[None]], handler.job.target)(
  File "/usr/src/homeassistant/homeassistant/helpers/entity_component.py", line 204, in handle_service
    await service.entity_service_call(
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 680, in entity_service_call
    future.result()  # pop exception if have
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 930, in async_request_call
    await coro
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 717, in _handle_entity_call
    await result
  File "/usr/src/homeassistant/homeassistant/components/camera/__init__.py", line 1017, in async_handle_record_service
    await stream.async_record(
  File "/usr/src/homeassistant/homeassistant/components/stream/__init__.py", line 514, in async_record
    await recorder.async_record()
  File "/usr/src/homeassistant/homeassistant/components/stream/recorder.py", line 166, in async_record
    os.rename(self.video_path + ".tmp", self.video_path)
FileNotFoundError: [Errno 2] No such file or directory: '/media/asdf.mp4.tmp' -> '/media/asdf.mp4'

I got the error again, but interestingly, the video file is not corrupt, and the mp4 was there. I did a second run with another filename, I received the same error, and the mp4 was not created. So sometimes it works, sometimes it doesn't.

Gollam commented 2 years ago

OK, we've made some progress. I think my sftp connection fooled me, all the newly created mp4 files are there, none of them are missing. The files are still corrupt, but the important thing is the python exception here.

So this is what's happening:

I call the service from developer tools like this:

service: camera.record
data:
  filename: /media/asdf3.mp4
  duration: 15
  lookback: 10
target:
  device_id: b1fe32f33d7d0424ba8deac9e5b831ee

After invoking this, the recording is done, the mp4 file is created, then the exception comes. For every single file, there is an exception in the log, but no mp4 files are missing, all of them are there in the folder.

uvjustin commented 2 years ago

I'm sorry but I can't reproduce this and I also can't see how the code would produce that behavior. It almost seems as if you are running two asyncio loops. Additionally, since you're running a container, the system should be set up well, so that shouldn't be a problem. Are there any custom components or other customizations on your system besides appdaemon?

Gollam commented 2 years ago

No more custom components or anything else is installed. And keep in mind that appdaemon is only there to invoke the camera.record service. The service invoked from the Developer Tools also produces the exception, even with appdaemon shut down. Every time I press the call service button here, the exception is thrown:

image

uvjustin commented 2 years ago

It seems to me that for some reason your service is getting called twice simultaneously. We don't use a lock in the service, but there's a section that checks if the recorder is already working. However, it's possible for two simultaneous calls to both get past that section. Is there any other service you can call to see if it also gets called twice?

Gollam commented 2 years ago

Sure, for example if I call the "telegram bot send message" service, that only sends the message once on my telegram channel.

uvjustin commented 2 years ago

Hmm actually...do you have an rsync service or similar running on that directory? Something like that could be reading the tmp file while it is still being written which could lead to corruption.

Gollam commented 2 years ago

No, nothing like that.

Gollam commented 2 years ago

I have a mainstream and a substream configured for each cam. Can it be that both of the streams start recording? I'm just guessing now, that may cause 2 simultaneous calls.

Gollam commented 2 years ago

OK, this is the root cause for the exception, only one of the streams can be enabled at a time for the camera.record to work properly, otherwise I guess it will launch 2 recordings, which write into the same file. I have a feeling this is responsible for the corrupted recordings as well, but I'm disabling all substreams now for all cams, and testing that. For this service, I can only give a camera device ID which I want to record from, and cannot determine which stream I'd like to record. Can't that be improved in some way?

uvjustin commented 2 years ago

I don't quite understand. Usually the streams are made available in HA as separate entities. What camera integration are you using?

Gollam commented 2 years ago

This is the official ONVIF integration. As you can see in my previous code example, in the target field, I provide a device ID, and I did not select the stream entity. Now I rewritten this part of the code to choose the correct stream, and it works! In the past, I was only able to use the device ID, I cannot remember the exact reason, but I couldn't make it work in any other way. Thanks a lot for your help! Now everything is fine for me.

But also, this can be considered as a minor bug, if someone provides a device ID as a target, and the device has multiple streams available, that will result in corrupt recordings and exception in the log.

uvjustin commented 2 years ago

@allenporter Do you think this is worth guarding against? Since it would be on different streams, an implementation would involve some kind of class level synchronizer. I'm inclined to just warn against this usage in the docs.

allenporter commented 2 years ago

@uvjustin yeah, I agree. Even if you did protect against it, it still would be unexpected (two streams writing to the same file serially). I noticed the example includes the entity id kind of showing the point that there is only so much you can do here to ensure uniqueness:

filename: '/tmp/{{ entity_id.name }}_{{ now().strftime("%Y%m%d-%H%M%S") }}.mp4'

So, yeah, a warning in the documentation seems like a good idea.

You could also consider:

uvjustin commented 2 years ago

There's already debug logging about the recording starting. It seems like info is generally discouraged? https://github.com/home-assistant/core/pull/73107#discussion_r891277767 I'll add that error message in there and then put a blurb in the docs.

uvjustin commented 2 years ago
filename: '/tmp/{{ entity_id.name }}_{{ now().strftime("%Y%m%d-%H%M%S") }}.mp4'

Turns out templates in the service don't work. See #40241.