home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.54k stars 29.46k forks source link

Errors with stream component and HA hangs up #22840

Closed bieniu closed 4 years ago

bieniu commented 5 years ago

Home Assistant release with the issue: 0.91.1

Last working Home Assistant release (if known): None

Operating environment (Hass.io/Docker/Windows/etc.): RPi 3B/Raspbian/Python 3.7.2/venv

Component/platform: stream, camera/generic

Description of problem: I use generic camera and stream component. When I tap on camera card in Lovelace on iPad (iOS app 1.5) I have errors in log, stream don't start and sometimes after that my HA hangs up. The frontend doesn't work, the light control automation doesn't start, systemctl shows that the HA service is active (running). I have to restart HA to using it. This is the first time HA has been hanging since I use it (2 years).

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

camera:
  - platform: generic
    name: 'Salon'
    still_image_url: http://<camera_ip>/cgi-bin/currentpic.cgi
    stream_source: rtsp://<camera_ip>:8554/unicast

stream:

Traceback (if applicable):

2019-04-06 13:02:41 ERROR (stream_worker) [libav.mpegts] H.264 bitstream malformed, no startcode found, use the video bitstream filter 'h264_mp4toannexb' to fix it ('-bsf:v h264_mp4toannexb' option with ffmpeg)

2019-04-06 13:04:53 ERROR (MainThread) [aiohttp.server] Error handling request
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 418, in start
    resp = await task
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_app.py", line 458, in _handle
    resp = await handler(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_middlewares.py", line 119, in impl
    return await handler(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/http/real_ip.py", line 33, in real_ip_middleware
    return await handler(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/http/ban.py", line 68, in ban_middleware
    return await handler(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/http/auth.py", line 216, in auth_middleware
    return await handler(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/http/view.py", line 114, in handle
    result = await result
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/websocket_api/http.py", line 36, in get
    request.app['hass'], request).async_handle()
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/websocket_api/http.py", line 188, in async_handle
    await self._writer_task
  File "/srv/homeassistant/lib/python3.7/site-packages/homeassistant/components/websocket_api/http.py", line 63, in _writer
    await self.wsock.send_json(message, dumps=JSON_DUMP)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_ws.py", line 296, in send_json
    await self.send_str(dumps(data), compress=compress)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_ws.py", line 283, in send_str
    await self._writer.send(data, binary=False, compress=compress)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/http_websocket.py", line 643, in send
    await self._send_frame(message, WSMsgType.TEXT, compress)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/http_websocket.py", line 620, in _send_frame
    await self.protocol._drain_helper()
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/base_protocol.py", line 81, in _drain_helper
    await waiter
  File "/usr/local/lib/python3.7/asyncio/selector_events.py", line 801, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
ConnectionResetError: [Errno 104] Connection reset by peer

2019-04-06 13:04:53 ERROR (MainThread) [aiohttp.server] Unhandled exception
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 447, in start
    await resp.prepare(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_response.py", line 353, in prepare
    return await self._start(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_response.py", line 667, in _start
    return await super()._start(request)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/web_response.py", line 410, in _start
    await writer.write_headers(status_line, headers)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/http_writer.py", line 112, in write_headers
    self._write(buf)
  File "/srv/homeassistant/lib/python3.7/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport
dshokouhi commented 5 years ago

I saw a freeze myself yesterday however I did not see any errors at all. Glad you got an error as I was trying to reproduce it. When my freeze happened it was under different circumstances. I had one of my amcrest cameras with preload stream and one of my blue iris generic cameras as well on preload. I noticed after about 12 hours that the amcrest camera was stuck on the same timestamp, the video was no longer updating no more how long I kept it open or attempted to view the stream in. I then unchecked preload stream on both cameras. After about 5 minutes when the keep alive time should've stopped. My HA froze and I could not control any lights. Systemctl also showed HA up and running but there were no errors in the logs at the time. If I can reproduce it and I get logs I will post them here myself. I am also on 0.91.1 with HA running in a venv on an i5.

CC: @hunterjm FYI

bieniu commented 5 years ago

My camera stream also stuck on the same timestamp few minutes before HA hangs.

cinquemic commented 5 years ago

Same here but only with dafang camera the other camera works fine. Noticed that the problem came out when HA try to stop rtsp server.

sendorm commented 5 years ago

Same issue here with both dafang camera and mijia 1080p camera. Also my first hanging homeassistant since two years. The issue happened with ios app.

montesn commented 5 years ago

I had the same issues with the frontend locking up. Didn't resolve until I turned off stream and uninstalled libavformat-dev and libavdevice-dev. Have been running stable since. The logs during the lockup didn't seem to show any issues, the GUI was just unresponsive.

hunterjm commented 5 years ago

Do any of you happen to be running any other components that depend on FFMPEG? Even if they don’t support stream. I.e. Ring

montesn commented 5 years ago

My setup is just 4 Foscam cameras of varying ages and an older Foscam camera I had to set up as mjpeg

sendorm commented 5 years ago

I have 4 cameras working on ffmpeg.

dshokouhi commented 5 years ago

I have Ring setup

bieniu commented 5 years ago

I have only one camera and no other component using ffmpeg.

hunterjm commented 5 years ago

Is the issue consistent (happens frequently)? Is there any indication as to what you do before the error appears?

montesn commented 5 years ago

In my experience (and I only kept it running for a day or so), the issue happened multiple times during the day. I couldn't find any rhyme or reason as to why. In some cases, I was actually using it and it just stopped responding and in others, I'd go back to it after some time and realize I just couldn't get in.

dshokouhi commented 5 years ago

Personally I have only seen this happen once and have yet to reproduce it. I started keeping a preloaded stream again on the camera that was stuck on the timestamp so time will tell if that issue comes back for me.

montesn commented 5 years ago

I would add that if there's a certain level of logging I can turn on to try to gather data for this, I'm happy to restore the configuration that was failing and collect whatever logs are necessary when there's a lockup.

walthowd commented 5 years ago

I have had this same thing happen on several occasions and I can probably reproduce it -- I'm trying to preload a generic camera stream from a balky Hikvision wifi doorbell that occasionally will start lagging and go offline/restart. When that happens it now seems bring down Home Assistant completely.

On each occasion absolutely nothing pertinent has been in the log file. I run default logging level at info, but I can try turning it up. Is there any other components I need to include besides these?

logger:
  default: info
  logs:
    homeassistant.components.stream: debug
    homeassistant.components.camera: debug
    homeassistant.components.generic.camera: debug
hunterjm commented 5 years ago

It sounds like the general theme is cameras with flaky connections that we are currently processing. Likely due to not properly handling dropped connections. That gives me something to work off of and reproduce, but all my cameras are hardwired, and I haven't had any issues... so I will need to force a disconnect 😆

dshokouhi commented 5 years ago

@hunterjm you are spot on with that assessment with the stream dropping. I did a very simple test just now. I loaded up a stream in my dev HA instance and after 1 minute I pulled the power on the camera. I waited 30 seconds and plugged it back in. HA was frozen from that point on and nothing would load, not even the system logs. Here are the logs from this experience.

https://hastebin.com/amuwedazeq.cs

This is what HA looks like when it froze and I tried to look at the logs in the UI

image

tube0013 commented 5 years ago

Same issue here. with preload enabled I get this https://youtu.be/i3HQevp-M3A

and eventually HA hangs hard.

(https://github.com/home-assistant/home-assistant/issues/22941 looks like a likely dupe of this issue)

Skeletorjus commented 5 years ago

In my experience it's enough to have the camera entity visible for a while for HA to hang. It's not isolated to the more-info popup. I've been running with full debug logging, but there's unfortunately nothing of value in the logs

A bit anecdotal, but could be relevant: I run Dafang Hacks and have it configured to send a MQTT image if motion is detected in a given area. If I watch the stream (more-info) and motion is detected the stream will hang. Any subsequent views of the stream will show the few seconds leading up to the motion event like @tube0013 is describing. The log shows WARNING (MainThread) [haffmpeg.tools] Timeout reading image. Could this be the camera saturating the bandwidth with the snapshot leading to the failure of the stream? A restart of HA gets the stream up and running again. This is without preloading.

bieniu commented 5 years ago

I think we have fix #22982

tube0013 commented 5 years ago

I don't think this is fixed. I"m on 0.92.0 and experiencing the same thing if I have a preloaded stream.

det-peralta commented 5 years ago

I don't think this is fixed. I"m on 0.92.0 and experiencing the same thing if I have a preloaded stream.

I still have the same problem on 0.92,0

phcella commented 5 years ago

I don't have any cameras on my hassio.. And I have this error and I'm on 0.92.1. But I have other errors too that I cannot solve.

Capturar

sendorm commented 5 years ago

The problem still persists on 0.92. Today one of my camera's got disconnected (the power cable came lose) and hass became unresponsive afterwards. All I had in logs after that was:

WARNING (MainThread) [homeassistant.components.xxxx] Updating xxxxx took longer than the scheduled update interval 0:00:30

For each and every component on hass. Back to 0.89.2 until this is resolved.

Jpsy commented 5 years ago

With my RTSP stream over WIFI I do not experience any complete HA stalls. But the Lovelace stream view of a Generic IP Camera entity stalls regularly and after that does not recover until I restart HA. My system is Hassio 0.92.2 on a Raspi3B.

Interesting detail: VLC on Mac and on Android does not have any problems with stream outages in the RTSP stream. It just shows some block artifacts and continues to show the video. But tinyCam on Android in fact stalls in a very similar way as the HA Generic IP Cam does. Both VLC and tinyCam are built on FFMPEG. One difference I can see though, is that VLC on Android never tries to write the stream to an MP4 file, which tinyCam can do on request. So maybe there is a difference in the way FFMPEG is integrated that causes these problems when streams are WIFIed and thus unstable.

cveenman commented 5 years ago

Hassio hangs seems like it is due to Streaming an RTSP stream (foscam) resulting in Error demuxing stream: No dts in packet``

Besides this I have endless ConnectionResetErrors ('Cannot write to closing transport') but I have the impression this is not causing too much trouble.

2019-05-06 12:35:59 ERROR (MainThread) [aiohttp.server] Unhandled exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_protocol.py", line 447, in start
    await resp.prepare(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 353, in prepare
    return await self._start(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 667, in _start
    return await super()._start(request)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/web_response.py", line 410, in _start
    await writer.write_headers(status_line, headers)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/http_writer.py", line 112, in write_headers
    self._write(buf)
  File "/usr/local/lib/python3.7/site-packages/aiohttp/http_writer.py", line 67, in _write
    raise ConnectionResetError('Cannot write to closing transport')
ConnectionResetError: Cannot write to closing transport
2019-05-06 19:50:52 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream: No dts in packet
hunterjm commented 5 years ago

I believe this is caused by cameras losing connectivity, which might be solved by #23617. If I can have help testing after beta is cut, that would be great.

Jpsy commented 5 years ago

Yes sure!

montesn commented 5 years ago

I believe this is caused by cameras losing connectivity, which might be solved by #23617. If I can have help testing after beta is cut, that would be great.

I'd be happy to help test as well

Mark612 commented 5 years ago

Problem confirmed. 0.92.2 stream, Amcrest cameras, will hang HA consistently.

pszewello commented 5 years ago

I have yi with hacks v4 and few minutes fter opening the strem in lovelace it will hung hassio. Using the latest version, so problem is still here, and camera does not loose connection, stream works well outside hass.

Bluefries commented 5 years ago

I have yi with hacks v4 and few minutes fter opening the strem in lovelace it will hung hassio. Using the latest version, so problem is still here, and camera does not loose connection, stream works well outside hass.

Same for me

ArnoutVerbeken commented 5 years ago

I see the same error on my Wyzecam (with Dafang Hack firmware). Hass does not hang, but the stream is not stable and hangs. Errors that I have are:

2019-06-07 00:33:53 ERROR (stream_worker) [libav.h264] error while decoding MB 38 19, bytestream -13
2019-06-07 00:34:04 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream: No dts in packet

I now use H264 RTSP with camera_view: live in lovelace. The camera itself is just platform: generic HassIO is 0.94.0

Hope this helps. Is there any workaround?

salleq commented 5 years ago

I have a similar issue, but instead of getting any errors in the log, I had the stream stuck playing the same few seconds all over again. In my case HA itself never hung. Rebooting the camera helped get it back on track (instead of rebooting HA), but still I wonder there's no timeout errors for the RTSP stream.

b0c1 commented 5 years ago

In the worker https://github.com/home-assistant/home-assistant/blob/dev/homeassistant/components/stream/worker.py#L73 you raise an error if no dts, instead of the PyAV developer said:

    # We need to skip the "flushing" packets that `demux` generates.
    if packet.dts is None:
        continue

http://docs.mikeboers.com/pyav/develop/cookbook/basics.html?highlight=dts

b0c1 commented 5 years ago

So, your solution is said, if we got flushing pocket, we done. But you not done, but PyAV solution is just "skip" the flushing pocket. Maybe the Camera FFMPEG implementation async read function sent the empty dts? (I just thinking...)

hunterjm commented 5 years ago

So, your solution is said, if we got flushing pocket, we done. But you not done, but PyAV solution is just "skip" the flushing pocket. Maybe the Camera FFMPEG implementation async read function sent the empty dts? (I just thinking...)

The issue is with packet loss for wifi cameras, which can be caused by AP roaming, or an unstable connection. This code does not gracefully handle packet loss, but skipping packets isn't going to fix it. The flushing packet is skipped in the example on pyAV's website because that example is not doing the same thing we are doing. They are remuxing a file from one format to another, and the flushing packet occurs at the end of the stream.

I don't recall exactly what happened, but simply ignoring those packets caused issues with some cameras. If you'd like to modify the logic and debug further, that'd be awesome. I don't have any of the devices people are having trouble with, so it's hard to troubleshoot.

b0c1 commented 5 years ago

The issue is with packet loss for wifi cameras

No, it's a public mp4 link, not a wifi camera https://idokep.hu/radar/radar.mp4 and available webm too. Both sources do the same thing, stream exited with no dts error.

b0c1 commented 5 years ago
- platform: ffmpeg
  input: https://www.idokep.hu/radar/radar.mp4
  extra_arguments: -an

And I add a picture glance to the dashboard

camera_image: camera.ffmpeg
entities: []
type: picture-glance
hunterjm commented 5 years ago

The issue is with packet loss for wifi cameras

No, it's a public mp4 link, not a wifi camera https://idokep.hu/radar/radar.mp4 and available webm too. Both sources do the same thing, stream exited with no dts error.

Well this issue in particular is that this source is not a live feed. The stream component is for live video streams. It has no concept of looping video. When the video ends, the stream will exit with that error because the video is done...

Skeletorjus commented 5 years ago

Still getting this on 0.96.5. Frontend and automations stop. Only thing i get from the logs right before the crash is 2019-08-07 10:39:38 ERROR (stream_worker) [libav.mp3float] Header missing Could this be the culprit?

I don't have the stream-component loaded in my yaml. Is it possibly loaded by default_config, or is this maybe unrelated to the stream-component altogether?

darookee commented 5 years ago

I (still) have the same problem (and same camera) as @ArnoutVerbeken running 0.97.2, I had this for a few versions now. I have a script that starts the recording. If I call the script manually it works but when called from an automation I get the 'dts'-error. Same when not using a script but starting the recording from the automation directly. I tried to test if WiFi connectivity is an issue but as far as I can see the camera has a solid connection.

hawkeye217 commented 5 years ago

Encountering the same problem with my Amcrest camera that I have set up for streaming (without preload). The HA front end freezes hard and HA has to be restarted.

I'm running HA 0.97.2 in a Docker container. I just migrated my install to a faster machine from a RPi3 the other day. Everything ran smoothly on the Pi, though it was running HA 0.95.4.

I noticed some errors in the log before the freeze indicating the camera had gone offline for some reason:

2019-08-21 07:01:40 WARNING (SyncWorker_8) [amcrest.http] <Cam:AMC000G5TR217UWLPG> Trying again due to error: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='192.168.1.131', port=80): Read timed out. (read timeout=3.05)"))
2019-08-21 07:01:40 ERROR (MainThread) [homeassistant.components.amcrest.camera] Could not get image from Amcrest Camera camera due to error: CommError
2019-08-21 07:01:47 WARNING (SyncWorker_2) [amcrest.http] <Cam:AMC000G5TR217UWLPG> Trying again due to error: ConnectionError(ProtocolError('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')))
2019-08-21 07:01:48 ERROR (MainThread) [homeassistant.components.amcrest.camera] Could not get image from Amcrest Camera camera due to error: CommError
2019-08-21 07:28:37 WARNING (SyncWorker_5) [amcrest.http] <Cam:AMC000G5TR217UWLPG> Trying again due to error: ReadTimeout(ReadTimeoutError("HTTPConnectionPool(host='192.168.1.131', port=80): Read timed out. (read timeout=3.05)"))
2019-08-21 10:04:31 WARNING (SyncWorker_6) [amcrest.http] <Cam:AMC000G5TR217UWLPG> Trying again due to error: ConnectTimeout(MaxRetryError("HTTPConnectionPool(host='192.168.1.131', port=80): Max retries exceeded with url: /cgi-bin/eventManager.cgi?action=getEventIndexes&code=VideoMotion (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at ADDRESS>, 'Connection to 192.168.1.131 timed out. (connect timeout=3.05)'))"))

I noticed a recent PR (https://github.com/home-assistant/home-assistant/pull/24262) that improved Amcrest error handling. I wonder if that has anything to do with it? Is it only Amcrest cameras that are exhibiting this behavior?

Martinvdm commented 5 years ago

i do also see the error : 2019-08-21 21:06:11 ERROR (stream_worker) [homeassistant.components.stream.worker] Error demuxing stream: No dts in packet

foscam camera ha 0.97.2 no wifi involved, just camera and ha via ethernet

hawkeye217 commented 5 years ago

Turns out I'm seeing the same thing and it isn't confined to only the Amcrest camera. Here's a more verbose message I saw before the frontend hung again after manually triggering an automation that uses the camera.record service:

2019-08-21 18:06:58 ERROR (stream_worker) [libav.NULL] missing picture in access unit with size 35
Exception in thread stream_worker:
Traceback (most recent call last):
  File "/usr/src/app/homeassistant/components/stream/worker.py", line 78, in stream_worker
    raise StopIteration("No dts in packet")
StopIteration: No dts in packet

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/src/app/homeassistant/components/stream/worker.py", line 82, in stream_worker
    hass.loop.call_soon_threadsafe(stream.outputs[fmt].put, None)
KeyError: 'recorder'
b0c1 commented 4 years ago

The issue is with packet loss for wifi cameras

No, it's a public mp4 link, not a wifi camera https://idokep.hu/radar/radar.mp4 and available webm too. Both sources do the same thing, stream exited with no dts error.

Well this issue in particular is that this source is not a live feed. The stream component is for live video streams. It has no concept of looping video. When the video ends, the stream will exit with that error because the video is done...

As you see in the comments it's irrelevant if it a stream or the video. The problem is the DTS packet, not the video (I understand the code will exit at the end but it's must be played once)

b0c1 commented 4 years ago

So, your solution is said, if we got flushing pocket, we done. But you not done, but PyAV solution is just "skip" the flushing pocket. Maybe the Camera FFMPEG implementation async read function sent the empty dts? (I just thinking...)

The issue is with packet loss for wifi cameras, which can be caused by AP roaming, or an unstable connection. This code does not gracefully handle packet loss, but skipping packets isn't going to fix it. The flushing packet is skipped in the example on pyAV's website because that example is not doing the same thing we are doing. They are remuxing a file from one format to another, and the flushing packet occurs at the end of the stream.

I don't recall exactly what happened, but simply ignoring those packets caused issues with some cameras. If you'd like to modify the logic and debug further, that'd be awesome. I don't have any of the devices people are having trouble with, so it's hard to troubleshoot.

So you said, ignoring lost packet is worst than throw an exception and exit the whole processing?

hunterjm commented 4 years ago

So you said, ignoring lost packet is worst than throw an exception and exit the whole processing?

@b0c1 - As I stated before, feel free to debug your situation locally and open a PR with a fix. I would be happy to review it. Some issues you might run in to:

1) next(container.demux(video_stream)) appears to always return an empty Packet object when at the end of the stream. In my testing, if we don't exit, it continues to try and process a feed that no longer exists, and the loop goes on forever which causes resource issues on the host machine.

2) dts stands for "Display Time Stamp". If it does not exist, there is no way to determine when to render the frame. It should always be present on a valid video feed. Maybe add some kind of counter to add a threshold of acceptable frame loss that gets reset once a valid frame is received, and kills the worker when it is exceeded?

b0c1 commented 4 years ago

@hunterjm just an idea but I think you can use pts instead of dts like the pyav frame seek example https://github.com/mikeboers/PyAV/blob/08949352bde5a9b05c95ecc0fd0aad6aa701c25f/scratchpad/frame_seek_example.py#L66

hunterjm commented 4 years ago

@hunterjm just an idea but I think you can use pts instead of dts like the pyav frame seek example https://github.com/mikeboers/PyAV/blob/08949352bde5a9b05c95ecc0fd0aad6aa701c25f/scratchpad/frame_seek_example.py#L66

They are both 0 in the given scenario.