home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0

ERROR: "Client unable to keep up with pending messages. Stayed over 512 for 5 seconds" at the system log #40443

Closed: dpositive closed this issue 3 years ago

dpositive commented 3 years ago

The problem

ERROR: "Client unable to keep up with pending messages. Stayed over 512 for 5 seconds" appears in the system log constantly.

Environment

subject info
arch armv7
chassis  
dev false
docker true
docker_version 19.03.12
hassio true
host_os Raspbian GNU/Linux 10 (buster)
installation_type Home Assistant Supervised
os_name Linux
os_version 5.4.51-v7+
python_version 3.8.5
supervisor 245
timezone Europe/Kiev
version 0.115.2
virtualenv false

Problem-relevant configuration.yaml

homeassistant:
  name: "aaa"
  latitude: !secret latitude_coord
  longitude: !secret longitude_coord
  elevation: 179
  unit_system: metric
  time_zone: Europe/Kiev
  customize: !include includes/customize.yaml
  external_url: !secret external_url
  internal_url: !secret internal_url
  packages: !include_dir_merge_named includes/packages
  whitelist_external_dirs: 
    - /config  

config:

logger:
  default: info

frontend:
  themes: !include_dir_merge_named themes

map:  

mobile_app:

sun:

system_health:

updater:

http:
  ssl_certificate: /ssl/fullchain.pem
  ssl_key: /ssl/privkey.pem

lovelace:
  mode: yaml
  resources: !include includes/resources.yaml

history:

hacs:
  token: !secret HACS_github
  appdaemon: true
  python_script: true
  theme: true

google_assistant:
  project_id: !secret google_assistant_project_id
  exposed_domains:
   - switch
   - light
   - climate
   - media_player
   - input_boolean
   - scene
   - script
   - sensor

tts:
  - platform: google_translate

mqtt:
  broker: core-mosquitto
  discovery: true
  discovery_prefix: homeassistant  
  username: !secret mqtt_username
  password: !secret mqtt_password

smartir:

timer: !include includes/timer.yaml
input_boolean: !include includes/input_boolean.yaml
input_number: !include includes/input_number.yaml
input_select: !include includes/input_select.yaml
group: !include includes/groups.yaml
scene: !include includes/scenes.yaml
recorder: !include includes/recorder.yaml
climate: !include includes/climate.yaml
zone: !include includes/zone.yaml
person: !include includes/person.yaml
telegram_bot: !include includes/telegram_bot.yaml
notify: !include includes/notify.yaml
remote: !include includes/remote.yaml

sensor: !include_dir_merge_list includes/sensor
automation: !include_dir_merge_list includes/automation
binary_sensor: !include_dir_merge_list includes/bin_sensor
script: !include_dir_merge_named includes/scripts
switch: !include_dir_merge_list includes/switches

Traceback/Error logs

Logger: homeassistant.components.websocket_api.http.connection.1844975792
Source: components/websocket_api/http.py:138
Integration: Home Assistant WebSocket API
First occurred: 00:37:19 (1 occurrences)
Last logged: 00:37:19
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds 

Additional information

probot-home-assistant[bot] commented 3 years ago

Hey there @home-assistant/core, mind taking a look at this issue as it's been labeled with an integration (websocket_api) you are listed as a codeowner for? Thanks! (message by CodeOwnersMention)

skippy-oz commented 3 years ago

I'm having the same problem.

Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 10:07:03 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 9:59:48 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 9:18:01 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 9:12:49 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 9:07:34 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 8:23:00 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 8:17:45 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 6:48:03 PM – Home Assistant WebSocket API (ERROR)
Client unable to keep up with pending messages. Stayed over 512 for 5 seconds September 24, 2020, 6:42:54 PM – Home Assistant WebSocket API (ERROR)

Regards Paul

tomlut commented 3 years ago

This has been happening for some time, see:

https://github.com/home-assistant/core/issues/26724

skumka commented 3 years ago

+1

Xitro01 commented 3 years ago

Same issue here!

TheburnerESP commented 3 years ago

Same issue in my logs

andytuinman3 commented 3 years ago

Same issue

ZuyRzuuf commented 3 years ago

I have had the same issue for 20 minutes.

datascope11 commented 3 years ago

Same issue, seems to have started for me after 0.118

JonGilmore commented 3 years ago

Same issue for me as well. Not sure what it's related to.

st3v3nFr commented 3 years ago

Also having the same issue here; not sure when it started, maybe 0.118.5.

rlust commented 3 years ago

Having the same issue with 2020.12.2.

tschamm commented 3 years ago

Same issue here with 2021.1.0

System Health

version 2021.1.0
installation_type Home Assistant OS
dev false
hassio true
docker true
virtualenv false
python_version 3.8.7
os_name Linux
os_version 5.9.15
arch aarch64
timezone Europe/Berlin
Home Assistant Community Store
  GitHub API | ok
  Github API Calls Remaining | 5000
  Installed Version | 1.9.0
  Stage | running
  Available Repositories | 709
  Installed Repositories | 17
Home Assistant Cloud
  logged_in | false
  can_reach_cert_server | ok
  can_reach_cloud_auth | ok
  can_reach_cloud | ok
Hass.io
  host_os | Home Assistant OS 5.9
  update_channel | stable
  supervisor_version | 2020.12.7
  docker_version | 19.03.13
  disk_total | 56.6 GB
  disk_used | 15.1 GB
  healthy | true
  supported | true
  board | odroid-n2
  supervisor_api | ok
  version_api | ok
  installed_addons | Check Home Assistant configuration (3.6.0), MariaDB (2.2.1), Duck DNS (1.12.4), Mosquitto broker (5.1), ESPHome (1.15.3), Grafana (5.3.6), SSH & Web Terminal (7.8.0), Visual Studio Code (2.9.1), phpMyAdmin (0.1.4), InfluxDB (3.7.9), Samba share (9.3.0), Spotify Connect (librespot) (dev), Portainer (1.3.0), Spotify Connect (0.8.2), Spotify Connect (dev)
Lovelace
  dashboards | 5
  mode | yaml
  views | 5
  resources | 10
Spotify
  api_endpoint_reachable | ok

JamesDenby commented 3 years ago

I too have the same issue

chrispe-lab commented 3 years ago

Confirming the same issues here on 2021.1.4.

Edit: I figured out that my issue was related more to heavy use of Z-Wave devices using the “secure node” protocol (mostly locks and other secure devices). I reprogrammed all of them as normal nodes, and the system works much better now.

mikegrayton commented 3 years ago

Same issue here too.

gomble commented 3 years ago

Same issue

swa72 commented 3 years ago

Same here with 2021.1.5

aletzi1 commented 3 years ago

Same issue here. “Client unable to keep up with pending messages. Stayed over 512 for 5 seconds”

Is there any way to solve this?

milwright commented 3 years ago

Same issue here. Has the problem been solved?

Tommaso2020 commented 3 years ago

Same issue for me too

gerds423 commented 3 years ago

> Confirming the same issues here on 2021.1.4.
>
> Edit: I figured out my issue related more to heavy use of zwave devices with “secure node” protocol (mostly for locks and other secure devices). Reprogrammed all of them to normal nodes and the system works much better now.

As nobody appears to be able to help yet, could this be a lead? I see this error from time to time in my logs and have no clue where the problem originates. So how about collecting configuration similarities among those who run into this issue? I use Z-Wave as well, without the Z-Wave to MQTT add-on. I have now tried to lower the Z-Wave network traffic by reducing update frequencies to a reasonable and necessary amount. I already migrated to MariaDB, and that appeared to help as well. Any other suggestions? I still encrypt Z-Wave and would like to keep it that way.

bash-worth commented 3 years ago

Same here on 2021.2.0 running on a rpi4. Not using z-wave at all.

bdraco commented 3 years ago

Please use the profiler to generate a callgrind.out.xxx file, zip it up, and post it here.

https://www.home-assistant.io/integrations/profiler/

For privacy concerns: the file will contain references to all the Python code that has been called. It will reveal which integrations you have installed, but should not reveal any personally identifiable information (unless you have written custom code that includes your personal information in function names or filenames).
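A minimal sketch of starting the profiler from a script, assuming the Profiler integration has already been added from the Integrations page. The file name `includes/scripts/capture_profile.yaml` and the script name `capture_profile` are hypothetical, chosen to fit the `script: !include_dir_merge_named includes/scripts` layout in the configuration above; `seconds` sets how long the profiler runs.

```yaml
# includes/scripts/capture_profile.yaml (hypothetical file name)
# Runs the profiler for 60 seconds; when the run finishes, the
# callgrind.out.* file is written to the configuration directory.
capture_profile:
  alias: "Capture a profile"
  sequence:
    - service: profiler.start
      data:
        seconds: 60
```

Running the script while the "pending messages" error is actually being logged is what makes the resulting file useful; a profile taken while the system is idle will not show the bottleneck.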

bash-worth commented 3 years ago

callgrind.out.1612509384619935.zip

gerds423 commented 3 years ago

profile.callgrind.out1612511463509750.zip

Tommaso2020 commented 3 years ago

callgrind.out.1612516765825665.zip

mikegrayton commented 3 years ago

Thanks for the hint, here’s my output. callgrind.out.zip

bdraco commented 3 years ago

I've gone through the profiles, and none of them show the system being overloaded at the time they were captured.

We need a profile that was captured while the websocket is overloaded.

gerds423 commented 3 years ago

First of all - thank you soooo much bdraco. Yes, that is a problem for me. I see the error about 5 times a day. So catching that moment and getting a profile at that time is basically not possible. Even going for a verbose logging level will be tricky for a whole day. Any other ideas? Can we trigger profiles to run when trouble is logged?

bdraco commented 3 years ago

Maybe create an automation that watches the system load (https://www.home-assistant.io/integrations/systemmonitor/) and calls the profiler.start service when the load goes above a certain level.
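One way to follow this suggestion, sketched under assumptions: the System Monitor sensor is configured in YAML with the `processor_use` resource (which, in releases from this period, should create `sensor.processor_use`), and the 80% threshold and 30-second hold are guesses to tune for your hardware. The file names are hypothetical and follow the `!include_dir_merge_list` layout used for `sensor` and `automation` in the configuration above; the two files are shown as separate YAML documents divided by `---`.

```yaml
# includes/sensor/systemmonitor.yaml (hypothetical file name)
- platform: systemmonitor
  resources:
    - type: processor_use
---
# includes/automation/profile_on_high_cpu.yaml (hypothetical file name)
- alias: "Profile when CPU stays high"
  trigger:
    - platform: numeric_state
      entity_id: sensor.processor_use  # assumed entity id
      above: 80                        # assumed threshold, in percent
      for: "00:00:30"                  # must stay above for 30 seconds
  action:
    - service: profiler.start
      data:
        seconds: 60
```

A `numeric_state` trigger fires when the value crosses the threshold rather than on every update above it, so one sustained spike should start a single profiling run instead of many.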

isabellaalstrom commented 3 years ago

I've had this since forever, and recently the problem disappeared. The one thing I changed is that I temporarily removed two RPis with screens in my home that always show a couple of Lovelace dashboards. I will keep an eye out for the errors reappearing when I start using the screens again.

Edit: I have a lot of templating in the views that are shown on these screens. Someone said that might be an issue.

bdraco commented 3 years ago

Are the Home Assistants all running on RPi or other commodity hardware?

I'm wondering if the JSON serializer performance is the constraint here.

bdraco commented 3 years ago

I don't think we can do anything to improve JSON serialization performance, as it's already well optimized.

Maybe we could serialize before we put the message in the queue. That way sending would be only I/O bound instead of CPU bound.

bdraco commented 3 years ago

Looks like we already made that change so that's not going to help.

[Screenshot: Screen Shot 2021-02-10 at 4 15 42 PM]

gerds423 commented 3 years ago

> Are the Home Assistants all running on RPi or other commodity hardware?
>
> I'm wondering if the JSON serializer performance is the constraint here.

Mine is an RPi 4, booting from SD, with all HA code and data on an external USB 3 SSD.

isabellaalstrom commented 3 years ago

I have an old NUC.

gerds423 commented 3 years ago

> I've had this since forever, and recently the problem disappeared. The thing I changed is that I temporarily removed two rpis with screens in my home, that always show a couple of lovelace dashboards. I will keep an eye out for them to reappear when I start to use the screens again.
>
> Edit: I have a lot of templating in the views that are shown on these screens. Someone said that might be an issue.

What is templating? ;-) No, sorry, I do know that, but my interface is very basic. However, I have also noticed that connecting more clients seems to be related. I usually don't interact with the HA web interface, as I handle almost everything by voice command and automation. Only when I add new devices do I work a lot with the web UI to integrate them, and that is when I noticed the errors. Lately there have not been any "keep up with pending messages" errors. I will keep an eye on it and post a profile captured under heavy load as soon as I can.

dingausmwald commented 3 years ago

Same

narsaw commented 3 years ago

Following; same issue.

rolamento commented 3 years ago

Same Issue

Xitro01 commented 3 years ago

> Are the Home Assistants all running on RPi or other commodity hardware?
>
> I'm wondering if the JSON serializer performance is the constraint here.

No, got this issue on a newer i5 NUC.

bdraco commented 3 years ago

If you open the browser's developer console and find the websocket connection in the Network tab (you may need to Shift+Reload), do you see any errors?

Xitro01 commented 3 years ago

> If you look at the developer console, and find the websocket connection in network (you may need to shift+reload), do you see any errors?

In the Network tab (after refreshing) I see that fetching "?homescreen=1" failed. I also sometimes get an HTTP 500 error, because it seems to fail to get a new image from my media_player:

Request URL: https://x/api/media_player_proxy/media_player.nvidia_shield?token=x&cache=x
Request Method: GET
Status Code: 500 Internal Server Error
Remote Address: x:443
Referrer Policy: same-origin

kmalinich commented 3 years ago

@Xitro01

Same issue for me as well. I can replicate this issue when typing a few characters into the entity_id input field in Developer Tools > States.

I'm running HA 2021.4.4, on a Ryzen 7 3700X, bare-metal (virtualenv install) with access to a very healthy mariadb server. Client is Chrome 89.0.4389.128, macOS 11.2.3, on an i7-9700K @ 5.2GHz. I don't immediately suspect performance is the issue.

That being said, I do have a lot of entities... currently 1,209, to be specific.

bdraco commented 3 years ago

We have merged a change, which will be in next month's release, that reduces the pressure on the server side. If this still happens after installing the May release, the issue is likely due to performance on the client (browser) side, and there isn't much we can do about it.

hellcry37 commented 3 years ago

"Client unable to keep up with pending messages. Stayed over 512 for 5 seconds" is still happening on core-2021.6.6.

julianrinaldi commented 3 years ago

Have this as well on 2021.7 (and previous versions)

picotrain77 commented 3 years ago

Same here; for me it seems to happen every day without fail.

dzianiwonsacze commented 3 years ago

Same for me. I'm not sure if this is related, but all time sensitive automations have slowed down considerably recently, to the point where motion sensors or wireless switches are nearly useless - the delay is sometimes even 10+ seconds.