Hey there @allenporter, mind taking a look at this issue as it has been labeled with an integration (nest) you are listed as a code owner for? Thanks!
(message by CodeOwnersMention)
Hello, thank you for the report, and thank you for including the diagnostics, it is extremely helpful.
Quick question: how long was your Home Assistant instance active when the diagnostics data was downloaded? I'm trying to determine if this was a few seconds' worth of data/counters or many minutes to hours. (The activity is low, and does not explain the CPU usage.)
Also one other question: if you open the camera stream view in the frontend user interface, is the Preload stream box checked?
Hi @allenporter
The Home Assistant instance was active for many minutes. Today I created a new instance with only the Nest integration, and I see the same for this instance: CPU usage of about 65 to 70%.
The Preload stream box is not checked.
Attached I have a zipped py-spy report of the instance that has only the Nest integration.
OK, what do you make of that report? It looks fairly empty to me.
So _worker is some generic unnamed thread, and we don't see any actual work happening, maybe since it is a future.
From what I can see there is one call to GenerateRtspStream, which makes me think that the live stream is loaded and it's processing the stream, or at least it loaded once for serving an image thumbnail.
Can you enable debug logging by adding the following to your configuration.yaml?
logger:
  default: info
  logs:
    homeassistant.components.nest: debug
    homeassistant.components.nest.climate_sdm: debug
    homeassistant.components.nest.camera_sdm: debug
    homeassistant.components.nest.sensor_sdm: debug
    homeassistant.helpers.config_entry_flow: debug
    homeassistant.helpers.config_entry_oauth2_flow: debug
    google_nest_sdm: debug
    google_nest_sdm.device: debug
    google_nest_sdm.device_manager: debug
    google_nest_sdm.google_nest_subscriber: debug
    google_nest_sdm.event: debug
Hi @allenporter -- I just wanted to add that I am seeing the same behavior described by @rgerbranda. CPU loads are sitting at 70-80% with the Nest integration enabled, but return to normal (<10%) if I disable it. I also have a doorbell.
I don't see anything obvious in the logs.
Thank you. Can you confirm: anything with debug logs turned on? Anything about streams at all? Can you also share your diagnostics information?
I'm also seeing something similar. For me when the Nest integration is enabled, CPU usage is ~20% and with it disabled CPU usage is ~4%. My Nest account only has a thermostat though. I don't have a doorbell.
Thank you @rgerbranda -- I'm not seeing anything out of the ordinary here; however, the camera does appear reasonably busy with events (e.g. the camera has received >1k events in the last few days). I'm assuming you're using the Home Assistant frontend during this time, since there are things like loading the media player or some other type of read of the events from disk, but again nothing beyond what I would expect.
Additional questions:
@ibddot and @halpatraz, thank you for confirming you also see the issue. I need additional information in the form of a much more detailed report, including Home Assistant version, architecture, any more details about what is being used, and diagnostics information and logs if possible, as well as answers to all of the questions about what is actually being used. I appreciate your time on this, as I'm not able to reproduce it.
Here is some more information on my system. I haven't found any errors or references to the Nest integration in the logs. Let me know what logs you are looking for and I'll try to get them. I had to use trial and error, disabling things, to find that the source of the high CPU usage was the Nest integration.
version | core-2022.2.9
---|---
installation_type | Home Assistant OS
dev | false
hassio | true
docker | true
user | root
virtualenv | false
python_version | 3.9.7
os_name | Linux
os_version | 5.10.92-v7
arch | armv7l
timezone | America/Los_Angeles
Is everyone seeing this using a Raspberry Pi? I'm wondering if there is something ARM-specific in grpc or cloud pubsub going off the rails, given those were upgraded in the last release and otherwise Nest doesn't appear to be doing much. I can set up a dev environment to try to reproduce.
Yes, RPi 3B here. Using the Container, latest stable release.
Also Raspberry PI here.
To compare, I installed Home Assistant on an Intel device. CPU usage is almost 0%:
PID PPID USER STAT VSZ %VSZ CPU %CPU COMMAND
192 189 root S 396m 2% 11 0% python3 -m homeassistant --config /config
I was able to reproduce this on armv7.
W/o nest:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16573 root 20 0 225024 132616 28392 S 11.3 6.9 0:19.18 python3
nest enabled:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17095 root 20 0 245936 138836 30724 S 61.5 7.2 0:49.52 python3
I had to do some restarts in between. Thanks folks, that is helpful, now I think I have enough to investigate further.
When I had a closer look and broke it down by threads, they are just idle threads.
And really there is minimal difference between Nest and no Nest except for another idle subscriber thread or maybe two.
So I think the CPU usage may not be happening in Python at all. My only thought is that something different is happening in cython with the upgrade of grpc. Will look at that next.
I ran https://www.home-assistant.io/integrations/profiler/, which does not yet seem to produce anything super interesting. 90% of the CPU usage is in select/poll. profile.zip
I think the next steps here are to investigate and profile the cython side of things, since the python side just has idle threads.
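One way to sample the native side might be py-spy's --native mode, which records C/Cython frames in addition to Python ones. A command sketch (the PID and output filename are placeholders, not from my actual run):
$ py-spy record --native --pid <homeassistant-pid> -o nest-native-profile.svg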
I played with heartrate a bit, but it didn't give much insight.
I saw some fixes in grpc 1.45 about the default polling strategy, so as soon as that is released on PyPI we can give it a try and see if it helps.
As of home assistant 2022.5.4 I see much lower cpu usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5457 root 20 0 234352 126264 30252 S 27.8 6.6 0:26.76 python3
Curious if folks are also seeing an improvement in the 2022.5.X releases, which include grpc 1.45.0?
There are also some polling strategy changes in grpc 1.45.1 which I am curious to see the impact of. I've sent a PR to update to 1.46.1 to include those changes in the next Home Assistant release.
I'm also running 2022.5.4 now; the CPU usage of python3 is around 40% with only the Nest integration enabled.
I'm still seeing high CPU usage of around 20% when it is usually 1% without the Nest integration. Maybe it's a bit better on 2022.5.4 for me as it seemed to be around 30% CPU use previously.
Curious to see how this looks for you now in Home Assistant 2022.6, which includes grpcio==1.46.1?
I'll do some additional testing but wanted to get your take.
Still at around 20-25% CPU usage...
Same here, around 25%
I did a bunch of further profiling, and anecdotally, looking at threads, what I see I believe is related to underlying details of grpc polling.
When I run an example python3 program that starts a Python pubsub subscriber, it starts 12 threads that are busy:
$ GRPC_VERBOSITY=debug python3 ./example.py --project_id=XXX --subscription_id=YYY
I0701 07:36:43.961390731 2330 ev_epoll1_linux.cc:121] grpc epoll fd: 3
D0701 07:36:43.961481969 2330 ev_posix.cc:171] Using polling engine: epoll1
...
Then CPU usage goes up and there seem to be a lot of busy threads doing stuff; they should just be sitting waiting for epoll but are very busy:
$ sudo strace -fp 32379
....
[pid 32393] futex(0xb46f7c30, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1656686216, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
[pid 32393] syscall_0x193(0x1, 0xb3e739c0, 0x1, 0xb3e739c0, 0x193, 0xb3e73b40) = 0
[pid 32393] syscall_0x193(0, 0xb3e739b0, 0, 0xb3e739b0, 0x193, 0) = 0
[pid 32393] syscall_0x193(0x1, 0xb3e739b0, 0x1, 0xb3e739b0, 0x193, 0) = 0
[pid 32393] syscall_0x193(0x1, 0xb3e739f0, 0x1, 0xb3e739f0, 0x193, 0xb644f424) = 0
[pid 32393] futex(0xb46f7c30, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {tv_sec=1656686216, tv_nsec=0}, FUTEX_BITSET_MATCH_ANY) = -1 ETIMEDOUT (Connection timed out)
[pid 32393] syscall_0x193(0x1, 0xb3e739c0, 0x1, 0xb3e739c0, 0x193, 0xb3e73b40) = 0
[pid 32393] syscall_0x193(0, 0xb3e739b0, 0, 0xb3e739b0, 0x193, 0) = 0
[pid 32393] syscall_0x193(0x1, 0xb3e739b0, 0x1, 0xb3e739b0, 0x193, 0) = 0
[pid 32393] syscall_0x193(0x1, 0xb3e739f0, 0x1, 0xb3e739f0, 0x193, 0xb644f424) = 0
[... the same futex timeout / syscall_0x193 pattern repeats continuously ...]
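To cross-reference those busy strace thread ids with Python thread names, here is a small sketch that could be dropped into the example script after subscribing (my own addition, not part of the original repro; native_id requires Python 3.8+):

import threading

# Print each Python thread's kernel thread id and name so the busy
# [pid NNNNN] lines from strace can be matched to a named thread.
for t in threading.enumerate():
    print(t.native_id, t.name)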
When I run the example with grpc debug turned on, I'm seeing a call to grpc_completion_queue_next every few milliseconds:
$ GRPC_VERBOSITY=debug GRPC_TRACE=api python3 ./example.py --project_id=woodgreen-local-developer --subscription_id=home-assistant-Q8P60ivNwM
...
I0701 07:47:57.186653970 2482 completion_queue.cc:961] grpc_completion_queue_next(cq=0xb48a2bf0, deadline=gpr_timespec { tv_sec: 1656686877, tv_nsec: 186645877, clock_type: 1 }, reserved=0)
I0701 07:47:57.187771076 2482 completion_queue.cc:961] grpc_completion_queue_next(cq=0xb48a2bf0, deadline=gpr_timespec { tv_sec: 1656686877, tv_nsec: 187762947, clock_type: 1 }, reserved=0)
I0701 07:47:57.188889998 2482 completion_queue.cc:961] grpc_completion_queue_next(cq=0xb48a2bf0, deadline=gpr_timespec { tv_sec: 1656686877, tv_nsec: 188882331, clock_type: 1 }, reserved=0)
I0701 07:47:57.190008679 2482 completion_queue.cc:961] grpc_completion_queue_next(cq=0xb48a2bf0, deadline=gpr_timespec { tv_sec: 1656686877, tv_nsec: 190000271, clock_type: 1 }, reserved=0)
[... grpc_completion_queue_next continues at roughly 1ms intervals ...]
When I run the same example on an Intel machine, I only see this every couple hundred milliseconds:
I0701 08:07:52.176668681 266838 completion_queue.cc:962] grpc_completion_queue_next(cq=0x2d063d0, deadline=gpr_timespec { tv_sec: 1656688072, tv_nsec: 376658335, clock_type: 1 }, reserved=(nil))
I0701 08:07:52.378082235 266838 completion_queue.cc:962] grpc_completion_queue_next(cq=0x2d063d0, deadline=gpr_timespec { tv_sec: 1656688072, tv_nsec: 578073562, clock_type: 1 }, reserved=(nil))
I0701 08:07:52.579491408 266838 completion_queue.cc:962] grpc_completion_queue_next(cq=0x2d063d0, deadline=gpr_timespec { tv_sec: 1656688072, tv_nsec: 779483197, clock_type: 1 }, reserved=(nil))
I0701 08:07:52.780896218 266838 completion_queue.cc:962] grpc_completion_queue_next(cq=0x2d063d0, deadline=gpr_timespec { tv_sec: 1656688072, tv_nsec: 980887103, clock_type: 1 }, reserved=(nil))
[... the same call repeats at roughly 200ms intervals ...]
^CI0701 08:07:55.545513133 266830 call.cc:648] grpc_call_cancel_with_status(c=0x7f73b4001ab0, status=1, description=Channel closed!, reserved=(nil))
...
So I think this indicates either a grpc or a streaming pubsub armv7 issue.
I have an example repro case based on the python pubsub doc example:
"""Example reproducing grpc cpu issues."""
import argparse
import logging
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1
parser = argparse.ArgumentParser(
description="Command line tool for Google Nest SDM API"
)
parser.add_argument("--project_id", required=True, help="Project id")
parser.add_argument("--subscription_id", required=True, help="Subscription id")
parser.add_argument("--timeout", default=90, help="Duration to listen for messages")
parser.add_argument( '-log',
'--loglevel',
default='warning',
help='Provide logging level. Example --loglevel debug, default=warning' )
TIMEOUT = 90
def main() -> None:
"""Nest command line tool."""
args: argparse.Namespace = parser.parse_args()
logging.basicConfig( level=args.loglevel.upper() )
subscriber = pubsub_v1.SubscriberClient()
# The `subscription_path` method creates a fully qualified identifier
# in the form `projects/{args.project_id}/subscriptions/{args.subscription_id}`
subscription_path = subscriber.subscription_path(args.project_id, args.subscription_id)
def callback(message: pubsub_v1.subscriber.message.Message) -> None:
print(f"Received {message}.")
message.ack()
streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
print(f"Listening for messages on {subscription_path}..\n")
# Wrap subscriber in a 'with' block to automatically call close() when done.
with subscriber:
try:
# When `timeout` is not set, result() will block indefinitely,
# unless an exception is encountered first.
streaming_pull_future.result(timeout=args.timeout)
except TimeoutError:
streaming_pull_future.cancel() # Trigger the shutdown.
streaming_pull_future.result() # Block until the shutdown is complete.
Run with:
$ export GOOGLE_APPLICATION_CREDENTIALS=/tmp/service-account.json
$ GRPC_VERBOSITY=debug GRPC_TRACE=api python3 /tmp/example.py --project_id=XXXXX --subscription_id=YYYY --loglevel=debug
I've filed https://github.com/googleapis/python-pubsub/issues/728 upstream with pubsub, though I suspect this is really a grpc issue.
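In the meantime, one knob that may be worth experimenting with is grpc's polling engine, which can be selected with the GRPC_POLL_STRATEGY environment variable before grpc initializes. Whether "poll" behaves any better than the default "epoll1" on armv7 is an assumption to test, not a confirmed fix:
$ GRPC_POLL_STRATEGY=poll python3 ./example.py --project_id=XXX --subscription_id=YYY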
Thanks for the good work.
Any updates on this? My Pi CPU is still running at 70° Celsius all day doing nothing.
See my last update above for the latest status
They're ignoring you, I see.
Anyway, when restarting HA the CPU usage is lower than 20%; after a while it generates more CPU load. Still very annoying, and it doesn't look like the problem is going to be resolved anytime soon.
Can you bump some more? My CPU is running at 75.8°C doing nothing 24/7 (!) :(
Adding to this thread just as an additional data point, although the effects I notice are far less obvious.
I am using HAOS as a VM on an Intel machine, through KVM. So no ARM involved. I notice that the CPU usage of the VM as reported by the host is consistently between 5-10% when the Nest integration is enabled, while it sits around 1% with it disabled.
I tracked this down simply because the power usage of the server is increased somewhat due to a significantly higher number of wake cycles on the VM when Nest is enabled, so although the increased CPU usage sounds minimal it has a meaningful impact above and beyond the dozen or so other integrations I have enabled.
My profile data is similarly empty like that posted earlier, consisting mainly of epoll events and no clear difference between the Nest integration being enabled/disabled.
Hi, I am having a very similar issue (high CPU usage when the Nest integration is on). I'm running the homeassistant docker image on a 32-bit RPi4. I've noticed we both have the Nest doorbell; perhaps this is causing the issue? EDIT: As an aside, my doorbell (GWX3T) is on battery and seems to be "streaming" an empty, black RTC video stream, which seems to affect the battery life. I've attached my diagnostics:
{
  "home_assistant": {
    "installation_type": "Home Assistant Container",
    "version": "2022.8.7",
    "dev": false,
    "hassio": false,
    "virtualenv": false,
    "python_version": "3.10.5",
    "docker": true,
    "arch": "armv7l",
    "timezone": "Europe/London",
    "os_name": "Linux",
    "os_version": "5.10.103-v7l+",
    "run_as_root": true
  },
  "custom_components": {},
  "integration_manifest": {
    "domain": "nest",
    "name": "Nest",
    "config_flow": true,
    "dependencies": [
      "ffmpeg",
      "http",
      "application_credentials",
      "repairs"
    ],
    "after_dependencies": [
      "media_source"
    ],
    "documentation": "https://www.home-assistant.io/integrations/nest",
    "requirements": [
      "python-nest==4.2.0",
      "google-nest-sdm==2.0.0"
    ],
    "codeowners": [
      "@allenporter"
    ],
    "quality_scale": "platinum",
    "dhcp": [
      {
        "macaddress": "18B430*"
      },
      {
        "macaddress": "641666*"
      },
      {
        "macaddress": "D8EB46*"
      }
    ],
    "iot_class": "cloud_push",
    "loggers": [
      "google_nest_sdm",
      "nest"
    ],
    "is_built_in": true
  },
  "data": {
    "subscriber": {
      "start": 1
    },
    "devices": [
      {
        "data": {
          "name": "**REDACTED**",
          "type": "sdm.devices.types.DOORBELL",
          "assignee": "**REDACTED**",
          "traits": {
            "sdm.devices.traits.Info": {
              "customName": "**REDACTED**"
            },
            "sdm.devices.traits.CameraLiveStream": {
              "videoCodecs": [
                "H264"
              ],
              "audioCodecs": [
                "OPUS"
              ],
              "supportedProtocols": [
                "WEB_RTC"
              ]
            },
            "sdm.devices.traits.CameraImage": {
              "maxImageResolution": {
                "width": 1920,
                "height": 1200
              }
            },
            "sdm.devices.traits.CameraPerson": {},
            "sdm.devices.traits.CameraMotion": {},
            "sdm.devices.traits.DoorbellChime": {},
            "sdm.devices.traits.CameraClipPreview": {}
          },
          "parentRelations": [
            {
              "parent": "**REDACTED**",
              "displayName": "**REDACTED**"
            }
          ]
        }
      }
    ],
    "camera": {
      "camera.doorbell": {}
    }
  }
}
I don't have a doorbell, but I do have a thermostat and a Cam IQ Outdoor.
Hi folks, we have enough information here, as mentioned above. The underlying issue is https://github.com/googleapis/python-pubsub/issues/728
Is this ever going to be fixed?
"To be frank, I haven't really looked into this issue in much detail" doesn't sound very hopeful.
Thanks
I am also experiencing this, as I have 4 Nest cameras and my Pi runs with a high-CPU python3 process (80%-100%). It's unfortunate to see little to no interest from Google in trying to fix this.
+1 to keep this alive. I have a few hubs, cameras, 2 thermostats and a doorbell, and my load average is in the 50s on an RPi 4.
I was previously running on a 3B+ and things ran fine, so I'm not sure why the hardware change made a difference, but it only started happening after that.
It seems like a low-level bug related to timers, specific to the instruction set. It's likely not the biggest use case for the Python library, and fixing it likely requires expertise, interest, and ownership of one of these devices to be able to reproduce it. It's an open source project, after all.
@frenck help?
@tcc0 I don't think that's the right answer.
The right answer is that I disabled the addon for now; running at 100% CPU for months in a row is unacceptable. I had hoped for a little more effort from the devs.
Hey! Remember this is a volunteer project. I put a pretty significant amount of effort into diagnosing and profiling this, finding a potential cause, and filing the issue in the upstream project.
The problem
When I enable the Google Nest integration, I see a continuous CPU load from python3 of about 65%. Without this integration, the CPU load of python3 is just about 2%.
Any recommendation to optimize the load of the Google Nest integration?
What version of Home Assistant Core has the issue?
2022.2.9
What was the last working version of Home Assistant Core?
No response
What type of installation are you running?
Home Assistant Container
Integration causing the issue
nest
Link to integration documentation on our website
https://www.home-assistant.io/integrations/nest
Diagnostics information
Example YAML snippet
No response
Anything in the logs that might be useful for us?
No response
Additional information
No response