home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
73.19k stars 30.57k forks source link

Logbook has become very slow at loading - like several minutes #51446

Closed KennethLavrsen closed 3 years ago

KennethLavrsen commented 3 years ago

Checklist

Describe the issue you are experiencing

I have a pretty mature and large installation but nothing out of the ordinary when I compare to the numbers I hear from other users

I am sitting in my living room. I am home alone. I have some motion sensors in the garden that are a little active because of some bushes waving. But nothing dramatic.

When I click logbook I have to wait 2 minutes and 40 seconds for the page to load!!!!

Before 2021.6 I had to wait maybe 5 or 10 seconds. Something has been changed that takes forever.

It is the same no matter what browser I look at and under any OS. I have tried Windows, IOS, and Linux. Same kind of waiting. It takes between 2 and 3 minutes to show the default logbook view. You cannot even stop it and enter a smaller time range. You have to sit and wait for minutes before you can do anything.

  1. Something has gone completely mad with the 2021.6 update
  2. It is not wise to not have a limit to how many items that get loaded. It should stop at a given number.
  3. You should be able to put a new interval or item to filter for right away without waiting.

I have also tried on a new HA I made just for testing that has nothing but the default few entities. Then it loads right away. It is related to the amount of data being collected and sent to the browser.

Describe the behavior you expected

I expect to see the page loaded in less than 10 seconds

Steps to reproduce the issue

  1. Use a typical larger installation and not a test installation with few items
  2. Look at Logbook
  3. Start your stop watch. Or may a paper calender.

What version of Home Assistant Core has the issue?

2021.6.0

What was the last working version of Home Assistant Core?

2021.5.5

In which browser are you experiencing the issue with?

All common browsers

Which operating system are you using to run this browser?

Linux, Windows, IOS

State of relevant entities

No response

Problem-relevant frontend configuration

All default themes

Javascript errors shown in your browser console/inspector

No JS errors

Additional information

No response

bramkragten commented 3 years ago

Is the frontend or the backend that is taking a long time? I would guess it is the backend? Check the network tab in your browser dev tools.

KennethLavrsen commented 3 years ago

Hmm. I am not too experienced with web dev tools.

it seems the 3 minutes of wait is doing this fetch on the network.

http://192.168.1.11:8123/api/logbook/2021-06-03T08:00:00.000Z?end_time=2021-06-03T11%3A00%3A00.000Z

It sends 2220 items and that takes 3 minutes. That is slow

KennethLavrsen commented 3 years ago

If this is indeed backend then I guess this needs to be moved to core which I cannot do

KennethLavrsen commented 3 years ago

Update

I have restarted Home Assistant many times after upgrade.

But now I got the idea to restart the Home Assistant Operating system and bring the entire HA Blue down and up again.

The loading time for Logbook is now around 30 seconds instead of 3 minutes. Still awful but better.

In the Supervisor log I got these interesting warnings

21-06-03 17:47:28 ERROR (MainThread) [supervisor.api.middleware_security] Invalid token for access /network/interface/default/info
21-06-03 17:47:32 INFO (MainThread) [supervisor.core] Start Home Assistant Core
21-06-03 17:47:32 INFO (SyncWorker_9) [supervisor.docker.interface] Starting homeassistant
21-06-03 17:47:32 INFO (MainThread) [supervisor.homeassistant.core] Wait until Home Assistant is ready
21-06-03 17:47:40 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.STARTUP
21-06-03 17:47:40 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-06-03 17:47:41 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-06-03 17:48:29 INFO (MainThread) [supervisor.homeassistant.core] Detect a running Home Assistant instance
21-06-03 17:48:29 INFO (MainThread) [supervisor.addons] Phase 'AddonStartup.APPLICATION' starting 3 add-ons
21-06-03 17:48:30 INFO (SyncWorker_4) [supervisor.docker.addon] Starting Docker add-on homeassistant/aarch64-addon-configurator with version 5.3.1
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'database_path' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'read_only' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'datasette' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'ssl' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'certfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (MainThread) [supervisor.addons.options] Option 'keyfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'database_path' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'read_only' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'datasette' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'ssl' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'certfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:30 WARNING (SyncWorker_7) [supervisor.addons.options] Option 'keyfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:31 INFO (SyncWorker_7) [supervisor.docker.addon] Starting Docker add-on ghcr.io/hassio-addons/sqlite-web/aarch64 with version 3.1.0
21-06-03 17:48:31 INFO (SyncWorker_3) [supervisor.docker.addon] Starting Docker add-on thomasmauerer/samba-backup-aarch64 with version 4.5.0
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'database_path' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'read_only' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'datasette' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'ssl' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'certfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)
21-06-03 17:48:32 WARNING (MainThread) [supervisor.addons.options] Option 'keyfile' does not exist in the schema for SQLite Web (a0d7b954_sqlite-web)

Ie. database problems!

Could this be related.

Next for me is to delete the database and let it build again

There is something rotten here

KennethLavrsen commented 3 years ago

Errors are gone after database update. I will need to wait 3 hours for the database to fill up. But I can say that already after a boot it takes 5-10 full seconds to load only around 70 records. That is slow. I will report in 3 hours what it takes then

KennethLavrsen commented 3 years ago

Back with result. I have 3 hours of logbook data which is the default initial interval

It takes 1.4 minutes.

It is not the loading of the data. It is the waiting time for first byte that takes 1.4 minutes. HA takes 1.4 minutes to extract 1170 records from the database and send it to the browser. The transfer of the actual data is 6.53 ms.

There is something broken in this release related to extracting data from database. I do not understand why others have not reported it yet. I have no other addons that can load the system. It is a fully up to date HA Blue. No hacks.

1.4 minutes is not useful in practical.

This is with freshly built database and no more database related errors in log.

KennethLavrsen commented 3 years ago

It is not only logbook that is slow. It also takes forever when you click on an entity. The little bit of history of a motion sensor with a fresh database only 3 hours old takes 5-10 seconds to load.

bramkragten commented 3 years ago

What is your recorder config?

KennethLavrsen commented 3 years ago

Some more data (I will answer the last question from Bram when I have written this).

I have some datapoints from my experiments. So it is all about how long it takes from the front end asks for data till first byte is sent.

With 2021.6.0 the average time per record is 71.8 ms. With 2021.5.5 the average time per record is 0.30 ms. That is a factor 239 times. Something is rotten in the database code

I found one more fellow "victim" that see same problem as I but running on an i5. It seems people running 2021.6.0 with MySQL or other non SQLite database have more moderate update times even with large amounts of data. So it smells like SQLite related issue that has been introduced with 2021.6.0.

I reverted my installation back to 2021.5.5 to confirm and get the bench mark data.

KennethLavrsen commented 3 years ago

Thanks for transfering to Core.

My recorder setting is

recorder:
  purge_keep_days: 2
  exclude:
    entities:
      - sensor.vanlose_9a
      - sensor.glostrup_busses
      - sensor.glostrup_s_train
      - sensor.byparken
      - sensor.home_assistant_uptime
      - .... a longer list of sensors I do not want in logs for security reasons
      - binary_sensor.amcrest_1_motion_detected
      - sensor.blitzwolf_voltage
      - sensor.blitzwolf_power_factor
    event_types:
      - service_removed
      - platform_discovered      
      - feedreader
      - service_registered
      - component_loaded
      - logbook_entry
      - system_log_event
      - timer_out_of_sync
      - !secret alexa_media_entity

It is same I also run in 2021.5.5

It keeps the SQlite DB at around 170 MB.

But as I said. Even with fresh DB and only 8 MB of data, the load time for 3 hours is 1.4 minutes. And it is 1-2 seconds with 2021.5.5. It is so fast in 2021.5.5 and earlier that I never felt it worth looking at real SQL DB since I only keep data for 2 days and exclude a lot of the noisy things.

probot-home-assistant[bot] commented 3 years ago

logbook documentation logbook source (message by IssueLinks)

bdraco commented 3 years ago
diff --git a/homeassistant/components/logbook/__init__.py b/homeassistant/components/logbook/__init__.py
index de0f901be3..87b66c5773 100644
--- a/homeassistant/components/logbook/__init__.py
+++ b/homeassistant/components/logbook/__init__.py
@@ -500,10 +500,10 @@ def _generate_events_query(session):
 def _generate_events_query_without_states(session):
     return session.query(
         *EVENT_COLUMNS,
-        literal(None).label("state"),
-        literal(None).label("entity_id"),
-        literal(None).label("domain"),
-        literal(None).label("attributes"),
+        literal(value=None, type_=sqlalchemy.String).label("state"),
+        literal(value=None, type_=sqlalchemy.String).label("entity_id"),
+        literal(value=None, type_=sqlalchemy.String).label("domain"),
+        literal(value=None, type_=sqlalchemy.Text).label("attributes"),
     )

This all the logbook changes in 2021.6 so it seems like something with the recorder or the machine itself

bdraco commented 3 years ago

Can you post a py-spy dump taken when the logbook query is running?

KennethLavrsen commented 3 years ago

You have a step by step how to install and run pyspy, on an Hassos installation? I am weak on Docker and Alpine and my coding skills are mainly “Stoneage” C. I found a forum discussion from last year with different methods. What is the 2021 best way?

ref https://community.home-assistant.io/t/python3-high-cpu-usage/160012/86

If you read my notes above you will see that

1 same issue is seen not only in logbook but also when you click on an entity in UI. It takes up to 7-8 seconds to load the maybe 10-20 changes listed. 2 people I talked to on Discord last night did not see the problem if they did not use the default SQLite 3 a guy reported same problem as I, on a different and faster platform which makes a machine problem not very likely

The pointer right now leads to code reading from SQLite. What changed there?

KennethLavrsen commented 3 years ago

I tried the install of py-spy based on the steps in forum post and it fails because of missing priviledge and now my production container is full of some failed garbage.

This is the error

error: failed to run custom build command for `memchr v2.4.0`

Caused by:
  could not execute process `/tmp/cargo-installHatQpk/release/build/memchr-b5943f20292f2b45/build-script-build` (never executed)

Caused by:
  Permission denied (os error 13)
warning: build failed, waiting for other jobs to finish...
error: failed to compile `py-spy v0.3.7`, intermediate artifacts can be found at `/tmp/cargo-installHatQpk`

Caused by:
  build failed

And I also wonder. Is this so difficult to reproduce? The steps are - click on logbook and wait for 2 minutes instead of 2 seconds. Only thing is - you must use the default SQLite database. But if this gets assigned to the SQLite dev then he must have that environment already

KennethLavrsen commented 3 years ago

I gave up on py-spy. Some friendly soul on Discord suggested the profiler integration and that was easy to install and run

I have no clue how to view the data so I just uploaded it hoping there are no secrets hidden in the file

profile.zip

bdraco commented 3 years ago

Unfortunately the profile will only show the main thread and this will be happening in the executor

Does this help?

https://developers.home-assistant.io/docs/operating-system/debugging/

bdraco commented 3 years ago

Also the profile should only contain python code paths so nothing sensitive

bdraco commented 3 years ago

If there is a wheel for your arch, you can grab it from https://github.com/benfred/py-spy/releases/tag/v0.3.7

Then do

pip3 install --upgrade FILENAME

KennethLavrsen commented 3 years ago

I just downgraded my installation again.

Please bdraco. Can you please try yourself with the SQlite database. You do not say if you have tried it yourself. The action to reproduce is to click on the logbook link in the left menu. There are no steps. One click. The slow response is so obvious. To solve the issue you will need to change back to the default SQLite database. We know it is SQLite users (which is the default out of box database) that creates the problem.

bdraco commented 3 years ago

I have 5 instances using SQLite without any issues

KennethLavrsen commented 3 years ago

OK. I will try again and now also spin up a ghost installation on a VM so I do not upset my wife with all the downtime

KennethLavrsen commented 3 years ago

I have a VM running on an i5. KVM/QEMU. Installed the qcow2 from HA home page and had it running. With that all looks normal and no logbook issue because you only have maybe 100 records of data.

Then I restored a snapshot from my production machine, renamed it, made sure it had unique IP etc and disabled Z-wave because I have no USB dongle naturally. And then I get the same trouble as on the HA Blue. Exact same long load times from the DB.

Then I stared removing stuff. First anything HACS related. I have only a few. Miele and Alexa media. I removed them and also removed HACS. Rebooted. No difference. Then I removed integrations one by one until only Supervisor was left. Still same long response time. It boots fast but logbook still broken. I even removed all addons (also here I have very few).

So now we know that

Now I restore the VM back to snap before I ripped it apart and will try the py spy with the latest instructions.

Can a DB from a foreign machine be used for testing? I could make it available for download to a trusted Dev.

KennethLavrsen commented 3 years ago

I cannot install PySpy. I get OK that latest instruction also fails. I get that the precompiled wheel is not supported by this platform. I run on an X86_64. and I get ERROR: py_spy-0.3.7-py2.py3-none-manylinux1_x86_64.whl is not a supported wheel on this platform.

I am logged into the host (via local terminal). I am root. I run docker exec -it homeassistant /bin/bash I am in the container. Protected mode is disabled. And I do uname -a and get Linux homeassistant 5.4.109 #1 SMP Thu Apr 1 15:55:10 UTC 2021 x86_64 Linux So I am on x86_64 Linux in the VM. And I get

ERROR: py_spy-0.3.7-py2.py3-none-manylinux1_x86_64.whl is not a supported wheel on this platform.

I also tried the i686 file. Same result. I am stuck. I cannot get any further.

KennethLavrsen commented 3 years ago

One very strange thing.

I told you that with very little data the loading goes fast.

It is just past midnight and suddenly the logbook is fast loading. Also the user interface when you click on a motion sensor and see the history. It comes up normal

The list shows the entries from before midnight.

But here is an important detail.

The listing has a visible bug - look at this screen shot

image

Every entry from before midnight has the bold date header.

There is a connection. When this date header bug is seen and there is little data from after midnight - the data loads normally.

Is there a connection with dates? Or date formats in the database?

bdraco commented 3 years ago

Do you have any tod binary sensors ?

KennethLavrsen commented 3 years ago

Yes

Yes. I stopped using them because it is broken (incorrect after reboot at midnight, still unresolved). I kept them because of the bug report so I can confirm to devs when it is fixed

- platform: tod
  name: Night Time TOD
  after: '23:00'
  before: '06:30'

- platform: tod
  name: Darkness TOD
  after: sunset
  before: sunrise

- platform: tod
  name: Night Time Before Midnight
  after: '23:00'
  before: '00:00'

- platform: tod
  name: Night Time After Midnight
  after: '00:00'
  before: '06:30'

- platform: tod
  name: Darkness Before Midnight
  after: sunset
  before: '00:00'

- platform: tod
  name: Darkness After Midnight
  after: '00:00'
  before: sunrise

I did notice in my VM when I ran top that one of the two cores was at 100%

So this may be a performance issue caused by a runaway TOD integration?

I will play with that when I get out of bed tomorrow (writing this on ipad in bed)

bdraco commented 3 years ago

If you disable them does this issue go away?

KennethLavrsen commented 3 years ago

Right now the issue is still gone. It has something to do with time of day. I just checked the top again (got out of bed - too excited). The CPU is not loaded now.

Will experiment more with this tomorrow. Will disable them now. I use template sensors now for time of day. The TOD is unreliable

KennethLavrsen commented 3 years ago

OK. I had to do try before I could sleep.

I upgrade the HA Blue. And it loads everything fast when the TOD sensors are removed.

It also cures another bug which is that streaming camera can stream again. Miracle cure.

One thing remains. The screenshot where 4 June 2021 is shown before every entry before midnight is there and it is not in 2021.5.5. I guess that needs its own bug report.

I will check once more when I wake up and if it still works I will close this report. Thanks a lot.

bdraco commented 3 years ago

Here is the fix for the tod cpu drain: https://github.com/home-assistant/core/pull/51491

It looks like there are a few more issues with tod. This change doesn't attempt to address them as they need to be handled in separate PRs.

KennethLavrsen commented 3 years ago

Confirmed this morning that with TODs removed the Logbook loads quick and like before. So I can close this bug