home-assistant / core

:house_with_garden: Open source home automation that puts local control and privacy first.
https://www.home-assistant.io
Apache License 2.0
70.05k stars 29.12k forks source link

Error executing query: (sqlite3.DatabaseError) database disk image is malformed #121468

Open cdalexndr opened 2 weeks ago

cdalexndr commented 2 weeks ago

The problem

After some time home assistant UI is not working correctly and there are multiple errors in logs. To recover from this state, I have to manually restart the server. After restart, everything works correctly.

What version of Home Assistant Core has the issue?

core-2024.2.5

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

No response

Link to integration documentation on our website

No response

Diagnostics information

home-assistant.log.1.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-06-28 04:15:10.200 ERROR (Recorder) [homeassistant.components.recorder.util] Error executing query: (sqlite3.DatabaseError) database disk image is malformed
[SQL: SELECT anon_1.metadata_id, anon_1.state, anon_1.last_updated_ts, anon_1.attributes 
FROM (SELECT anon_2.metadata_id AS metadata_id, anon_2.state AS state, anon_2.last_updated_ts AS last_updated_ts, anon_2.attributes AS attributes 
FROM (SELECT states.metadata_id AS metadata_id, states.state AS state, ? AS last_updated_ts, CASE WHEN (state_attributes.shared_attrs IS NULL) THEN states.attributes ELSE state_attributes.shared_attrs END AS attributes 
FROM states JOIN (SELECT states.metadata_id AS max_metadata_id, max(states.last_updated_ts) AS max_last_updated 
FROM states 
WHERE states.last_updated_ts >= ? AND states.last_updated_ts < ? AND states.metadata_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) GROUP BY states.metadata_id) AS anon_3 ON states.metadata_id = anon_3.max_metadata_id AND states.last_updated_ts = anon_3.max_last_updated LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE states.last_updated_ts >= ? AND states.last_updated_ts < ? AND states.metadata_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)) AS anon_2 UNION ALL SELECT anon_4.metadata_id AS metadata_id, anon_4.state AS state, anon_4.last_updated_ts AS last_updated_ts, anon_4.attributes AS attributes 
FROM (SELECT states.metadata_id AS metadata_id, states.state AS state, states.last_updated_ts AS last_updated_ts, CASE WHEN (state_attributes.shared_attrs IS NULL) THEN states.attributes ELSE state_attributes.shared_attrs END AS attributes 
FROM states LEFT OUTER JOIN state_attributes ON states.attributes_id = state_attributes.attributes_id 
WHERE (states.last_changed_ts = states.last_updated_ts OR states.last_changed_ts IS NULL) AND states.metadata_id IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) AND states.last_updated_ts > ? AND states.last_updated_ts < ?) AS anon_4) AS anon_1 ORDER BY anon_1.metadata_id, anon_1.last_updated_ts]
[parameters: (0, 1718302302.213129, 1719536999.999999, 247, 267, 55, 54, 238, 395, 48, 50, 51, 52, 53, 18, 19, 20, 21, 22, 23, 235, 241, 244, 24, 340, 341, 132, 342, 136, 137, 343, 139, 141, 140, 344, 345, 346, 146, 147, 347, 348, 150, 349, 350, 153, 154, 155, 156, 351, 159 ... 120 parameters truncated ... 132, 342, 136, 137, 343, 139, 141, 140, 344, 345, 346, 146, 147, 347, 348, 150, 349, 350, 153, 154, 155, 156, 351, 159, 158, 352, 161, 162, 164, 354, 356, 355, 254, 358, 170, 171, 172, 39, 359, 360, 174, 362, 363, 260, 391, 392, 393, 394, 1719536999.999999, 1719537300.0)]
(Background on this error at: https://sqlalche.me/e/20/4xp6)

Additional information

System Information

version core-2024.2.5
installation_type Home Assistant OS
dev false
hassio true
docker true
user root
virtualenv false
python_version 3.12.1
os_name Linux
os_version 6.6.16-haos
arch x86_64
timezone Europe/Bucharest
config_dir /config
Home Assistant Cloud logged_in | false -- | -- can_reach_cert_server | ok can_reach_cloud_auth | ok can_reach_cloud | ok
Home Assistant Supervisor host_os | Home Assistant OS 12.0 -- | -- update_channel | stable supervisor_version | supervisor-2024.06.2 agent_version | 1.6.0 docker_version | 24.0.7 disk_total | 109.3 GB disk_used | 12.2 GB healthy | true supported | true board | generic-x86-64 supervisor_api | ok version_api | ok installed_addons | Node-RED (17.0.3), File editor (5.7.0), Let's Encrypt (5.0.14), ESPHome (2024.3.0), Matter Server (5.5.1), Advanced SSH & Web Terminal (17.2.0), Samba NAS (12.2.0-nas1), Network UPS Tools (0.13.0), HDD Tools (1.1.0), InfluxDB (5.0.0), Home Assistant Google Drive Backup (0.112.1)
Dashboards dashboards | 4 -- | -- resources | 0 views | 3 mode | storage
Recorder oldest_recorder_run | June 28, 2024 at 1:15 AM -- | -- current_recorder_run | July 7, 2024 at 9:24 PM estimated_db_size | 83.08 MiB database_engine | sqlite database_version | 3.44.2
cdalexndr commented 2 weeks ago

Related #118139 and #117178

cdalexndr commented 1 week ago

Done some investigation, possible hardware ssd issue. (Note the following commands are run by ssh into HAOS host, eg ssh root@IP -p 22222)

After some time /mnt/data becomes read-only (drive /dev/sdb8)

Seems there are some file system errors:

# tune2fs -l /dev/sdb8
tune2fs 1.46.5 (30-Dec-2021)
Filesystem volume name:   hassos-data
Last mounted on:          /mnt/data
Filesystem UUID:          b432dc1b-9783-4c7c-ae92-7a7327f55dc8
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              7282688
Block count:              29126769
Reserved block count:     1170285
Overhead clusters:        467553
Free blocks:              25275891
Free inodes:              6920345
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      146
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Mon Dec  4 15:33:59 2023
Last mount time:          Fri Jul 12 18:35:23 2024
Last write time:          Fri Jul 12 20:14:04 2024
Mount count:              7
Maximum mount count:      -1
Last checked:             Wed Jul 10 19:52:57 2024
Check interval:           0 (<none>)
Lifetime writes:          974 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      69ce757a-4283-4998-bbf0-fda383f55458
Journal backup:           inode blocks
FS Error count:           8
First error time:         Fri Jul 12 18:35:23 2024
First error function:     ext4_validate_block_bitmap
First error line #:       421
First error err:          EFSCORRUPTED
Last error time:          Fri Jul 12 20:14:04 2024
Last error function:      ext4_journal_check_start
Last error line #:        84
Last error err:           EIO
Checksum type:            crc32c
Checksum:                 0x9b66f54a

Tried to stop docker daemon then umount and run fsck but I was not successfull:

#  systemctl stop docker
Warning: Stopping docker.service, but it can still be activated by:
  docker.socket
#
# umount /dev/sdb8
# umount /dev/sdb8
umount: /dev/sdb8: not mounted.
# fsck /dev/sdb8
fsck from util-linux 2.38
e2fsck 1.46.5 (30-Dec-2021)
/dev/sdb8 is in use.
e2fsck: Cannot continue, aborting.

What is the correct way to run fsck on HAOS?

cdalexndr commented 5 days ago

smart data shows no error on drive: smartctl.json