home-assistant / addons

:heavy_plus_sign: Docker add-ons for Home Assistant
https://home-assistant.io/hassio/
Apache License 2.0
1.56k stars 1.51k forks source link

MariaDB crash after unexpected system restart #2574

Closed Saentist closed 2 years ago

Saentist commented 2 years ago

Describe the issue you are experiencing

MariaDB crash in 95% cases after unexpected system restart recorder i set to record with:

commit_interval: 15

What type of installation are you running?

Home Assistant OS

Which operating system are you running on?

Home Assistant Operating System

Which add-on are you reporting an issue with?

MariaDB

What is the version of the add-on?

2.4.0

Steps to reproduce the issue

  1. cut the power
  2. ...

Anything in the Supervisor logs that might be useful for us?

No response

Anything in the add-on logs that might be useful for us?

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
[21:02:51] INFO: Using existing mariadb initial system
[21:02:52] INFO: Starting MariaDB
220704 21:03:00 mysqld_safe Logging to '/data/databases/mariadb.err'.
220704 21:03:01 mysqld_safe Starting mysqld daemon with databases from /data/databases
220704 21:03:01 mysqld_safe Starting mysqld daemon with databases from /data/databases
2022-07-04 21:03:06 0 [Note] /usr/bin/mysqld (mysqld 10.4.19-MariaDB) starting as process 395 ...
2022-07-04 21:03:06 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 32% 51% 80% 100% (0.0 seconds); tables to flush: 2 1 0
 (0.0 seconds); 
2022-07-04 21:03:07 0 [Note] mysqld: Aria engine: recovery done
2022-07-04 21:03:07 0 [Note] InnoDB: Using Linux native AIO
2022-07-04 21:03:07 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-07-04 21:03:07 0 [Note] InnoDB: Uses event mutexes
2022-07-04 21:03:07 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-07-04 21:03:07 0 [Note] InnoDB: Number of pools: 1
2022-07-04 21:03:07 0 [Note] InnoDB: Using SSE2 crc32 instructions
2022-07-04 21:03:07 0 [Note] mysqld: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2022-07-04 21:03:07 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2022-07-04 21:03:07 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-04 21:03:07 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-07-04 21:03:07 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 1173950695 between the checkpoint 1173932620 and the end 1173950464.
2022-07-04 21:03:07 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2022-07-04 21:03:07 0 [Note] InnoDB: Starting shutdown...
2022-07-04 21:03:07 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-07-04 21:03:07 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2022-07-04 21:03:07 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-07-04 21:03:08 0 [ERROR] Unknown/unsupported storage engine: InnoDB
2022-07-04 21:03:08 0 [ERROR] Aborting
220704 21:03:08 mysqld_safe mysqld from pid file /data/databases/core-mariadb.pid ended
[cont-finish.d] executing container finish scripts...
[cont-finish.d] done.
[s6-finish] waiting for services.
[s6-finish] sending all processes the TERM signal.

Additional information

After restore preverious backup log say this

220704 21:55:46 mysqld_safe Starting mysqld daemon with databases from /data/databases
2022-07-04 21:55:46 0 [Note] /usr/bin/mysqld (mysqld 10.4.19-MariaDB) starting as process 397 ...
2022-07-04 21:55:46 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 32% 51% 80% 100% (0.0 seconds); tables to flush: 2 1 0
 (0.0 seconds); 
2022-07-04 21:55:46 0 [Note] mysqld: Aria engine: recovery done
2022-07-04 21:55:46 0 [Note] InnoDB: Using Linux native AIO
2022-07-04 21:55:46 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2022-07-04 21:55:46 0 [Note] InnoDB: Uses event mutexes
2022-07-04 21:55:46 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-07-04 21:55:46 0 [Note] InnoDB: Number of pools: 1
2022-07-04 21:55:46 0 [Note] InnoDB: Using SSE2 crc32 instructions
2022-07-04 21:55:46 0 [Note] mysqld: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2022-07-04 21:55:46 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2022-07-04 21:55:46 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-04 21:55:46 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2022-07-04 21:55:46 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1036771302
2022-07-04 21:55:46 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2022-07-04 21:55:46 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-07-04 21:55:46 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-07-04 21:55:46 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-07-04 21:55:46 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-07-04 21:55:46 0 [Note] InnoDB: Waiting for purge to start
2022-07-04 21:55:46 0 [Note] InnoDB: 10.4.19 started; log sequence number 1036771311; transaction id 256447
2022-07-04 21:55:46 0 [Note] InnoDB: Loading buffer pool(s) from /data/databases/ib_buffer_pool
2022-07-04 21:55:46 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-07-04 21:55:46 0 [Note] Server socket created on IP: '::'.
2022-07-04 21:55:46 0 [Note] InnoDB: Buffer pool(s) load completed at 220704 21:55:46
2022-07-04 21:55:46 0 [Note] Reading of all Master_info entries succeeded
2022-07-04 21:55:46 0 [Note] Added new Master_info '' to hash table
2022-07-04 21:55:46 0 [Note] /usr/bin/mysqld: ready for connections.
Version: '10.4.19-MariaDB'  socket: '/run/mysqld/mysqld.sock'  port: 3306  MariaDB Server
[21:55:47] INFO: Check data integrity and fix corruptions
mysql.column_stats                                 OK
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.global_priv                                  OK
mysql.gtid_slave_pos                               OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.index_stats                                  OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.roles_mapping                                OK
mysql.servers                                      OK
mysql.table_stats                                  OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.transaction_registry                         OK
homeassistant.event_data                           OK
homeassistant.events                               OK
homeassistant.recorder_runs                        OK
homeassistant.schema_changes                       OK
homeassistant.state_attributes                     OK
homeassistant.states                               OK
homeassistant.statistics                           OK
homeassistant.statistics_meta                      OK
homeassistant.statistics_runs                      OK
homeassistant.statistics_short_term                OK
phpmyadmin.pma__bookmark                           OK
phpmyadmin.pma__central_columns                    OK
phpmyadmin.pma__column_info                        OK
phpmyadmin.pma__designer_settings                  OK
phpmyadmin.pma__export_templates                   OK
phpmyadmin.pma__favorite                           OK
phpmyadmin.pma__history                            OK
phpmyadmin.pma__navigationhiding                   OK
phpmyadmin.pma__pdf_pages                          OK
phpmyadmin.pma__recent                             OK
phpmyadmin.pma__relation                           OK
phpmyadmin.pma__savedsearches                      OK
phpmyadmin.pma__table_coords                       OK
phpmyadmin.pma__table_info                         OK
phpmyadmin.pma__table_uiprefs                      OK
phpmyadmin.pma__tracking                           OK
phpmyadmin.pma__userconfig                         OK
phpmyadmin.pma__usergroups                         OK
phpmyadmin.pma__users                              OK
[21:55:47] INFO: Ensuring internal database upgrades are performed
This installation of MariaDB is already upgraded to 10.4.19-MariaDB, use --force if you still need to run mysql_upgrade
[21:55:47] INFO: Ensure databases exists
[21:55:48] INFO: Create database homeassistant
[21:55:48] INFO: Ensure users exists and are updated
[21:55:48] INFO: Update user homeassistant
[21:55:48] INFO: Init/Update rights
[21:55:48] INFO: Granting all privileges to homeassistant on homeassistant
[21:55:49] INFO: Successfully send service information to Home Assistant.

other solution is to restore from snapshot but data loosting is 100% garanted, no mater restore type

mdegat01 commented 2 years ago

MariaDB crash in 95%

95% of what? CPU? Memory? Disk space?

Steps to reproduce the issue

  1. cut the power

Don't do that? Is that actually the steps to reproduce the issue? If so then don't do that, not a great idea. If not then please put the actual steps to reproduce.

I guess general advice here would be to list more in exclude in recorder to record less data. Especially if you have any sensors that are changing frequently in state or attributes.

Alternate options:

If you're still stuck then you could try switching to the default sqlite3 DB and stop using the MariaDB addon to see if it helps. Or you probably have to submit a ticket to MariaDB. We don't actually maintain the MariaDB software here, it's just repackaged as an addon to make it easy for HA users to install. But if you're encountering a reproducible crash with MariaDB then you'll have to follow up with their support channels.

Saentist commented 2 years ago

MariaDB crash in 95%

95% of what? CPU? Memory? Disk space?

cases of restarts

As bonus user do not have access to mariadb folders from OS itself, and cannot do nothing to manually repair DB

mdegat01 commented 2 years ago

Oh you mean HA restarts because the DB it was talking to crashed?

Ok well I'm not really sure how to help you. I haven't seen any other reports like this so doesn't seem like a systemic issue. Perhaps try deleting the addon and reinstalling fresh to see if something was corrupted and that clears it out?

Saentist commented 2 years ago

You compleatly not understand.

HA works, !@#! elecric company stop electricity. When electricity return DB is broken and no actions taken or posible to repair.

Hope this is more clear to understand

mdegat01 commented 2 years ago

Ah. You're right, I did not understand. I think in this case you would have to reach out to mariadb for support. We do have the config for mariadb set to check the DB and repair on startup but it appears that's not working for you. I have pulled the plug on my system before and never seen what you're seeing.

Another option is to get a UPS for the machine running HA. That's what I do personally to handle power outtages gracefully. If you install the NUT addon from the community add-ons repo as well then the UPS can send HA the shutdown signal before its battery dies. That way HA can gracefully shutdown before its power is cut. Doesn't have to be an expensive one either, can get a cheap one with basically just enough battery to give HA time to shutdown but not really run during an outtage.

Saentist commented 2 years ago

In the last 3 years the electric company killed 3 UPS (2x APC and EATON) current one is not supported by NUT.

My DB is 14Gb Is it possible to make some option to PUSH data sheduled to external database? Is it posible to ve added option to backup fixed time of data (day/week/month)?

https://dba.stackexchange.com/questions/163445/innodb-ignoring-the-redo-log-due-to-missing-mlog-checkpoint

Nutti85 commented 2 years ago

I have exactly the same issue running HA in a VM. If the electricity is disabled MariaDB breaks and I have to restore.

agners commented 2 years ago

This sounds like broken hardware. Any database needs to rely on underlying storage to behave according to certain minimal characteristics, e.g. in terms of write ordering or that "sync" calls actually write things to non-volatile storage. If the underlying storage does not follow these rules, then the database cannot replay its logs properly, and ends up in a corrupted state.

Saentist commented 2 years ago

What broken hardware in virtualization? My OS with run VMWARE /were is Home Assistant OS OVA/, work perfectly.

agners commented 2 years ago

What broken hardware in virtualization?

Also virtualized "hardware" can be broken: If the virtualization does not adhere to the expected semantics. Now with virtualization the whole topic gets more complex: VMware may adhere to all semantics towards the visualized machine, as long as there are no power outages on the underlying OS. E.g. if VMware claims towards the OS "yes this data has been written" while in reality it hasn't been on the actual disk.

Googling a bit about VMware and file system corruption it seems that thin provisioned disks are commonly problematic. Are you using thin provisioning on your disks?

Now I am not saying it's guaranteed not a MariaDB/OS bug... Software bugs can be there too. It just seems somewhat unlikely to me.

You can also check if it is a MariaDB Add-on/HAOS problem by forcefully power off the virtual machine. If MariaDB can recover from such "power failures", then its unlikely a MariaDB issue.

Saentist commented 2 years ago

Googling about difference in ESXI and Workstation/player product's of VMWARE

Fact 1. no access to docker folder Fact 2. no "manual" repair options at all.

mdegat01 commented 2 years ago

Fact 1. no access to docker folder Fact 2. no "manual" repair options at all.

Let's assume for a moment that you had access to the docker folder. What would you do here? If you have a clear script you use for repair in situations like this we're certainly open to adding it. Just describe your steps. Or put in a PR yourself with a new "repair" option if you know what you need the addon to do.

Saentist commented 2 years ago

What about copy DB data https://dba.stackexchange.com/questions/174/how-can-i-move-a-database-from-one-server-to-another

Old Server
Stop mysql server
Copy contents of [datadir](http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_datadir) to another location on disk (~/mysqldata/*)
Start mysql server again (downtime was 10-15 minutes)
compress the data (tar -czvf mysqldata.tar.gz ~/mysqldata)
copy the compressed file to new server
New Server
install mysql (don't start)
unzip compressed file (tar -xzvf mysqldata.tar.gz)
move contents of mysqldata to the [datadir](http://dev.mysql.com/doc/refman/5.5/en/server-options.html#option_mysqld_datadir)
Make sure your innodb_log_file_size is same on new server, or if it's not, don't copy the old log files ([mysql will generate these](https://dba.stackexchange.com/questions/1261/how-to-safely-change-mysql-innodb-variable-innodb-log-file-size))
Start mysql
agners commented 2 years ago

I think with your know-how, and the expectation to have direct access to all services and its data directory, using Debian and maintain the system yourself using containers instead of Home Assistant OS is the way to go.

mdegat01 commented 2 years ago

@Saentist

  1. Make a backup of the MariaDB addon without a password
  2. Move the tar file of the backup to the new server
  3. Unzip the tar file
  4. Grab the datadir from within there
  5. Proceed with the remaining directions
Saentist commented 2 years ago
[07:30:05] INFO: Starting MariaDB
220711 07:30:06 mysqld_safe Logging to '/data/databases/mariadb.err'.
220711 07:30:06 mysqld_safe Starting mariadbd daemon with databases from /data/databases
220711 07:30:06 mysqld_safe Starting mariadbd daemon with databases from /data/databases
2022-07-11  7:30:11 0 [Note] /usr/bin/mariadbd (server 10.6.8-MariaDB) starting as process 247 ...
2022-07-11  7:30:13 0 [Note] mariadbd: Aria engine: starting recovery
recovered pages: 0% 10% 20% 30% 53% 63% 73% 83% 97% 100% (0.4 seconds); tables to flush: 5 4 3 2 1 0
 (0.1 seconds); 
2022-07-11  7:30:14 0 [Note] mariadbd: Aria engine: recovery done
2022-07-11  7:30:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.12
2022-07-11  7:30:14 0 [Note] InnoDB: Number of pools: 1
2022-07-11  7:30:14 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-07-11  7:30:14 0 [Note] mariadbd: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2022-07-11  7:30:14 0 [Note] InnoDB: Using Linux native AIO
2022-07-11  7:30:14 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
2022-07-11  7:30:14 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-11  7:30:15 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2562742228,2564291101
2022-07-11  7:30:24 0 [Note] InnoDB: Starting final batch to recover 1933 pages from redo log.
2022-07-11  7:30:27 0 [Note] InnoDB: 128 rollback segments are active.
2022-07-11  7:30:27 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-07-11  7:30:27 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-07-11  7:30:27 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-07-11  7:30:27 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-07-11  7:30:27 0 [Note] InnoDB: 10.6.8 started; log sequence number 2572622231; transaction id 331999
2022-07-11  7:30:27 0 [Note] InnoDB: Loading buffer pool(s) from /data/databases/ib_buffer_pool
2022-07-11  7:30:27 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-07-11  7:30:27 0 [Note] InnoDB: Buffer pool(s) load completed at 220711  7:30:27
2022-07-11  7:30:27 0 [Warning] 'innodb-buffer-pool-instances' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
2022-07-11  7:30:27 0 [Note] Server socket created on IP: '0.0.0.0'.
2022-07-11  7:30:27 0 [Note] Server socket created on IP: '::'.
2022-07-11  7:30:28 0 [Note] /usr/bin/mariadbd: ready for connections.
Version: '10.6.8-MariaDB'  socket: '/run/mysqld/mysqld.sock'  port: 3306  MariaDB Server
[07:30:28] INFO: Check data integrity and fix corruptions
mysql.column_stats                                 OK
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.global_priv                                  OK
mysql.gtid_slave_pos                               OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.index_stats                                  OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.roles_mapping                                OK
mysql.servers                                      OK
mysql.table_stats                                  OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.transaction_registry                         OK
homeassistant.event_data                           OK
homeassistant.events                               OK
homeassistant.recorder_runs                        OK
homeassistant.schema_changes                       OK
homeassistant.state_attributes                     OK
homeassistant.states                               OK
homeassistant.statistics                           OK
homeassistant.statistics_meta                      OK
homeassistant.statistics_runs                      OK
homeassistant.statistics_short_term                OK
phpmyadmin.pma__bookmark                           OK
phpmyadmin.pma__central_columns                    OK
phpmyadmin.pma__column_info                        OK
phpmyadmin.pma__designer_settings                  OK
phpmyadmin.pma__export_templates                   OK
phpmyadmin.pma__favorite                           OK
phpmyadmin.pma__history                            OK
phpmyadmin.pma__navigationhiding                   OK
phpmyadmin.pma__pdf_pages                          OK
phpmyadmin.pma__recent                             OK
phpmyadmin.pma__relation                           OK
phpmyadmin.pma__savedsearches                      OK
phpmyadmin.pma__table_coords                       OK
phpmyadmin.pma__table_info                         OK
phpmyadmin.pma__table_uiprefs                      OK
phpmyadmin.pma__tracking                           OK
phpmyadmin.pma__userconfig                         OK
phpmyadmin.pma__usergroups                         OK
phpmyadmin.pma__users                              OK
sys.sys_config                                     OK
[07:30:35] INFO: Ensuring internal database upgrades are performed
[07:30:35] INFO: Ensure databases exists
[07:30:36] INFO: Create database homeassistant
[07:30:36] INFO: Ensure users exists and are updated
[07:30:37] INFO: Update user homeassistant
[07:30:37] INFO: Init/Update rights
[07:30:37] INFO: Granting all privileges to homeassistant on homeassistant
[07:30:38] INFO: Successfully send service information to Home Assistant.

what is this, a wrong permision ot anything else?

2022-07-11  7:30:14 0 [Note] mariadbd: O_TMPFILE is not supported on /var/tmp (disabling future attempts)

as a bonus

image

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.