Severe data corruption (noticed due to failing backups)

tamasgal commented 2 years ago

Expected Behavior

Update: it started with a failed backup when running the usual php /var/www/html/artisan snipeit:backup. It worked fine for a couple weeks with a daily backup and suddenly it stopped working, apparently due to database corruption (see below).

Current Behavior

A MySQL error is shown and no backup is created. The same error is shown for queries when accessing some columns in the action_date table. I also observe couple of NULL entries in working queries, which are probably related to the database corruption.

Backup failed because The dump process failed with exitcode 2 : Misuse of shell builtins : mysqldump: Couldn't execute 'SELECT /*!40001 SQL_NO_
CACHE */ `id`, `user_id`, `action_type`, `target_id`, `target_type`, `location_id`, `note`, `filename`, `item_type`, `item_id`, `expected_check
in`, `accepted_id`, `created_at`, `updated_at`, `deleted_at`, `thread_id`, `company_id`, `accept_signature`, `log_meta`, `action_date` FROM `ac
tion_logs`': Lost connection to MySQL server during query (2013)

Steps to Reproduce

Connect to the running container
Run php /var/www/html/artisan snipeit:backup

Environment

OS: Debian Buster CPU architecture: arm64 How docker service was installed: Official Docker repository

Command used to create docker container (run/create/compose/screenshot)

docker stack deploy ... (running in the swarm)

Running linuxserver/snipe-it:5.3.6 and linuxserver/mariadb:10.5.12

version: "3.2"
services:
  mysql:
    image: linuxserver/mariadb:10.5.12
    networks:
      - snipe-it
    volumes:
      - data:/config
    deploy:
      replicas: 1
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Berlin
      - MYSQL_ROOT_PASSWORD=xxx
      - MYSQL_USER=snipe
      - MYSQL_PASSWORD=xxx
      - MYSQL_DATABASE=snipe
      - DB_HOST=mysql

  snipeit:
    image: linuxserver/snipe-it:5.3.6
    networks:
      - snipe-it
    depends_on:
      - mysql
    volumes:
      - data:/config
    deploy:
    deploy:
      mode: replicated
      replicas: 1
      endpoint_mode: dnsrr
    environment:
      - APP_TRUSTED_PROXIES=x.x.x.x
      - APP_URL=http://...
      - DB_HOST=mysql
      - MYSQL_PORT_3306_TCP_ADDR=mysql
      - MYSQL_PORT_3306_TCP_PORT=3306
      - MYSQL_DATABASE=snipe
      - MYSQL_USER=snipe
      - MYSQL_PASSWORD=xxx
      - PGID=1000
      - PUID=1000

networks:
  snipe-it:
    external: true

volumes:
 data:
   driver: local
   driver_opts:
     type: "nfs4"
     o: addr=x.x.x.x,nolock,soft,rw
     device: ":/export/nfsshare/ecap-inventory"

Docker logs

Nothing in the logs:

MySQL

ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 10-adduser: exited 0.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 30-config: executing...
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 30-config: exited 0.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 40-initialise-db: executing...
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 40-initialise-db: exited 0.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 90-custom-folders: executing...
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 90-custom-folders: exited 0.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 99-custom-files: executing...
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [custom-init] no custom files found exiting...
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] 99-custom-files: exited 0.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [cont-init.d] done.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [services.d] starting services
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | [services.d] done.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | 220210 17:06:46 mysqld_safe Logging to '/config/databases/762c45c8cad2.err'.
ecap-inventory_mysql.1.tfc2xozzssvf@pi1080    | 220210 17:06:46 mysqld_safe Starting mariadbd daemon with databases from /config/databases

Snipe-IT

ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 10-adduser: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 20-config: executing...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 20-config: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 30-keygen: executing...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | using keys found in /config/keys
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 30-keygen: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 40-config: executing...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | Nothing to migrate.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 40-config: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 90-custom-folders: executing...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 90-custom-folders: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 99-custom-files: executing...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [custom-init] no custom files found exiting...
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] 99-custom-files: exited 0.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [cont-init.d] done.
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [services.d] starting services
ecap-inventory_snipeit.1.zl2u6kuf11jy@pi1080    | [services.d] done.

github-actions[bot] commented 2 years ago

Thanks for opening your first issue here! Be sure to follow the bug or feature issue templates!

tamasgal commented 2 years ago

I found in the logs that the mysqld service is crashing every time the backup query is done.

Below is the full log.

I also found a potentially relevant stack overflow discussion here: https://stackoverflow.com/a/24555191/531222 and I thought it might be related to the max_allowed_packet variable, however it is already set to 16M and the other suggestions did not work.

Running the query directly on the MySQL REPL gives the same error (lost connection):

MariaDB [(none)]> use snipe
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [snipe]> SELECT /*!40001 SQL_NO_CACHE */ `id`, `user_id`, `action_type`, `target_id`, `target_type`, `location_id`, `note`, `filename`, `item_type`, `item_id`, `expected_checkin`, `accepted_id`, `created_at`, `updated_at`, `deleted_at`, `thread_id`, `company_id`, `accept_signature`, `log_meta`, `action_date` FROM `action_logs`;
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [snipe]>

Here is the full log, but I can't find any hints. Does anyone have a clue? I am considering updating the MariaDB but I don't want to lose data...

2022-02-13 21:13:09 0 [Note] InnoDB: Uses event mutexes
2022-02-13 21:13:09 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-02-13 21:13:09 0 [Note] InnoDB: Number of pools: 1
2022-02-13 21:13:09 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-02-13 21:13:09 0 [Note] mariadbd: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2022-02-13 21:13:09 0 [Note] InnoDB: Using Linux native AIO
2022-02-13 21:13:09 0 [Note] InnoDB: Initializing buffer pool, total size = 268435456, chunk size = 134217728
2022-02-13 21:13:09 0 [Note] InnoDB: Completed initialization of buffer pool
2022-02-13 21:13:09 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=6924440,6924440
2022-02-13 21:13:09 0 [Note] InnoDB: Last binlog file '/config/log/mysql/mariadb-bin.000165', position 5784
2022-02-13 21:13:09 0 [Note] InnoDB: 128 rollback segments are active.
2022-02-13 21:13:09 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-02-13 21:13:09 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-02-13 21:13:09 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-02-13 21:13:09 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-02-13 21:13:09 0 [Note] InnoDB: 10.5.12 started; log sequence number 6924584; transaction id 9604
2022-02-13 21:13:09 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-02-13 21:13:09 0 [Note] InnoDB: Loading buffer pool(s) from /config/databases/ib_buffer_pool
2022-02-13 21:13:09 0 [Note] Recovering after a crash using /config/log/mysql/mariadb-bin
2022-02-13 21:13:09 0 [Note] Starting crash recovery...
2022-02-13 21:13:09 0 [Note] Crash recovery finished.
2022-02-13 21:13:09 0 [Note] Server socket created on IP: '::'.
2022-02-13 21:13:09 0 [Note] Reading of all Master_info entries succeeded
2022-02-13 21:13:09 0 [Note] Added new Master_info '' to hash table
2022-02-13 21:13:09 0 [Note] /usr/bin/mariadbd: ready for connections.
Version: '10.5.12-MariaDB-log'  socket: '/run/mysqld/mysqld.sock'  port: 3306  MariaDB Server
2022-02-13 21:13:09 0 [Note] InnoDB: Buffer pool(s) load completed at 220213 21:13:09
2022-02-13 21:13:42 4 [Warning] IP address '10.0.3.6' could not be resolved: Name does not resolve
220213 21:13:42 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.12-MariaDB-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=2
max_threads=102
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760332 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7efff4603cd8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
Cannot determine thread, fp=0xb, backtrace may not be correct.
Bogus stack limit or frame pointer, fp=0xb, stack_bottom=0x7efff4560000, thread_stack=299008, aborting backtrace.

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7efff44f41f0): SELECT /*!40001 SQL_NO_CACHE */ `id`, `user_id`, `action_type`, `target_id`, `target_type`, `location_id`, `note`, `filename`, `item_type`, `item_id`, `expected_checkin`, `accepted_id`, `created_at`, `updated_at`, `deleted_at`, `thread_id`, `company_id`, `accept_signature`, `log_meta`, `action_date` FROM `action_logs`

Connection ID (thread ID): 5
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /config/databases
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1048576              1048576              files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       512564               512564               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core

220213 21:13:42 mysqld_safe Number of processes running now: 0
220213 21:13:42 mysqld_safe mysqld restarted
2022-02-13 21:13:42 0 [Note] /usr/bin/mariadbd (mysqld 10.5.12-MariaDB-log) starting as process 1173 ...
2022-02-13 21:13:42 0 [Note] InnoDB: Uses event mutexes
2022-02-13 21:13:42 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-02-13 21:13:42 0 [Note] InnoDB: Number of pools: 1
2022-02-13 21:13:42 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-02-13 21:13:42 0 [Note] mariadbd: O_TMPFILE is not supported on /var/tmp (disabling future attempts)
2022-02-13 21:13:42 0 [Note] InnoDB: Using Linux native AIO
2022-02-13 21:13:42 0 [Note] InnoDB: Initializing buffer pool, total size = 268435456, chunk size = 134217728
2022-02-13 21:13:42 0 [Note] InnoDB: Completed initialization of buffer pool
2022-02-13 21:13:42 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=6924440,6924440
2022-02-13 21:13:42 0 [Note] InnoDB: Last binlog file '/config/log/mysql/mariadb-bin.000165', position 5784
2022-02-13 21:13:42 0 [Note] InnoDB: 128 rollback segments are active.
2022-02-13 21:13:42 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-02-13 21:13:42 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-02-13 21:13:42 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-02-13 21:13:42 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2022-02-13 21:13:42 0 [Note] InnoDB: 10.5.12 started; log sequence number 6924596; transaction id 9604
2022-02-13 21:13:42 0 [Note] InnoDB: Loading buffer pool(s) from /config/databases/ib_buffer_pool
2022-02-13 21:13:42 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-02-13 21:13:42 0 [Note] Recovering after a crash using /config/log/mysql/mariadb-bin
2022-02-13 21:13:42 0 [Note] Starting crash recovery...
2022-02-13 21:13:42 0 [Note] Crash recovery finished.
2022-02-13 21:13:42 0 [Note] Server socket created on IP: '::'.
2022-02-13 21:13:42 0 [Note] Reading of all Master_info entries succeeded
2022-02-13 21:13:42 0 [Note] Added new Master_info '' to hash table
2022-02-13 21:13:42 0 [Note] /usr/bin/mariadbd: ready for connections.
Version: '10.5.12-MariaDB-log'  socket: '/run/mysqld/mysqld.sock'  port: 3306  MariaDB Server
2022-02-13 21:13:42 0 [Note] InnoDB: Buffer pool(s) load completed at 220213 21:13:42

tamasgal commented 2 years ago

I boiled it down to this query:

MariaDB [snipe]> SELECT `user_id` FROM `action_logs`;
ERROR 2013 (HY000): Lost connection to MySQL server during query

Well, it happens to many other columns too, e.g. location_id, note... but target_id for example works.

I did a quick categorisation:

WORKING: id, action_type, target_id, target_type, item_type, item_id, created_at, thread_id, company_id

CRASHING: user_id, location_id, note, filename, expected_checkin, expected_id, updated_at, deleted_at, accept_signature, log_meta, action_date

To me this looks like a severe database corruption. Now I also noticed that the Activity Logs are empty:

Screenshot 2022-02-13 at 22 29 38

mysqlcheck is not working either because the tables do not support repair:

root@762c45c8cad2:/# mysqlcheck -u root -p -r snipe action_logs
Enter password:
snipe.action_logs
note     : The storage engine for the table doesn't support repair

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tamasgal commented 2 years ago

It happened once again, database corrupt, no backups possible.

thespad commented 2 years ago

Two aspects here. First we don't support running databases from network mounts because locking and connectivity issues frequently cause corruption, and secondly you're mapping your database and snipe-it /config paths to the same volume, which is also something we don't support as we can't guarantee filename uniqueness between containers.

tamasgal commented 2 years ago

Thanks for the reply, that's already helpful. I can easily fix the paths, that's not an issue but what about the database? What is the recommended way to run this stack in a Docker swarm? I guess I need to run an external database separately?

thespad commented 2 years ago

External database or block storage.

linuxserver / docker-snipe-it