wandb / server

W&B Server is the self hosted version of Weights & Biases
MIT License

panic: dial tcp 127.0.0.1:3306 #105

Open longzilicart opened 1 year ago

longzilicart commented 1 year ago

I hit the same problem as #97, probably caused by a storage (NAS) crash. The solution suggested in the SQL log file points to this link: sql-innodb-recovery. The database appears to be corrupted, although no data was lost in the incident. Unfortunately, I could not find the SQL options file; the only one I found in ./WANDB/mysql/ is auto.cnf. So far it seems impossible to repair the database. How can I fix the database or migrate the data?

longzilicart commented 1 year ago

When running sudo docker exec -it wandb_local cat /var/log/mysql.log, I got the following message:

2023-01-08T02:11:42.300309Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2023-01-08T02:11:42.300540Z 0 [ERROR] InnoDB: Page [page id: space=0, page number=553] log sequence number 24578128915 is in the future! Current system log sequence number 355841221.
2023-01-08T02:11:42.300553Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2023-01-08T02:11:42.303802Z 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2023-01-08T02:11:42.303824Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2023-01-08T02:11:42.304882Z 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2023-01-08T02:11:42.419841Z 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2023-01-08T02:11:42.421566Z 0 [Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.
2023-01-08T02:11:42.421595Z 0 [Note] InnoDB: 32 non-redo rollback segment(s) are active.
2023-01-08T02:11:42.422027Z 0 [Note] InnoDB: Waiting for purge to start
2023-01-08 02:11:42 0x7fbb65410700 InnoDB: Assertion failure in thread 140442834372352 in file trx0purge.cc line 176
InnoDB: Failing assertion: purge_sys->iter.trx_no <= purge_sys->rseg->last_trx_no
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
02:11:42 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=0
max_threads=500
thread_count=0
connection_count=0
It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 206887 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fbb50000ba0
Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
stack_bottom = 7fbb6540fdd0 thread_stack 0x40000
mysqld(my_print_stacktrace+0x3b)[0x557d411720cb]
mysqld(handle_fatal_signal+0x377)[0x557d40a00337]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x153c0)[0x7fbb861483c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fbb85c3918b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fbb85c18859]
mysqld(+0x6a961c)[0x557d409d661c]
mysqld(_ZN20TrxUndoRsegsIterator8set_nextEv+0x1578)[0x557d41479308]
mysqld(+0x114f0e8)[0x557d4147c0e8]
mysqld(_Z9trx_purgemmb+0x58c)[0x557d4147f75c]
mysqld(srv_purge_coordinator_thread+0xb0e)[0x557d4145165e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fbb8613c609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fbb85d15293]

Trying to get some variables. Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 0
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash.
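
For anyone triaging a similar log, the "is in the future" messages can be pulled out mechanically to see which pages are affected and how far their LSNs are ahead of the system LSN. The following is only a minimal sketch: the function name find_future_lsn_pages is hypothetical, and the pattern is written against the MySQL 5.7 message wording shown in this log, so it may not match other versions.

```python
import re

# Pattern for the InnoDB message wording seen in this thread (MySQL 5.7).
FUTURE_LSN = re.compile(
    r"Page \[page id: space=(\d+), page number=(\d+)\] "
    r"log sequence number (\d+) is in the future! "
    r"Current system log sequence number (\d+)"
)


def find_future_lsn_pages(log_text):
    """Return (space, page, page_lsn, system_lsn, gap) for each matching message."""
    results = []
    for match in FUTURE_LSN.finditer(log_text):
        space, page, page_lsn, system_lsn = map(int, match.groups())
        results.append((space, page, page_lsn, system_lsn, page_lsn - system_lsn))
    return results


# One line copied from the log above.
sample = (
    "2023-01-08T02:11:42.300540Z 0 [ERROR] InnoDB: Page [page id: space=0, "
    "page number=553] log sequence number 24578128915 is in the future! "
    "Current system log sequence number 355841221."
)
pages = find_future_lsn_pages(sample)
for space, page, page_lsn, system_lsn, gap in pages:
    print(f"space={space} page={page} is {gap} ahead of the system LSN")
```

A large gap like this one is consistent with the log's own hint that the tablespace files and the InnoDB log files are out of step with each other.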

MBakirWB commented 1 year ago

Hi @longzilicart, thanks for writing in; happy to help. I'm going to ask internally whether there is a recommended W&B approach to resolving this and will get back to you by the end of the week.

longzilicart commented 1 year ago

Hi @MBakirWB, thank you for your reply! Any solution for recovering the data would be really helpful. I am waiting for your good news.

MBakirWB commented 1 year ago

Hi @longzilicart , spoke to our deployment team and as it is difficult to isolate the specific reason why your DB became corrupt, we cannot guarantee the below will provide a fix or preserve your data. Per the resources you referenced above you could try something like:

cat mysql_recovery.cnf

[mysqld]
innodb_force_recovery = 1

docker run --rm -d -v wandb:/vol --mount type=bind,source="$(pwd)"/mysql_recovery.cnf,target=/etc/mysql/conf.d/mysql_recovery.cnf,readonly -p 8080:8080  --name wandb-local wandb/local:latest

Then, if it works, stop the container and restart it without the --mount type=bind,source="$(pwd)"/mysql_recovery.cnf,target=/etc/mysql/conf.d/mysql_recovery.cnf,readonly option.
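
The steps in this comment can be sketched as two small helpers: one that renders the options file for a given recovery level and one that assembles the same docker run arguments with the bind mount. This is an illustration only. The function names are hypothetical, the volume, port, image, and container name are copied from the command above, and nothing here starts a container.

```python
import os


def render_recovery_cnf(level):
    """Return the text of an options file that sets innodb_force_recovery."""
    # MySQL accepts values 0-6; values of 4 or more risk permanent data loss.
    if not 1 <= level <= 6:
        raise ValueError("innodb_force_recovery must be between 1 and 6")
    return f"[mysqld]\ninnodb_force_recovery = {level}\n"


def recovery_docker_args(cnf_path, name="wandb-local"):
    """Build the docker run argument list from the comment above."""
    source = os.path.abspath(cnf_path)  # plays the role of "$(pwd)"/mysql_recovery.cnf
    target = "/etc/mysql/conf.d/mysql_recovery.cnf"
    return [
        "docker", "run", "--rm", "-d",
        "-v", "wandb:/vol",
        "--mount", f"type=bind,source={source},target={target},readonly",
        "-p", "8080:8080",
        "--name", name,
        "wandb/local:latest",
    ]


print(render_recovery_cnf(1))
print(" ".join(recovery_docker_args("mysql_recovery.cnf")))
```

Starting at level 1 and raising it one step at a time keeps the intervention as small as possible. Write the rendered text to mysql_recovery.cnf yourself, then pass the argument list to subprocess.run if you want to launch the container from Python.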

longzilicart commented 1 year ago

Thanks a lot. I will have a try and provide more details.

longzilicart commented 1 year ago

I tried innodb_force_recovery values from 1 to 3 and then ran sudo docker exec -it wandb_local cat /var/log/mysql.log. Now I get the following messages:

2023-01-19T14:07:16.331184Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2023-01-19T14:07:16.331239Z 0 [ERROR] InnoDB: Page [page id: space=1498, page number=81921] log sequence number 9039166100 is in the future! Current system log sequence number 355874682.
2023-01-19T14:07:16.331244Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2023-01-19T14:07:16.331262Z 0 [ERROR] InnoDB: Page [page id: space=1499, page number=32769] log sequence number 17172288422 is in the future! Current system log sequence number 355874682.
2023-01-19T14:07:16.331267Z 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2023-01-19T14:07:16.331278Z 0 [Note] InnoDB: Buffer pool(s) load completed at 230119 14:07:16
2023-01-19T14:07:16.333797Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`plugin` in the cache. Attempting to load the tablespace with space id 2
2023-01-19T14:07:16.334288Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`gtid_executed` in the cache. Attempting to load the tablespace with space id 18
2023-01-19T14:07:16.335339Z 0 [Note] Found ca.pem, server-cert.pem and server-key.pem in data directory. Trying to enable SSL support using them.
2023-01-19T14:07:16.335350Z 0 [Note] Skipping generation of SSL certificates as certificate files are present in data directory.
2023-01-19T14:07:16.335354Z 0 [Warning] A deprecated TLS version TLSv1 is enabled. Please use TLSv1.2 or higher.
2023-01-19T14:07:16.335357Z 0 [Warning] A deprecated TLS version TLSv1.1 is enabled. Please use TLSv1.2 or higher.
2023-01-19T14:07:16.335797Z 0 [Warning] CA certificate ca.pem is self signed.
2023-01-19T14:07:16.335824Z 0 [Note] Skipping generation of RSA key pair as key files are present in data directory.
2023-01-19T14:07:16.336135Z 0 [Note] Server hostname (bind-address): '*'; port: 3306
2023-01-19T14:07:16.336158Z 0 [Note] IPv6 is available.
2023-01-19T14:07:16.336163Z 0 [Note]   - '::' resolves to '::';
2023-01-19T14:07:16.336175Z 0 [Note] Server socket created on IP: '::'.
2023-01-19T14:07:16.340739Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`server_cost` in the cache. Attempting to load the tablespace with space id 19
2023-01-19T14:07:16.340865Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`engine_cost` in the cache. Attempting to load the tablespace with space id 20
2023-01-19T14:07:16.341488Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`time_zone_leap_second` in the cache. Attempting to load the tablespace with space id 12
2023-01-19T14:07:16.341608Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`time_zone_name` in the cache. Attempting to load the tablespace with space id 8
2023-01-19T14:07:16.341718Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`time_zone` in the cache. Attempting to load the tablespace with space id 9
2023-01-19T14:07:16.341830Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`time_zone_transition_type` in the cache. Attempting to load the tablespace with space id 11
2023-01-19T14:07:16.341938Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`time_zone_transition` in the cache. Attempting to load the tablespace with space id 10
2023-01-19T14:07:16.342392Z 0 [ERROR] InnoDB: Failed to find tablespace for table `mysql`.`servers` in the cache. Attempting to load the tablespace with space id 3
2023-01-19T14:07:16.346198Z 0 [Warning] Optional native table 'performance_schema'.'processlist' has the wrong structure or is missing.
2023-01-19T14:07:16.346378Z 0 [Note] Event Scheduler: Loaded 0 events
2023-01-19T14:07:16.346531Z 0 [Note] mysqld: ready for connections.
Version: '5.7.40'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server (GPL)
2023-01-19T14:07:16.418065Z 4 [ERROR] InnoDB: Failed to find tablespace for table `wandb_local`.`schema_migrations_lock` in the cache. Attempting to load the tablespace with space id 23
exec mysqld >> /var/log/mysql.log 2>&1
2023-01-19T14:07:20.193387Z 8 [ERROR] InnoDB: Failed to find tablespace for table `wandb_local`.`hs_history` in the cache. Attempting to load the tablespace with space id 1497
2023-01-19T14:07:20.207476Z 10 [Note] Aborted connection 10 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:07:20.207510Z 8 [Note] Aborted connection 8 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:07:20.207535Z 7 [Note] Aborted connection 7 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:07:20.207491Z 9 [Note] Aborted connection 9 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:24.141453Z 14 [Note] Aborted connection 14 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:24.141506Z 11 [Note] Aborted connection 11 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:24.141487Z 12 [Note] Aborted connection 12 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:24.141464Z 13 [Note] Aborted connection 13 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:27.981661Z 18 [Note] Aborted connection 18 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:27.981713Z 16 [Note] Aborted connection 16 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:27.981743Z 15 [Note] Aborted connection 15 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:08:27.981697Z 17 [Note] Aborted connection 17 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:09:31.885415Z 22 [Note] Aborted connection 22 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
2023-01-19T14:09:31.885456Z 20 [Note] Aborted connection 20 to db: 'wandb_local' user: 'wandb_local' host: '127.0.0.1' (Got an error reading communication packets)
longzilicart commented 1 year ago

After I removed the --mount option, I got:

2023-01-19T14:52:02.804146Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2023-01-19T14:52:02.805409Z 0 [Note] mysqld (mysqld 5.7.40) starting as process 98 ...
2023-01-19T14:52:02.807943Z 0 [Note] InnoDB: PUNCH HOLE support available
2023-01-19T14:52:02.807957Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2023-01-19T14:52:02.807962Z 0 [Note] InnoDB: Uses event mutexes
2023-01-19T14:52:02.807966Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2023-01-19T14:52:02.807969Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.12
2023-01-19T14:52:02.807973Z 0 [Note] InnoDB: Using Linux native AIO
2023-01-19T14:52:02.808261Z 0 [Note] InnoDB: Number of pools: 1
2023-01-19T14:52:02.808335Z 0 [Note] InnoDB: Using CPU crc32 instructions
2023-01-19T14:52:02.809539Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2023-01-19T14:52:02.818541Z 0 [Note] InnoDB: Completed initialization of buffer pool
2023-01-19T14:52:02.820240Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2023-01-19T14:52:02.831556Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2023-01-19T14:52:02.832560Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 355442985
2023-01-19T14:52:02.836946Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 355874682
2023-01-19T14:52:02.837711Z 0 [Note] InnoDB: Database was not shutdown normally!
2023-01-19T14:52:02.837722Z 0 [Note] InnoDB: Starting crash recovery.
2023-01-19T14:52:02.838004Z 0 [ERROR] InnoDB: Tablespace 3491 was not found at ./wandb_local/analytics_events#P#from_2023_01_08_03_30_11#TMP#.ibd.
2023-01-19T14:52:02.838012Z 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
2023-01-19T14:52:02.838020Z 0 [ERROR] InnoDB: Tablespace 3492 was not found at ./wandb_local/analytics_events#P#future#TMP#.ibd.
2023-01-19T14:52:02.838392Z 0 [ERROR] InnoDB: Cannot continue operation.
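
To see at a glance which tablespace files crash recovery is complaining about, the "was not found" lines can be extracted with a short script. This is a hedged sketch: find_missing_tablespaces is a hypothetical helper, and the pattern only targets the message wording in this log. The third field flags paths containing #TMP#, which both missing files here do.

```python
import re

# Pattern for the "Tablespace N was not found at PATH." wording in this log.
MISSING = re.compile(r"Tablespace (\d+) was not found at (\S+\.ibd)")


def find_missing_tablespaces(log_text):
    """Return (space_id, path, is_temporary) for each missing tablespace."""
    return [
        (int(space_id), path, "#TMP#" in path)
        for space_id, path in MISSING.findall(log_text)
    ]


# Two lines copied from the log above.
sample = (
    "2023-01-19T14:52:02.838004Z 0 [ERROR] InnoDB: Tablespace 3491 was not found at "
    "./wandb_local/analytics_events#P#from_2023_01_08_03_30_11#TMP#.ibd.\n"
    "2023-01-19T14:52:02.838020Z 0 [ERROR] InnoDB: Tablespace 3492 was not found at "
    "./wandb_local/analytics_events#P#future#TMP#.ibd.\n"
)
missing = find_missing_tablespaces(sample)
for space_id, path, is_temporary in missing:
    print(space_id, path, "temporary" if is_temporary else "regular")
```

Both entries belong to partitions of the analytics_events table, which suggests (but does not prove) that the crash interrupted a partition maintenance operation on that table.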
vanpelt commented 1 year ago

The error indicates there is invalid data on whatever mount you added. If you're able to start over, just mount a different volume, i.e. -v wandb2:/vol. You really shouldn't be doing wild DB surgery and running the database internally is only meant for quick trials. Production deployments need to be connected to an external MySQL DB.

longzilicart commented 1 year ago

After days of struggling to recover the DB, I finally gave up. An easier way is to start a new server and sync the local wandb run folders to it. I use something like this:

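import os
import subprocess
import tqdm

# Left blank in the original post: set this to the directory that contains your project folders.
root_dir = ""
# Sync the least recently modified projects first.
dir_list = sorted(os.listdir(root_dir), key=lambda x: os.path.getmtime(os.path.join(root_dir, x)))
for dir_name in tqdm.tqdm(dir_list, ncols=60):
    sync_dir = os.path.join(root_dir, dir_name, "wandb/latest-run")
    print(sync_dir)
    # An argument list needs no shell escaping, so the undefined add_escape helper is dropped.
    subprocess.run(["wandb", "sync", "-e", "name", sync_dir])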
shicheng0829 commented 1 year ago

The container has to be run as the root user because of read/write permission problems.

The catch is that mysqld refuses to start as the root user by default.

You can log in to the Docker container and execute the following command.

mysqld --user=root

The solution is from here: https://stackoverflow.com/questions/25700971/fatal-error-please-read-security-section-of-the-manual-to-find-out-how-to-run

Once the MySQL server starts, the dial error should be resolved.
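
To confirm that the dial error is really gone, it helps to check whether anything is listening on 127.0.0.1:3306, the address from the panic message. A minimal sketch, assuming Python is available wherever the port is reachable (the container in this thread only publishes port 8080, so that usually means inside the container); mysql_port_open is a hypothetical helper name.

```python
import socket


def mysql_port_open(host="127.0.0.1", port=3306, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if mysql_port_open():
    print("Something is listening on 127.0.0.1:3306")
else:
    print("Nothing is listening on 127.0.0.1:3306; mysqld is probably not running")
```

This only shows that the port accepts connections. It does not verify that MySQL is healthy or that the wandb_local database is readable.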