tarantool / tarantool

Get your data in RAM. Get compute close to data. Enjoy the performance.
https://www.tarantool.io

Segmentation fault at database recovery #10156

Open zbyte opened 2 weeks ago

zbyte commented 2 weeks ago

Bug description


Tarantool 2.11.2-0-g1bac2d257
Target: Linux-x86_64-RelWithDebInfo
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=TRUE
Compiler: GNU-10.3.1
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/usr/src/tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=/usr/src/tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type

Steps to reproduce

The database was running in Docker and was restarted via docker restart. Since then it fails to recover and does not start.

Actual behavior

2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> Tarantool 2.11.2-0-g1bac2d257 Linux-x86_64-RelWithDebInfo
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> log level 5
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> wal/engine cleanup is paused
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> mapping 268435456 bytes for memtx tuple arena...
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> Actual slab_alloc_factor calculated on the basis of desired slab_alloc_factor = 1.044274
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua I> mapping 1392508928 bytes for vinyl tuple arena...
2024-06-18 09:38:45.762 [1] main/103/tarantool-entrypoint.lua/box.upgrade I> Recovering snapshot with schema version 2.11.1
2024-06-18 09:38:45.763 [1] main/103/tarantool-entrypoint.lua I> update replication_synchro_quorum = 1
2024-06-18 09:38:45.764 [1] main/103/tarantool-entrypoint.lua I> instance uuid 4d83ea98-ec13-47df-9630-ca09ad0ff540
2024-06-18 09:38:47.059 [1] main/103/tarantool-entrypoint.lua xlog.c:2013 E> can't open tx: invalid magic: 0x0
2024-06-18 09:38:47.060 [1] main/103/tarantool-entrypoint.lua I> instance vclock {1: 16220628}
2024-06-18 09:38:47.060 [1] main/103/tarantool-entrypoint.lua I> tx_binary: bound to 0.0.0.0:3301
2024-06-18 09:38:47.060 [1] main/103/tarantool-entrypoint.lua I> recovery start
2024-06-18 09:38:47.060 [1] main/103/tarantool-entrypoint.lua I> recovering from `/var/lib/tarantool/00000000000000000139.snap'
2024-06-18 09:38:47.060 [1] main/103/tarantool-entrypoint.lua I> cluster uuid 84b62572-c7e0-4653-902e-32c0ce1c22da
2024-06-18 09:38:47.079 [1] main/103/tarantool-entrypoint.lua I> assigned id 1 to replica 4d83ea98-ec13-47df-9630-ca09ad0ff540
2024-06-18 09:38:47.079 [1] main/103/tarantool-entrypoint.lua I> update replication_synchro_quorum = 1
2024-06-18 09:38:47.079 [1] main/103/tarantool-entrypoint.lua I> recover from `/var/lib/tarantool/00000000000000000139.xlog'
Segmentation fault
  code: SEGV_MAPERR
  addr: 0x10
  context: 0x7e2ff72113c0
  siginfo: 0x7e2ff72114f0
  rax      0x0                0
  rbx      0x1                1
  rcx      0x0                0
  rdx      0x0                0
  rsi      0x7e2f91d10ff0     138742774960112
  rdi      0x7e2f91d11000     138742774960128
  rsp      0x7e2ff6f805b8     138744472012216
  rbp      0x7e2ff6474bf0     138744460430320
  r8       0x0                0
  r9       0x7e2f91d11000     138742774960128
  r10      0x6                6
  r11      0x0                0
  r12      0x7e2f91d1bc30     138742775004208
  r13      0x7e2ff6c28038     138744468504632
  r14      0x7e2ff6c28038     138744468504632
  r15      0x7e2ff6c200f0     138744468472048
  rip      0x7e2ff97bb90f     138744514197775
  eflags   0x10206            66054
  cs       0x33               51
  gs       0x0                0
  fs       0x0                0
  cr2      0x10               16
  err      0x4                4
  oldmask  0x0                0
  trapno   0xe                14
Current time: 1718703527
Please file a bug at https://github.com/tarantool/tarantool/issues
Attempting backtrace... Note: since the server has already crashed,
this may fail as well
#1  0x602e224337b9 in crash_signal_cb+153
#2  0x7e2ff97decee in sigwaitinfo+8

Expected behavior

The database recovers and starts.

zbyte commented 2 weeks ago

Archive with 00000000000000000139.xlog (~189 MB): https://drive.google.com/file/d/1iQQAEMlSaUJ_BED57gpamSk0OYQdCirC/view?usp=sharing

drewdzzz commented 2 weeks ago

It seems that you forgot to attach directories containing vinyl indexes: SystemError: failed to open './513/0/00000000000000000231.index' file: No such file or directory

zbyte commented 2 weeks ago

All directories and files were mounted. We run Tarantool via docker compose, and the volumes are declared in the YAML file.

zbyte commented 2 weeks ago

I don't know the reason for the initial crash; the logs from that moment were not saved. Most likely the container ran out of memory.
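
The `can't open tx: invalid magic: 0x0` line in the log above suggests that zero bytes were read where a transaction header was expected. One possible cause, and this is only an assumption, is an unclean shutdown that left a zero-filled tail in the last xlog file. A minimal sketch for checking that hypothesis follows; the function name `count_trailing_zero_bytes` is made up for this illustration and is not part of Tarantool. It does not parse the xlog format; it only counts how many consecutive 0x00 bytes sit at the end of a file.

```python
import os


def count_trailing_zero_bytes(path, chunk_size=1 << 20):
    """Return the number of consecutive 0x00 bytes at the end of the file.

    Hypothetical helper for illustration; it knows nothing about the
    xlog format and only inspects raw bytes.
    """
    size = os.path.getsize(path)
    zeros = 0
    with open(path, "rb") as f:
        pos = size
        # Walk backwards from the end of the file, one chunk at a time.
        while pos > 0:
            start = max(0, pos - chunk_size)
            f.seek(start)
            chunk = f.read(pos - start)
            stripped = chunk.rstrip(b"\x00")
            zeros += len(chunk) - len(stripped)
            if stripped:
                # Found a non-zero byte, so the zero tail ends here.
                break
            pos = start
    return zeros
```

Calling `count_trailing_zero_bytes("00000000000000000139.xlog")` on a copy of the file and getting a large number back would be consistent with a zero-filled tail, but it would not by itself explain the segmentation fault.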

drewdzzz commented 4 days ago

Backtrace from your crash report contains only two frames:

#1  0x602e224337b9 in crash_signal_cb+153
#2  0x7e2ff97decee in sigwaitinfo+8

Did you crop it? A full backtrace would help a lot.

About vinyl indexes: the archive you attached contains only .snap, .xlog and .vylog files. They are not sufficient for recovery: vinyl requires subdirectories containing its indexes. These directories are named by the space id (for example, ./512). Could you provide these directories as well?
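
To see which of these directories are actually present before archiving, something like the sketch below could be used. It assumes the layout implied by the error path above, `<data dir>/<space id>/<index id>/<number>.index`; the function name `list_vinyl_index_files` is hypothetical and the script is not part of Tarantool. The directory to scan is whatever path the vinyl files are stored in for the instance.

```python
import os


def list_vinyl_index_files(data_dir):
    """Map (space_id, index_id) to the sorted .index file names found.

    Assumes the layout <data_dir>/<space_id>/<index_id>/*.index, as in
    the path './513/0/00000000000000000231.index' from the error message.
    """
    found = {}
    for space in sorted(os.listdir(data_dir)):
        space_path = os.path.join(data_dir, space)
        # Only purely numeric directory names are treated as space ids.
        if not (space.isdigit() and os.path.isdir(space_path)):
            continue
        for index in sorted(os.listdir(space_path)):
            index_path = os.path.join(space_path, index)
            if not (index.isdigit() and os.path.isdir(index_path)):
                continue
            files = sorted(
                name for name in os.listdir(index_path)
                if name.endswith(".index")
            )
            found[(int(space), int(index))] = files
    return found
```

An empty result for the data directory would mean that no numeric space directories were found there, which would match the `No such file or directory` error for `./513/0/00000000000000000231.index`.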