Closed: xdimus closed this 5 months ago
Could you upload the table where the crash happened, as described in the upload section of our manual?
Can't reproduce like this:
snikolaev@dev2:~$ docker run -e EXTRA=1 --name manticore --rm -d manticoresearch/manticore:6.2.12 && echo "Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time" && until docker logs manticore 2>&1 | grep -q "accepting connections"; do sleep 1; echo -n .; done && echo && docker exec -it manticore mysql && docker stop manticore
d98f4810b5992dd7b56f52a6b9d9e5035a476a90d9632c60ab725ca602473593
Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time
....
mysql> create table item(la2 int);
mysql> alter table item add column la3 uint;
mysql> drop table item;
mysql> create table item(la2 int);
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush ramchunk item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush table item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> alter table item add column la3 uint;
mysql> desc item;
+-------+--------+------------+
| Field | Type | Properties |
+-------+--------+------------+
| id | bigint | |
| la2 | uint | |
| la3 | uint | |
+-------+--------+------------+
mysql> select * from item;
+--------------------+-----------+------+
| id | la2 | la3 |
+--------------------+-----------+------+
| 362751603444285444 | 1 | 0 |
| 362751603444285441 | 1 | 0 |
| 362751603444285447 | 1 | 0 |
| 362751603444285442 | 836814404 | 0 |
| 362751603444285445 | 836814404 | 0 |
| 362751603444285448 | 836814404 | 0 |
| 362751603444285446 | 343 | 0 |
| 362751603444285449 | 343 | 0 |
| 362751603444285443 | 343 | 0 |
+--------------------+-----------+------+
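A side note on the la2 values above: the column was declared `int` but is stored as a 32-bit unsigned attribute (DESC shows it as `uint`), so the inserted value 43786487364 does not fit and wraps around modulo 2^32, which is why the output shows 836814404. A minimal Python check of the arithmetic:

```python
# la2 was declared "int" but is stored as a 32-bit unsigned attribute
# (DESC shows "uint"), so out-of-range values wrap modulo 2**32.
inserted = 43786487364
stored = inserted % 2**32
print(stored)  # 836814404, matching the SELECT output above
```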
We need something to reproduce it locally to fix the crash: the data files or a way to recreate the table from scratch.
It crashes only with a non-empty table; if I truncate it, there's no crash.
We need a reproducible example to investigate the crash further, i.e. you could upload the table that causes the crash as described in our manual upload section, or provide an MRE with the CREATE TABLE statement to create the table structure, then INSERT to populate the table, and the ALTER statement that causes the crash.
I uploaded the files.
Thanks. I've reproduced the crash:
0# sphBacktrace(int, bool) in searchd
1# CrashLogger::HandleCrash(int) in searchd
2# 0x00007FA911FFB520 in /lib/x86_64-linux-gnu/libc.so.6
3# 0x00007FA912159741 in /lib/x86_64-linux-gnu/libc.so.6
4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in searchd
5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
7# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in searchd
8# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
9# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
10# 0x0000560665768542 in searchd
11# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
12# make_fcontext in searchd
But the point is that even before the ALTER, indextool reports:
FAILED, unable to open attributes: failed to open datadir/active_item/active_item.65.spa: No such file or directory
so, the table is corrupted. If I remove chunk 65, I can do:
mysql> alter table active_item add column col uint;
Query OK, 0 rows affected (11.43 sec)
fine
mysql> desc active_item;
...
| col | uint | |
and the table is not corrupted after that:
snikolaev@dev2:~/issue-1692$ indextool -c manticore.conf --check active_item
...
snikolaev@dev2:~/issue-1692$ echo $?
0
So the ALTER crashes due to the corrupted table.
What we can try to do in this specific case is to check whether all attribute files exist before trying to modify the attribute list in the table.
If I remove chunk 65, I can do:
How to correctly delete a bad chunk?
The correct way is to truncate the table and then reindex the data from scratch.
That takes too long. Can I delete only the bad chunk, alter the table, and then re-fill it?
In theory, yes. You can delete it by:
Then you can re-insert data from the bad chunk.
klirichek closed this as completed in https://github.com/manticoresoftware/manticoresearch/commit/b7c33847f14bbcd2605b56e9b6c14406e932b675 3 days ago
Reopening, as it turns out that with this change a few columnar tests don't pass - https://github.com/manticoresoftware/columnar/runs/24096217755
crash on alter table tbl add column col uint
uname -a: Linux dcn35 6.2.9-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.9-1 x86_64 GNU/Linux
------- FATAL: CRASH DUMP ------- [Thu Dec 21 10:42:44.891 2023] [143411]
--- crashed SphinxQL request dump ---
alter table item add column la3 uint
--- request dump end ---
--- local index:characteristic_
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags:
Configured with these definitions: -DDISTR_BUILD=bullseye -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bullseye) (cross-compiled)
Stack bottom = 0x7fcd687d3850, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7fcd687d0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x56006752fcfa]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x5600673aece5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fcece6a3140]
/lib/x86_64-linux-gnu/libc.so.6(+0x16804d)[0x7fcece62404d]
/usr/bin/searchd(_ZN18IndexAlterHelper_c26Alter_AddRemoveRowwiseAttrERK10CSphSchemaS2_PKjjPKhR14WriteWrapper_cS8_bRK10CSphString+0x293)[0x5600682a63a3]
/usr/bin/searchd(_ZN13CSphIndex_VLN18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x66a)[0x56006743e55a]
/usr/bin/searchd(_ZN9RtIndex_c18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x3d5)[0x560068164c35]
/usr/bin/searchd(+0xea7982)[0x5600673fc982]
/usr/bin/searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x193b)[0x5600673f947b]
/usr/bin/searchd(_Z20ProcessSqlQueryBuddySt4pairIPKciERhR21GenericOutputBuffer_c+0x52)[0x5600673591a2]
/usr/bin/searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x105d)[0x56006733e74d]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x43)[0x56006733a503]
/usr/bin/searchd(+0xde60b2)[0x56006733b0b2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8invokeESB_+0x1c)[0x56006867d62c]
/usr/bin/searchd(make_fcontext+0x2f)[0x56006869d9cf]
Trying boost backtrace:
0# sphBacktrace(int, bool) in /usr/bin/searchd
1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
2# 0x00007FCECE6A3140 in /lib/x86_64-linux-gnu/libpthread.so.0
3# 0x00007FCECE62404D in /lib/x86_64-linux-gnu/libc.so.6
4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in /usr/bin/searchd
5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
7# 0x00005600673FC982 in /usr/bin/searchd
8# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in /usr/bin/searchd
9# ProcessSqlQueryBuddy(std::pair<char const*, int>, unsigned char&, GenericOutputBuffer_c&) in /usr/bin/searchd
10# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
11# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
12# 0x000056006733B0B2 in /usr/bin/searchd
13# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
14# make_fcontext in /usr/bin/searchd
-------------- backtrace ends here ---------------
UPDATE
The task is to check whether all attribute files exist before trying to modify the attribute list in the table.
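As an illustrative sketch of that pre-check (not the actual searchd implementation; it assumes each disk chunk has a <table>.<N>.sph header with a matching <table>.<N>.spa attribute file, following the datadir/active_item/active_item.65.spa path from the error above):

```python
from pathlib import Path

def missing_attr_files(table_dir: str) -> list[str]:
    """Return expected per-chunk attribute (.spa) files that are
    missing on disk.

    Hypothetical naming convention assumed here: one <name>.<N>.sph
    header per disk chunk, each paired with a <name>.<N>.spa
    attribute file (as in active_item.65.spa from the report)."""
    missing = []
    for header in Path(table_dir).glob("*.sph"):  # one header per chunk
        spa = header.with_suffix(".spa")          # expected attribute file
        if not spa.exists():
            missing.append(str(spa))
    return sorted(missing)
```

If the returned list is non-empty, the ALTER could be rejected with a clear "table corrupted" error instead of crashing inside Alter_AddRemoveRowwiseAttr.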