manticoresoftware / manticoresearch

Easy to use open source fast database for search | Good alternative to Elasticsearch now | Drop-in replacement for E in the ELK soon
https://manticoresearch.com
GNU General Public License v3.0

crash on alter table tbl add column col uint #1692

Closed xdimus closed 5 months ago

xdimus commented 9 months ago

crash on alter table tbl add column col uint

uname -a Linux dcn35 6.2.9-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.9-1 x86_64 GNU/Linux

------- FATAL: CRASH DUMP -------
[Thu Dec 21 10:42:44.891 2023] [143411]

--- crashed SphinxQL request dump ---
alter table item add column la3 uint
--- request dump end ---
--- local index:characteristic_
Manticore 6.2.12 dc5144d35@230822 (columnar 2.2.4 5aec342@230822) (secondary 2.2.4 5aec342@230822)
Handling signal 11
-------------- backtrace begins here ---------------
Program compiled with Clang 15.0.7
Configured with flags: Configured with these definitions: -DDISTR_BUILD=bullseye -DUSE_SYSLOG=1 -DWITH_GALERA=1 -DWITH_RE2=1 -DWITH_RE2_FORCE_STATIC=1 -DWITH_STEMMER=1 -DWITH_STEMMER_FORCE_STATIC=1 -DWITH_NLJSON=1 -DWITH_UNIALGO=1 -DWITH_ICU=1 -DWITH_ICU_FORCE_STATIC=1 -DWITH_SSL=1 -DWITH_ZLIB=1 -DWITH_ZSTD=1 -DDL_ZSTD=1 -DZSTD_LIB=libzstd.so.1 -DWITH_CURL=1 -DDL_CURL=1 -DCURL_LIB=libcurl.so.4 -DWITH_ODBC=1 -DDL_ODBC=1 -DODBC_LIB=libodbc.so.2 -DWITH_EXPAT=1 -DDL_EXPAT=1 -DEXPAT_LIB=libexpat.so.1 -DWITH_ICONV=1 -DWITH_MYSQL=1 -DDL_MYSQL=1 -DMYSQL_LIB=libmariadb.so.3 -DWITH_POSTGRESQL=1 -DDL_POSTGRESQL=1 -DPOSTGRESQL_LIB=libpq.so.5 -DLOCALDATADIR=/var/lib/manticore -DFULL_SHARE_DIR=/usr/share/manticore
Built on Linux x86_64 (bullseye) (cross-compiled)
Stack bottom = 0x7fcd687d3850, thread stack size = 0x20000
Trying manual backtrace:
Something wrong with thread stack, manual backtrace may be incorrect (fp=0x1)
Wrong stack limit or frame pointer, manual backtrace failed (fp=0x1, stack=0x7fcd687d0000, stacksize=0x20000)
Trying system backtrace:
begin of system symbols:
/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x56006752fcfa]
/usr/bin/searchd(_ZN11CrashLogger11HandleCrashEi+0x355)[0x5600673aece5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7fcece6a3140]
/lib/x86_64-linux-gnu/libc.so.6(+0x16804d)[0x7fcece62404d]
/usr/bin/searchd(_ZN18IndexAlterHelper_c26Alter_AddRemoveRowwiseAttrERK10CSphSchemaS2_PKjjPKhR14WriteWrapper_cS8_bRK10CSphString+0x293)[0x5600682a63a3]
/usr/bin/searchd(_ZN13CSphIndex_VLN18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x66a)[0x56006743e55a]
/usr/bin/searchd(_ZN9RtIndex_c18AddRemoveAttributeEbRK18AttrAddRemoveCtx_tR10CSphString+0x3d5)[0x560068164c35]
/usr/bin/searchd(+0xea7982)[0x5600673fc982]
/usr/bin/searchd(_ZN15ClientSession_c7ExecuteESt4pairIPKciER11RowBuffer_i+0x193b)[0x5600673f947b]
/usr/bin/searchd(_Z20ProcessSqlQueryBuddySt4pairIPKciERhR21GenericOutputBuffer_c+0x52)[0x5600673591a2]
/usr/bin/searchd(_Z8SqlServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EE+0x105d)[0x56006733e74d]
/usr/bin/searchd(_Z10MultiServeSt10unique_ptrI16AsyncNetBuffer_cSt14default_deleteIS0_EESt4pairIitE7Proto_e+0x43)[0x56006733a503]
/usr/bin/searchd(+0xde60b2)[0x56006733b0b2]
/usr/bin/searchd(_ZZN7Threads11CoRoutine_c13CreateContextESt8functionIFvvEESt4pairIN5boost7context13stack_contextENS_14StackFlavour_EEEENUlNS6_6detail10transfer_tEE_8invokeESB_+0x1c)[0x56006867d62c]
/usr/bin/searchd(make_fcontext+0x2f)[0x56006869d9cf]
Trying boost backtrace:
 0# sphBacktrace(int, bool) in /usr/bin/searchd
 1# CrashLogger::HandleCrash(int) in /usr/bin/searchd
 2# 0x00007FCECE6A3140 in /lib/x86_64-linux-gnu/libpthread.so.0
 3# 0x00007FCECE62404D in /lib/x86_64-linux-gnu/libc.so.6
 4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in /usr/bin/searchd
 5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
 6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in /usr/bin/searchd
 7# 0x00005600673FC982 in /usr/bin/searchd
 8# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in /usr/bin/searchd
 9# ProcessSqlQueryBuddy(std::pair<char const*, int>, unsigned char&, GenericOutputBuffer_c&) in /usr/bin/searchd
10# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /usr/bin/searchd
11# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /usr/bin/searchd
12# 0x000056006733B0B2 in /usr/bin/searchd
13# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in /usr/bin/searchd
14# make_fcontext in /usr/bin/searchd
-------------- backtrace ends here ---------------
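
When comparing crash dumps like the one above, it can help to pull the individual frames out of the "system backtrace" part. The following is a minimal sketch, not part of Manticore: the regular expression and the `parse_frames` helper are illustrative names, and the pattern only assumes the `module(symbol+0xoffset)[0xaddress]` frame shape seen in this dump.

```python
import re

# Matches frames such as:
#   /usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x56006752fcfa]
#   /usr/bin/searchd(+0xea7982)[0x5600673fc982]   (no symbol available)
FRAME_RE = re.compile(
    r"(?P<module>/\S+?)"
    r"\((?P<symbol>[A-Za-z_][A-Za-z0-9_]*)?\+0x(?P<offset>[0-9a-f]+)\)"
    r"\[0x(?P<address>[0-9a-f]+)\]"
)


def parse_frames(dump):
    """Return (module, symbol or None, offset, address) for each frame found."""
    return [
        (m["module"], m["symbol"], int(m["offset"], 16), int(m["address"], 16))
        for m in FRAME_RE.finditer(dump)
    ]


# Two frames copied from the dump above.
sample = (
    "/usr/bin/searchd(_Z12sphBacktraceib+0x22a)[0x56006752fcfa] "
    "/usr/bin/searchd(+0xea7982)[0x5600673fc982]"
)
frames = parse_frames(sample)
for module, symbol, offset, address in frames:
    print(module, symbol or "<no symbol>", hex(offset), hex(address))
```

The mangled symbol names extracted this way can then be passed to a demangler such as `c++filt` to get the readable signatures shown in the "boost backtrace" part.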

UPDATE

The task is to check that all attribute files exist before trying to modify the attribute list of the table.

tomatolog commented 9 months ago

Could you upload the table where the crash happened, as described in the upload section of our manual?

sanikolaev commented 9 months ago

Can't reproduce like this:

snikolaev@dev2:~$ docker run -e EXTRA=1 --name manticore --rm -d manticoresearch/manticore:6.2.12 && echo "Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time" && until docker logs manticore 2>&1 | grep -q "accepting connections"; do sleep 1; echo -n .; done && echo && docker exec -it manticore mysql && docker stop manticore
d98f4810b5992dd7b56f52a6b9d9e5035a476a90d9632c60ab725ca602473593
Waiting for Manticore docker to start. Consider mapping the data_dir to make it start faster next time
....
mysql> create table item(la2 int);
mysql> alter table item add column la3 uint;
mysql> drop table item;
mysql> create table item(la2 int);
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush ramchunk item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> flush table item;
mysql> insert into item(la2) values(1),(43786487364),(343);
mysql> alter table item add column la3 uint;
mysql> desc item;
+-------+--------+------------+
| Field | Type   | Properties |
+-------+--------+------------+
| id    | bigint |            |
| la2   | uint   |            |
| la3   | uint   |            |
+-------+--------+------------+
mysql> select * from item;
+--------------------+-----------+------+
| id                 | la2       | la3  |
+--------------------+-----------+------+
| 362751603444285444 |         1 |    0 |
| 362751603444285441 |         1 |    0 |
| 362751603444285447 |         1 |    0 |
| 362751603444285442 | 836814404 |    0 |
| 362751603444285445 | 836814404 |    0 |
| 362751603444285448 | 836814404 |    0 |
| 362751603444285446 |       343 |    0 |
| 362751603444285449 |       343 |    0 |
| 362751603444285443 |       343 |    0 |
+--------------------+-----------+------+

We need something to reproduce it locally to fix the crash: the data files or a way to recreate the table from scratch.

xdimus commented 9 months ago

It crashes only with a non-empty table; if I truncate it, there are no crashes.

tomatolog commented 8 months ago

We need a reproducible example to investigate the crash further. You could either upload the table that causes the crash, as described in the upload section of our manual, or provide an MRE: the CREATE TABLE statement that creates the table structure, the INSERT statements that populate it, and the ALTER statement that causes the crash.

xdimus commented 8 months ago

I uploaded the files.

sanikolaev commented 8 months ago

Thanks. I've reproduced the crash:

 0# sphBacktrace(int, bool) in searchd
 1# CrashLogger::HandleCrash(int) in searchd
 2# 0x00007FA911FFB520 in /lib/x86_64-linux-gnu/libc.so.6
 3# 0x00007FA912159741 in /lib/x86_64-linux-gnu/libc.so.6
 4# IndexAlterHelper_c::Alter_AddRemoveRowwiseAttr(CSphSchema const&, CSphSchema const&, unsigned int const*, unsigned int, unsigned char const*, WriteWrapper_c&, WriteWrapper_c&, bool, CSphString const&) in searchd
 5# CSphIndex_VLN::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
 6# RtIndex_c::AddRemoveAttribute(bool, AttrAddRemoveCtx_t const&, CSphString&) in searchd
 7# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in searchd
 8# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in searchd
 9# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in searchd
10# 0x0000560665768542 in searchd
11# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::__invoke(boost::context::detail::transfer_t) in searchd
12# make_fcontext in searchd

But the point is that, even before the ALTER, indextool reports:

FAILED, unable to open attributes: failed to open datadir/active_item/active_item.65.spa: No such file or directory

So the table is corrupted. If I remove chunk 65, I can do:

mysql> alter table active_item add column col uint;
Query OK, 0 rows affected (11.43 sec)

fine

mysql> desc active_item;
...
| col                        | uint   |                |

and the table is not corrupted after that:

snikolaev@dev2:~/issue-1692$ indextool -c manticore.conf --check active_item
...
snikolaev@dev2:~/issue-1692$ echo $?
0

So the ALTER crashes due to the corrupted table.
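
The check-then-alter sequence above could be scripted so that the ALTER is only attempted after a clean check. The following is a minimal sketch, not an official procedure: `check_passes` is a hypothetical helper, and it assumes that `indextool --check` exits with a non-zero status when the check fails (the thread only shows status 0 after a clean check).

```python
import subprocess
import sys
from typing import Sequence


def check_passes(command: Sequence[str]) -> bool:
    """Run `command` and return True only when it exits with status 0."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's own diagnostics, e.g. "unable to open attributes".
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode == 0


# Command taken from the thread; the config path and table name are specific
# to that setup and need adjusting elsewhere.
INDEXTOOL_CHECK = ["indextool", "-c", "manticore.conf", "--check", "active_item"]
```

Calling `check_passes(INDEXTOOL_CHECK)` before issuing the ALTER would then tell the caller whether it is safe to proceed.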

What we can try to do in this specific case is check that all attribute files exist before trying to modify the attribute list of the table.
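
The idea of such a pre-flight check can be sketched as follows. This is an illustration only, not the actual fix in searchd (which is C++). The `<table>.<chunk>.spa` naming is taken from the indextool error message above. The `missing_attribute_files` helper, the list of chunk ids and chunk 64 are hypothetical. It also assumes that every listed chunk is expected to have a `.spa` file, which may not hold for every schema.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def missing_attribute_files(table_dir: Path, table: str,
                            chunk_ids: Iterable[int]) -> List[Path]:
    """Return the expected <table>.<chunk>.spa paths that do not exist."""
    expected = (table_dir / f"{table}.{chunk_id}.spa" for chunk_id in chunk_ids)
    return [path for path in expected if not path.is_file()]


# Demo in a temporary directory: chunk 64 (hypothetical) has its attribute
# file, chunk 65 does not, mirroring the situation reported by indextool.
with tempfile.TemporaryDirectory() as tmp:
    table_dir = Path(tmp)
    (table_dir / "active_item.64.spa").touch()
    missing = missing_attribute_files(table_dir, "active_item", [64, 65])

if missing:
    # Refuse the ALTER with an error instead of crashing later.
    print("cannot alter table, missing attribute files:",
          [path.name for path in missing])
```

Returning the full list of missing files, rather than stopping at the first one, lets the error message name every damaged chunk at once.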

xdimus commented 8 months ago

If I remove chunk 65, I can do:

How to correctly delete a bad chunk?

tomatolog commented 8 months ago

The correct way is to truncate the table and then reindex the data from scratch.

xdimus commented 8 months ago

That takes too long. Can I delete only the bad chunk, alter the table, and then refill it?

sanikolaev commented 8 months ago

In theory yes. You can delete it by:

Then you can re-insert data from the bad chunk.

sanikolaev commented 5 months ago

@klirichek klirichek closed this as completed in https://github.com/manticoresoftware/manticoresearch/commit/b7c33847f14bbcd2605b56e9b6c14406e932b675 3 days ago

Reopening, as it turns out that with this change a few columnar tests don't pass - https://github.com/manticoresoftware/columnar/runs/24096217755