manticoresoftware / columnar

Manticore Columnar Library
Apache License 2.0

crash after 709b9aca #31

Status: Closed (closed by sanikolaev 1 year ago)

sanikolaev commented 1 year ago

https://github.com/manticoresoftware/columnar/commit/709b9acaaac97d9a1ca8796892f9ad432021c785 causes a crash that can be reproduced as follows:

snikolaev@dev2:~$ cat configless.conf
searchd {
   listen = 127.0.0.1:9315:mysql
   listen = 127.0.0.1:9316:http
   data_dir = data
   pid_file = 9315.pid
   log = searchd.log
   binlog_path =
}

snikolaev@dev2:~$ ~/manticore_github/build/src/searchd -c configless.conf
Manticore 6.2.13 f94555a29@230908 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[56:31.368] [3994359] using config file '/home/snikolaev/configless.conf' (158 chars)...
starting daemon version '6.2.13 f94555a29@230908 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908)' ...
listening on 127.0.0.1:9315 for mysql
listening on 127.0.0.1:9316 for sphinx and http(s)

snikolaev@dev2:~$ ~/manticore_github/test/clt-tests/mysqldump/scripts/generate-1m-records.sh |mysql -P9315 -h0

snikolaev@dev2:~$ mysql -P9315 -h0
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 6.2.13 f94555a29@230908 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908) git branch HEAD (no branch)

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select * from t where s = 'Psm' order by id asc limit 20;
ERROR 2013 (HY000): Lost connection to MySQL server during query

backtrace:

 0# sphBacktrace(int, bool) in /home/snikolaev/manticore_github/build/src/searchd
 1# CrashLogger::HandleCrash(int) in /home/snikolaev/manticore_github/build/src/searchd
 2# 0x00007FA7029CC520 in /lib/x86_64-linux-gnu/libc.so.6
 3# 0x000055690504835A in /home/snikolaev/manticore_github/build/src/searchd
 4# bool CSphMatchQueue<MatchGeneric1_fn, false>::PushT<CSphMatch const&, CSphMatchQueue<MatchGeneric1_fn, false>::Push(CSphMatch const&)::{lambda(CSphMatch&, CSphMatch const&)#1}>(CSphMatch const&, CSphMatchQueue<MatchGeneric1_fn, false>::Push(CSphMatch const&)::{lambda(CSphMatch&, CSphMatch const&)#1}&&) in /home/snikolaev/manticore_github/build/src/searchd
 5# CSphMatchQueue<MatchGeneric1_fn, false>::Push(CSphMatch const&) in /home/snikolaev/manticore_github/build/src/searchd
 6# 0x0000556904F26299 in /home/snikolaev/manticore_github/build/src/searchd
 7# CSphIndex_VLN::RunFullscanOnIterator(RowidIterator_i*, CSphQueryContext const&, CSphQueryResultMeta&, VecTraits_T<ISphMatchSorter*> const&, CSphMatch&, int, bool, int, long) const in /home/snikolaev/manticore_github/build/src/searchd
 8# CSphIndex_VLN::MultiScan(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&, long) const in /home/snikolaev/manticore_github/build/src/searchd
 9# CSphIndex_VLN::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in /home/snikolaev/manticore_github/build/src/searchd
10# 0x0000556905CE4353 in /home/snikolaev/manticore_github/build/src/searchd
11# 0x0000556905E3100E in /home/snikolaev/manticore_github/build/src/searchd
12# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in /home/snikolaev/manticore_github/build/src/searchd
13# RtIndex_c::MultiQuery(CSphQueryResult&, CSphQuery const&, VecTraits_T<ISphMatchSorter*> const&, CSphMultiQueryArgs const&) const in /home/snikolaev/manticore_github/build/src/searchd
14# 0x0000556904DFEE05 in /home/snikolaev/manticore_github/build/src/searchd
15# 0x0000556905E3100E in /home/snikolaev/manticore_github/build/src/searchd
16# Threads::Coro::ExecuteN(int, std::function<void ()>&&) in /home/snikolaev/manticore_github/build/src/searchd
17# SearchHandler_c::RunLocalSearches() in /home/snikolaev/manticore_github/build/src/searchd
18# SearchHandler_c::RunSubset(int, int) in /home/snikolaev/manticore_github/build/src/searchd
19# SearchHandler_c::RunQueries() in /home/snikolaev/manticore_github/build/src/searchd
20# HandleMysqlSelect(RowBuffer_i&, SearchHandler_c&) in /home/snikolaev/manticore_github/build/src/searchd
21# ClientSession_c::Execute(std::pair<char const*, int>, RowBuffer_i&) in /home/snikolaev/manticore_github/build/src/searchd
22# ProcessSqlQueryBuddy(std::pair<char const*, int>, unsigned char&, GenericOutputBuffer_c&) in /home/snikolaev/manticore_github/build/src/searchd
23# SqlServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >) in /home/snikolaev/manticore_github/build/src/searchd
24# MultiServe(std::unique_ptr<AsyncNetBuffer_c, std::default_delete<AsyncNetBuffer_c> >, std::pair<int, unsigned short>, Proto_e) in /home/snikolaev/manticore_github/build/src/searchd
25# 0x0000556904E99DA1 in /home/snikolaev/manticore_github/build/src/searchd
26# Threads::CoRoutine_c::CreateContext(std::function<void ()>, std::pair<boost::context::stack_context, Threads::StackFlavour_E>)::{lambda(boost::context::detail::transfer_t)#1}::_FUN(boost::context::detail::transfer_t) in /home/snikolaev/manticore_github/build/src/searchd
27# make_fcontext in /home/snikolaev/manticore_github/build/src/searchd

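The client-side symptom of the crash above is always the same "ERROR 2013" line, so checking a given build can be scripted. The following is only a sketch: the function names classify_mysql_output and check_query are hypothetical helpers, not part of Manticore, and check_query assumes the daemon from configless.conf is listening on port 9315 with table t already loaded.

```shell
#!/bin/sh
# classify_mysql_output (hypothetical helper): read mysql client output on
# stdin and print "crash" if it contains the lost-connection error shown
# in this report, otherwise print "ok".
classify_mysql_output() {
    if grep -q 'ERROR 2013'; then
        echo crash
    else
        echo ok
    fi
}

# check_query (hypothetical helper): run one query against the daemon from
# this report (port 9315, as in configless.conf) and classify the result.
# stderr is merged into stdout because the mysql client prints errors there.
check_query() {
    mysql -P9315 -h0 -e "$1" 2>&1 | classify_mysql_output
}
```

Calling check_query "select * from t where s = 'Psm' order by id asc limit 20" should print "crash" on a build that shows the problem described here and "ok" otherwise. Note that "ok" only means the connection was not lost; it does not validate the returned rows.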

sanikolaev commented 1 year ago

select * from t where s = 'Psm' leads to a crash too.

glookka commented 1 year ago

Tested locally, seems to work fine on Manticore 6.2.13 b632bd699@230909 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908)

mysql> select s from t where s = 'Psm' limit 3 /*+ SecondaryIndex(s) */;
+------+
| s    |
+------+
| pSM  |
| pSM  |
| pSM  |
+------+
3 rows in set (0.01 sec)

mysql> select s from t where s = 'Psm' limit 3 /*+ NO_SecondaryIndex(s) */;
+------+
| s    |
+------+
| pSM  |
| pSM  |
| pSM  |
+------+
3 rows in set (0.01 sec)

sanikolaev commented 1 year ago

Please try select * from t where s = 'Psm' order by id asc limit 20. It looks like the latest daemon commit indeed doesn't crash without explicit sorting, but it does crash with sorting:

snikolaev@dev2:~$ ~/manticore_github/build/src/searchd -c configless.conf
Manticore 6.2.13 b632bd699@230909 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[45:58.638] [2581857] using config file '/home/snikolaev/configless.conf' (158 chars)...
starting daemon version '6.2.13 b632bd699@230909 dev (columnar 2.2.5 709b9ac@230908) (secondary 2.2.5 709b9ac@230908)' ...
listening on 127.0.0.1:9315 for mysql
listening on 127.0.0.1:9316 for sphinx and http(s)

snikolaev@dev2:~$ ~/manticore_github/test/clt-tests/mysqldump/scripts/generate-1m-records.sh |mysql -P9315 -h0

snikolaev@dev2:~$ mysql -P9315 -h0 -e "select * from t where s = 'Psm'"
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+
| id     | f        | a      | b                 | s    | j                                            | m                                |
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+
| 880321 | aa       | 178714 | 2036222848.000000 | pSM  | {"a":[586952982,2048365812],"b":1201184940}  | 181642610,556755463,2081214485   |
| 892684 | aaaaaaaa |  64618 | 1576034048.000000 | pSM  | {"a":[934909422,1122884788],"b":1134166442}  | 595359817,595396605,1349907623   |
| 722862 | aaaaaaaa |  90971 | 1276418560.000000 | pSM  | {"a":[713871748,1814078538],"b":742765985}   | 1216247494,1536611311,1754313343 |
| 884159 | aaaa     | 102470 | 1991553920.000000 | pSM  | {"a":[448397630,1057937464],"b":47637292}    | 65836401,362688243,875521224     |
| 804225 | aaaa     | 155649 |  115988768.000000 | pSM  | {"a":[2133032686,634652524],"b":1007710821}  | 885182175,1113344609,1113504815  |
| 905041 | aaaaaaaa |  93559 |  580148224.000000 | pSM  | {"a":[2119649212,484935528],"b":612741189}   | 1118836902,1319976713,1928092914 |
| 866847 |          | 133101 |  524718688.000000 | pSM  | {"a":[1535753770,1900559660],"b":1299141962} | 108357610,933364038,1297703609   |
| 927616 | aaaaaa   |  70502 |  627439232.000000 | pSM  | {"a":[171140676,680275296],"b":1160291849}   | 751003601,987455476,1400396427   |
| 828663 |          |  91680 |  779716800.000000 | pSM  | {"a":[1037622734,27797080],"b":1548651938}   | 371044037,620430078,953652173    |
| 888727 | aaaaaa   |  40665 | 1485843968.000000 | pSM  | {"a":[1256675386,1487621966],"b":786449612}  | 524675065,1407259421,1642721710  |
| 689952 | aa       |   4103 | 1854143744.000000 | pSM  | {"a":[1684149888,1436207512],"b":364919587}  | 313545888,473469085,829822115    |
| 810329 |          | 208505 |  521504320.000000 | pSM  | {"a":[104245744,1885549860],"b":1804522510}  | 1060113081,1222971502,1856589913 |
| 470840 | aaaa     | 130210 | 2096773248.000000 | pSM  | {"a":[1292440466,98671234],"b":785629345}    | 174206716,452410922,1779761057   |
| 423647 | aaaaaaaa | 189204 | 1828845824.000000 | pSM  | {"a":[165210076,273439048],"b":97940278}     | 1039352029,1619886975,1664418445 |
| 534528 | aaaaaaaa | 177121 |  911330432.000000 | pSM  | {"a":[1351500168,975989078],"b":1548346315}  | 18080216,246598914,1773719435    |
| 735045 | aa       |  89201 |  141662112.000000 | pSM  | {"a":[2112252914,840997832],"b":1258163752}  | 478754476,1162538008,1866331083  |
| 796414 | aa       | 110596 |  189503040.000000 | pSM  | {"a":[159376300,2041518192],"b":1147750507}  | 656002119,1812665262,1820193606  |
| 556434 | aaaaaa   | 157560 | 2057631104.000000 | pSM  | {"a":[2069763480,1090784746],"b":554921469}  | 218236850,819110476,1471148224   |
| 757706 | aaaaaa   |   5959 | 1379765376.000000 | pSM  | {"a":[915206024,1663227656],"b":1207008869}  | 316450950,648079019,829538578    |
| 489575 | aaaaaa   |  71160 | 1398938496.000000 | pSM  | {"a":[588384048,1597212616],"b":1596025406}  | 435756156,1016379240,1543976231  |
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+

snikolaev@dev2:~$ mysql -P9315 -h0 -e "select * from t where s = 'Psm' order by id asc limit 20"
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

glookka commented 1 year ago

Also works fine. Will try to reproduce on dev2.

mysql> select * from t where s = 'Psm' order by id asc limit 20;
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+
| id     | f        | a      | b                 | s    | j                                            | m                                |
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+
|  25893 | aaaaaa   |   8854 | 2139785984.000000 | pSM  | {"a":[79605024,283554296],"b":1784577126}    | 693336920,957474493,1651343351   |
|  30623 | aa       | 110092 | 1205931264.000000 | pSM  | {"a":[1024657054,376804752],"b":546063725}   | 252842617,437018527,1478615009   |
|  44227 | aaaa     | 212210 |  367371520.000000 | pSM  | {"a":[339875658,1369264714],"b":772941482}   | 1150292829,1280458167,1707299924 |
|  56736 | aaaaaa   | 133916 | 1720635904.000000 | pSM  | {"a":[2020422890,323501996],"b":1551479959}  | 1483936901,1636169937,1876177530 |
|  66816 |          | 165163 | 1636388992.000000 | pSM  | {"a":[900633648,296327420],"b":1908224268}   | 846437376,1087403651,1266394200  |
|  70701 |          | 200700 |  288589152.000000 | pSM  | {"a":[1109218960,1973688560],"b":1871238846} | 196010896,355280928,387260721    |
|  80033 | aa       |  64293 | 2124754304.000000 | pSM  | {"a":[1962586384,851254312],"b":1374552738}  | 571083136,1103333282,2119614107  |
|  90230 | aaaaaa   | 132455 |  732874816.000000 | pSM  | {"a":[42693058,1943757256],"b":1641279119}   | 682240880,1263980762,1999073654  |
| 102536 | aaaaaa   | 128665 | 1546006272.000000 | pSM  | {"a":[644460172,1575529786],"b":459824646}   | 344490331,428266871,2117904969   |
| 128962 |          | 157882 | 1266416128.000000 | pSM  | {"a":[207811492,1576368130],"b":1138852862}  | 598881819,1234822316,2057361555  |
| 132365 | aaaaaa   | 114579 |  868929536.000000 | pSM  | {"a":[1570792800,233713336],"b":1246730467}  | 445759921,561021979,1389395548   |
| 147016 |          | 110895 |  531699744.000000 | pSM  | {"a":[839428582,714946660],"b":1461315133}   | 772408857,1320323663,1833637571  |
| 173389 | aa       |  29919 | 1188877568.000000 | pSM  | {"a":[839529592,1727928032],"b":214342300}   | 1079604615,1568698990,2111837482 |
| 193061 | aaaaaaaa |  60894 |  929179328.000000 | pSM  | {"a":[306568306,350505258],"b":1133035289}   | 334440791,1113282887,1459123966  |
| 199441 | aa       |   1636 |  176340016.000000 | pSM  | {"a":[1047383576,2110774522],"b":1584194394} | 1391552006,1628205337,1790648682 |
| 201356 | aaaa     |  50999 |  942236864.000000 | pSM  | {"a":[1723284724,1218071074],"b":2084259322} | 1046936691,1750607482,1931965117 |
| 207543 | aa       | 192294 | 1797908352.000000 | pSM  | {"a":[210662288,1462574112],"b":1225757789}  | 804907026,1235084222,1708486923  |
| 241758 | aaaaaaaa | 151205 | 1865935872.000000 | pSM  | {"a":[2141201262,1046327528],"b":872103136}  | 1328182803,1380744026,1787326445 |
| 252813 | aaaaaa   | 213612 |  880511936.000000 | pSM  | {"a":[44554528,357819786],"b":1324257075}    | 119431333,972164411,1799327497   |
| 254858 |          | 147143 | 1928699264.000000 | pSM  | {"a":[2016177480,661091820],"b":1487530053}  | 592195705,1417970300,1703085329  |
+--------+----------+--------+-------------------+------+----------------------------------------------+----------------------------------+
20 rows in set (0.05 sec)

glookka commented 1 year ago

Tested debug and release builds on dev2; unable to reproduce the crash.

sanikolaev commented 1 year ago

Here's how you can reproduce this on dev2:

snikolaev@dev2:~/repro$ git clone https://github.com/manticoresoftware/manticoresearch
Cloning into 'manticoresearch'...
remote: Enumerating objects: 77701, done.
remote: Counting objects: 100% (17/17), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 77701 (delta 6), reused 10 (delta 4), pack-reused 77684
Receiving objects: 100% (77701/77701), 47.19 MiB | 25.73 MiB/s, done.
Resolving deltas: 100% (58752/58752), done.

snikolaev@dev2:~/repro/manticoresearch$ git checkout a8fb6574e90ee54d23c0c177576b9968ed5a5d44
# I have to do this since I can't build the latest commit of MCL below

snikolaev@dev2:~/repro/manticoresearch$ cmake .
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/snikolaev/repro/manticoresearch

snikolaev@dev2:~/repro/manticoresearch$ make -j32
[  3%] Building CXX object src/CMakeFiles/lstem.dir/sphinxstemru.cpp.o
[  3%] Building CXX object src/tokenizer/CMakeFiles/lowercaser.dir/charset_definition_parser.cpp.o
[  3%] Building CXX object src/std/CMakeFiles/manticore_std.dir/autoevent.cpp.o
...
[100%] Built target index_converter
[100%] Linking CXX executable searchd
[100%] Linking CXX executable gmanticoretest
[100%] Built target searchd
[100%] Built target gmanticoretest

snikolaev@dev2:~/repro/manticoresearch$ cd ..

snikolaev@dev2:~/repro$ git clone https://github.com/manticoresoftware/columnar
Cloning into 'columnar'...
remote: Enumerating objects: 2148, done.
remote: Counting objects: 100% (689/689), done.
remote: Compressing objects: 100% (169/169), done.
remote: Total 2148 (delta 597), reused 534 (delta 519), pack-reused 1459
Receiving objects: 100% (2148/2148), 5.90 MiB | 15.53 MiB/s, done.
Resolving deltas: 100% (1527/1527), done.

snikolaev@dev2:~/repro$ cd columnar/

snikolaev@dev2:~/repro/columnar$ git checkout 709b9acaaac97d9a1ca8796892f9ad432021c785
# I can't build the latest commit

snikolaev@dev2:~/repro/columnar$ cmake .
-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
...
-- Build files have been written to: /home/snikolaev/repro/columnar

snikolaev@dev2:~/repro/columnar$ make -j32
[  4%] Building CXX object common/CMakeFiles/common.dir/filter.cpp.o
[  4%] Building CXX object columnar/builder/CMakeFiles/builder.dir/builderbool.cpp.o
...
[100%] Built target indextool
[100%] Built target index_converter
[100%] Linking CXX executable searchd
[100%] Built target searchd

snikolaev@dev2:~/repro/columnar$ cd ..

snikolaev@dev2:~/repro$ mkdir data

snikolaev@dev2:~/repro$ cat << EOF > configless.conf
searchd {
   listen = 127.0.0.1:9315:mysql
   listen = 127.0.0.1:9316:http
   data_dir = data
   pid_file = 9315.pid
   log = searchd.log
   binlog_path =
}
EOF

snikolaev@dev2:~/repro$ LIB_MANTICORE_SECONDARY=./columnar/secondary/lib_manticore_secondary.so LIB_MANTICORE_COLUMNAR= ./manticoresearch/src/searchd -c configless.conf
Manticore 6.2.13 a8fb6574e@230913 dev (secondary 2.2.5 709b9ac@230908)
Copyright (c) 2001-2016, Andrew Aksyonoff
Copyright (c) 2008-2016, Sphinx Technologies Inc (http://sphinxsearch.com)
Copyright (c) 2017-2023, Manticore Software LTD (https://manticoresearch.com)

[03:07.751] [2883655] using config file '/home/snikolaev/repro/configless.conf' (158 chars)...
starting daemon version '6.2.13 a8fb6574e@230913 dev (secondary 2.2.5 709b9ac@230908)' ...
listening on 127.0.0.1:9315 for mysql
listening on 127.0.0.1:9316 for sphinx and http(s)

snikolaev@dev2:~/repro$ ./manticoresearch/test/clt-tests/mysqldump/scripts/generate-1m-records.sh |mysql -P9315 -h0

snikolaev@dev2:~/repro$ mysql -P9315 -h0 -e "select * from t where s = 'Psm' order by id asc limit 20"
ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query

sanikolaev commented 1 year ago

I can't reproduce it anymore on dev2 with:

snikolaev@dev2:~/repro$ searchd -v
Manticore 6.2.13 822a4f23a@230914 dev (columnar 2.2.5 b49cb78@230914) (secondary 2.2.5 b49cb78@230914)

glookka commented 1 year ago

I can still reproduce it with a8fb6574e / 709b9ac.

I modified the code to avoid such situations in 5c238c56 / 794abc43.
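
Whether the crash reproduces in this thread depends on which daemon and library commits are loaded, and those short hashes appear in the version banner that searchd prints. As a rough sketch for comparing builds, the helper below pulls the hashes out of one banner line. The name banner_commits is hypothetical, and the pattern assumes the banner layout seen in the outputs quoted in this issue; other releases may format it differently.

```shell
#!/bin/sh
# banner_commits (hypothetical helper): print the short commit hash of each
# component named in a searchd version banner, one "name=hash" per line.
# Assumes the layout seen in this thread, e.g.
#   Manticore 6.2.13 <hash>@<date> dev (columnar 2.2.5 <hash>@<date>) ...
banner_commits() {
    printf '%s\n' "$1" |
        grep -oE '(Manticore|columnar|secondary) [0-9.]+ [0-9a-f]+@[0-9]+' |
        sed -E 's/^([A-Za-z]+) [0-9.]+ ([0-9a-f]+)@[0-9]+$/\1=\2/'
}
```

For the banner from the reproducing setup, banner_commits "Manticore 6.2.13 a8fb6574e@230913 dev (secondary 2.2.5 709b9ac@230908)" prints Manticore=a8fb6574e and secondary=709b9ac on separate lines, with no columnar entry because that library was not loaded in that run.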