ydb-platform / ydb

YDB is an open source Distributed SQL Database that combines high availability and scalability with strong consistency and ACID transactions
https://ydb.tech
Apache License 2.0

All database nodes crash while executing even the simplest INSERT INTO ... SELECT FROM query #11186

Open SloNN opened 3 weeks ago

SloNN commented 3 weeks ago

Issuing the most trivial query:

$v = SELECT cast(count(*) as uint32)
    FROM `raw/kikimr_query-replay_prod`
    ;

insert into cnt2 (key,c) values(1,$v);

All database nodes crash.


nikvas0 commented 3 weeks ago

Reproduced this on another cluster.

Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]: VERIFY failed (2024-11-02T17:23:50.083020+0300): tablet_id=44;verification=Stage == from;fline=actor.h:63;from=1;real=0;to=2;
Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]:   ydb/library/actors/core/log.cpp:748
Nov  2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]:   ~TVerifyFormattedRecordWriter(): requirement false failed
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 0. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:83: NPrivate::InternalPanicImpl(int, char const*, char const*, int, int, int, TBasicStringBuf<char, std::__y1::char_traits<char>>, char const*, unsigned long) @ 0xAFF387D
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 1. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:55: NPrivate::Panic(NPrivate::TStaticBuf const&, int, char const*, char const*, char const*, ...) @ 0xAFED89C
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 2. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/log.cpp:748: NActors::TVerifyFormattedRecordWriter::~TVerifyFormattedRecordWriter() @ 0xBFDFC33
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 3. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:63: NKikimr::NOlap::NDataReader::TActor::SwitchStage(NKikimr::NOlap::NDataReader::TActor::EStage, NKikimr::NOlap::NDataReader::TActor::EStage) @ 0x140A01DD
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 4. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.cpp:38: NKikimr::NOlap::NDataReader::TActor::HandleExecute(TAutoPtr<NActors::TEventHandle<NKikimr::NKqp::TEvKqpCompute::TEvScanError>, TDelete>&) @ 0x140A0753
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 5. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:85: NKikimr::NOlap::NDataReader::TActor::StateFunc(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x140A0DB2
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 6. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:251: NActors::TGenericExecutorThread::TProcessingResult NActors::TGenericExecutorThread::Execute<NActors::TMailboxTable::THTSwapMailbox>(NActors::TMailboxTable::THTSwapMailbox*, unsigned int, bool) @ 0xBFC10F7
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 7. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:440: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*)::$_0::operator()(unsigned int, bool) const @ 0xBFB8621
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 8. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:493: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*) @ 0xBFB8064
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 9. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:524: NActors::TExecutorThread::ThreadProc() @ 0xBFB8E5F
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 10. /home/ns-vasilev/ydbwork/ydb/util/system/thread.cpp:244: (anonymous namespace)::TPosixThread::ThreadProxy(void*) @ 0xAFF7BFE
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 11. ??:0: ?? @ 0x7F7DC4B91608
Nov  2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 12. ??:0: ?? @ 0x7F7DC4AB1352
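
To make the failed check easier to read, here is a minimal Python sketch that splits the `key=value;` tail of the VERIFY line above into a dictionary. The helper name `parse_verify_fields` is hypothetical, and the reading of `real` as the stage the actor was actually in is an assumption based on the `Stage == from` condition and frame 3 of the trace, not something confirmed against the source code.

```python
def parse_verify_fields(line: str) -> dict[str, str]:
    """Split the 'key=value;' tail of a VERIFY line into a dict.

    Hypothetical helper for reading the log above; not part of YDB.
    """
    # Everything after the parenthesised timestamp ("...): ") is the field list.
    _, _, tail = line.partition("): ")
    fields = {}
    for chunk in tail.split(";"):
        # Split on the first '=' only, so "Stage == from" stays intact.
        key, sep, value = chunk.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


line = (
    "VERIFY failed (2024-11-02T17:23:50.083020+0300): "
    "tablet_id=44;verification=Stage == from;fline=actor.h:63;"
    "from=1;real=0;to=2;"
)
fields = parse_verify_fields(line)

# Assumption: 'from' is the stage SwitchStage expected, 'real' the current one.
expected_stage = int(fields["from"])
actual_stage = int(fields["real"])
print(f"{fields['fline']}: expected stage {expected_stage}, "
      f"got {actual_stage}, target {fields['to']}")
```

Under that assumption, the TEvScanError handler (frame 4) asked for a 1 -> 2 transition while the actor was still in stage 0.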

vlad-gogov commented 2 weeks ago

Logs from OLAP TESTING VLA COMMON3: https://paste.yandex-team.ru/332eb072-8a16-4af2-9c6a-379bdaeee6c1
Version: ydb-stable-24-3-12

nikvas0 commented 2 weeks ago

Got another VERIFY failure on main: https://paste.yandex-team.ru/021eb924-2572-47b0-be72-3de4a5b0941e

vlad-gogov commented 2 weeks ago

After applying the changes from https://github.com/ydb-platform/ydb/pull/11289 and locally reproducing the issue, I received the following message: https://paste.yandex-team.ru/73c8aacb-8373-4686-8978-0219fe149401

nikvas0 commented 2 weeks ago

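Looks like the snapshot really is too old: the read snapshot is from 08:49:04 UTC, while the column shard's minimum read snapshot is from 10:32:37 UTC.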

Snapshot too old: {1730969344010:844424930162099}. CS min read snapshot: {1730975557090:max}. now: 2024-11-07T10:37:37.171747Z,
❯ date -uR -r 1730969344
Thu, 07 Nov 2024 08:49:04 +0000
❯ date -uR -r 1730975557
Thu, 07 Nov 2024 10:32:37 +0000
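
The same comparison can be sketched in Python to get the exact gap between the two snapshots. This assumes, as the `date` calls above do, that the first component of each `{step:tx}` pair is a timestamp in milliseconds since the Unix epoch; the helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

MESSAGE = (
    "Snapshot too old: {1730969344010:844424930162099}. "
    "CS min read snapshot: {1730975557090:max}. "
    "now: 2024-11-07T10:37:37.171747Z,"
)


def snapshot_steps_ms(message: str) -> list[int]:
    """Return the first component of every {step:tx} pair, in order."""
    return [int(step) for step in re.findall(r"\{(\d+):[^}]*\}", message)]


def to_utc(step_ms: int) -> datetime:
    """Assumption: the step is milliseconds since the Unix epoch."""
    return datetime.fromtimestamp(step_ms // 1000, tz=timezone.utc)


read_ms, min_read_ms = snapshot_steps_ms(MESSAGE)
# Integer arithmetic keeps the difference exact.
lag = timedelta(milliseconds=min_read_ms - read_ms)

print("read snapshot:    ", to_utc(read_ms).isoformat())
print("min read snapshot:", to_utc(min_read_ms).isoformat())
print("read snapshot is older by", lag)
```

With the values from the message, the read snapshot is about 1 hour 43 minutes older than the minimum the column shard accepts.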

vlad-gogov commented 1 week ago

Reproduced locally in a unit test: https://github.com/ydb-platform/ydb/pull/11468

SloNN commented 1 hour ago

Ran the same query again on the new version, and all database nodes crashed:

$cnt = select cast(count(*) as int64) from `raw/kikimr_ydb_kikimr-log`; insert into cnt3(key,c) values(5,$cnt)