Open SloNN opened 3 weeks ago
Reproduced this on another cluster.
Nov 2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]: VERIFY failed (2024-11-02T17:23:50.083020+0300): tablet_id=44;verification=Stage == from;fline=actor.h:63;from=1;real=0;to=2;
Nov 2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]: ydb/library/actors/core/log.cpp:748
Nov 2 17:23:50 ydb-vla-testing-0000 kikimr_31003[424327]: ~TVerifyFormattedRecordWriter(): requirement false failed
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 0. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:83: NPrivate::InternalPanicImpl(int, char const*, char const*, int, int, int, TBasicStringBuf<char, std::__y1::char_traits<char>>, char const*, unsigned long) @ 0xAFF387D
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 1. /home/ns-vasilev/ydbwork/ydb/util/system/yassert.cpp:55: NPrivate::Panic(NPrivate::TStaticBuf const&, int, char const*, char const*, char const*, ...) @ 0xAFED89C
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 2. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/log.cpp:748: NActors::TVerifyFormattedRecordWriter::~TVerifyFormattedRecordWriter() @ 0xBFDFC33
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 3. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:63: NKikimr::NOlap::NDataReader::TActor::SwitchStage(NKikimr::NOlap::NDataReader::TActor::EStage, NKikimr::NOlap::NDataReader::TActor::EStage) @ 0x140A01DD
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 4. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.cpp:38: NKikimr::NOlap::NDataReader::TActor::HandleExecute(TAutoPtr<NActors::TEventHandle<NKikimr::NKqp::TEvKqpCompute::TEvScanError>, TDelete>&) @ 0x140A0753
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 5. /home/ns-vasilev/ydbwork/ydb/ydb/core/tx/columnshard/data_reader/actor.h:85: NKikimr::NOlap::NDataReader::TActor::StateFunc(TAutoPtr<NActors::IEventHandle, TDelete>&) @ 0x140A0DB2
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 6. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:251: NActors::TGenericExecutorThread::TProcessingResult NActors::TGenericExecutorThread::Execute<NActors::TMailboxTable::THTSwapMailbox>(NActors::TMailboxTable::THTSwapMailbox*, unsigned int, bool) @ 0xBFC10F7
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 7. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:440: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*)::$_0::operator()(unsigned int, bool) const @ 0xBFB8621
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 8. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:493: NActors::TGenericExecutorThread::ProcessExecutorPool(NActors::IExecutorPool*) @ 0xBFB8064
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 9. /home/ns-vasilev/ydbwork/ydb/ydb/library/actors/core/executor_thread.cpp:524: NActors::TExecutorThread::ThreadProc() @ 0xBFB8E5F
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 10. /home/ns-vasilev/ydbwork/ydb/util/system/thread.cpp:244: (anonymous namespace)::TPosixThread::ThreadProxy(void*) @ 0xAFF7BFE
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 11. ??:0: ?? @ 0x7F7DC4B91608
Nov 2 17:23:51 ydb-vla-testing-0000 kikimr_31003[424327]: 12. ??:0: ?? @ 0x7F7DC4AB1352
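For anyone triaging similar crashes, here is a minimal sketch that splits the `key=value;` payload of the VERIFY line above into fields. The helper name `parse_verify_fields` is hypothetical, and reading `from` as the expected stage, `real` as the actual stage, and `to` as the target stage is an assumption based on the `Stage == from` condition, not something confirmed by the source code.

```python
def parse_verify_fields(message: str) -> dict:
    """Split a 'key=value;key=value;' payload into a dict.

    Only the first '=' in each part separates key from value, so a value
    such as 'Stage == from' is kept intact.
    """
    fields = {}
    for part in message.split(";"):
        if not part:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


# Payload copied from the VERIFY line in the log above.
payload = "tablet_id=44;verification=Stage == from;fline=actor.h:63;from=1;real=0;to=2;"
fields = parse_verify_fields(payload)

# Assumed meaning: the transition expected stage 1, but the actor was in stage 0.
print(fields["verification"], "| expected:", fields["from"],
      "| actual:", fields["real"], "| target:", fields["to"])
```

Under that reading, `SwitchStage` was called from the `TEvScanError` handler (frame 4) for a 1 -> 2 transition while the actor was still in stage 0.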
Logs from OLAP TESTING VLA COMMON3: https://paste.yandex-team.ru/332eb072-8a16-4af2-9c6a-379bdaeee6c1
Version: ydb-stable-24-3-12
Got another VERIFY failure on main: https://paste.yandex-team.ru/021eb924-2572-47b0-be72-3de4a5b0941e
After applying the changes from https://github.com/ydb-platform/ydb/pull/11289 and locally reproducing the issue, I received the following message: https://paste.yandex-team.ru/73c8aacb-8373-4686-8978-0219fe149401
It looks like the snapshot really is older than the CS min read snapshot:
Snapshot too old: {1730969344010:844424930162099}. CS min read snapshot: {1730975557090:max}. now: 2024-11-07T10:37:37.171747Z,
❯ date -uR -r 1730969344
Thu, 07 Nov 2024 08:49:04 +0000
❯ date -uR -r 1730975557
Thu, 07 Nov 2024 10:32:37 +0000
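The same check can be done without `date -r <seconds>` (that form is BSD/macOS syntax). Below is a minimal sketch that converts both snapshot values from the message above and computes the gaps. It assumes the first component of each snapshot is a plan step in milliseconds since the Unix epoch, which matches the `date` output above; the helper name `plan_step_to_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def plan_step_to_utc(plan_step_ms: int) -> datetime:
    """Convert a plan step (assumed: milliseconds since the Unix epoch) to UTC."""
    return EPOCH + timedelta(milliseconds=plan_step_ms)


# Values copied from the "Snapshot too old" message above.
requested = plan_step_to_utc(1730969344010)
min_read = plan_step_to_utc(1730975557090)
now = datetime(2024, 11, 7, 10, 37, 37, 171747, tzinfo=timezone.utc)

print("requested snapshot:", requested.isoformat())
print("min read snapshot: ", min_read.isoformat())
print("requested is older by:", min_read - requested)  # 1:43:33.080000
print("min read lags now by: ", now - min_read)        # 0:05:00.081747
```

So the requested snapshot is about 1 hour 43 minutes older than the minimum read snapshot, which itself trails `now` by roughly 5 minutes.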
Reproduced locally in a unit test: https://github.com/ydb-platform/ydb/pull/11468
Ran the same query once again on the new version, and all database nodes crashed:
$cnt = select cast(count(*) as int64) from `raw/kikimr_ydb_kikimr-log`; insert into cnt3(key,c) values(5,$cnt)
Issuing the most trivial query
All database nodes crash