pingcap / tiflash


TiDB receives an incorrect TiFlash error when the TiFlash memory limit is exceeded #6422

Open AkiraXie opened 1 year ago

AkiraXie commented 1 year ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. The TiFlash memory limit is exceeded.

2. What did you expect to see? (Required)

  1. TiDB should receive the "Memory limit (total) exceeded" error.

3. What did you see instead (Required)

  1. TiDB received the error UnexpectedWriteDone instead.

4. What is your TiFlash version? (Required)

6.5

JaySon-Huang commented 1 year ago
2022-12-02 20:20:59 (UTC+08:00)
TiFlash maincluster-tiflash-2.maincluster-tiflash-peer.stable-testbed-47l4r.svc:3930
[MPPTask.cpp:427] ["task running meets error: Code: 0, e.displayText() = DB::TiFlashException: Memory limit (total) exceeded caused by 'RSS(Resident Set Size) much larger than limit' : process memory size would be 64.01 GiB for (attempt to allocate chunk of 366164 bytes), limit of memory for data computing : 63.00 GiB, e.what() = DB::TiFlashException, Stack trace:\n\n\n       0x16fa7e1\tDB::TiFlashException::TiFlashException(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, DB::TiFlashError const&) [tiflash+24094689]\n                \tdbms/src/Common/TiFlashException.h:250\n       0x16f9d73\tMemoryTracker::alloc(long, bool) [tiflash+24092019]\n                \tdbms/src/Common/MemoryTracker.cpp:152\n       0x16f9a35\tMemoryTracker::alloc(long, bool) [tiflash+24091189]\n                \tdbms/src/Common/MemoryTracker.cpp:163\n       0x16f9a35\tMemoryTracker::alloc(long, bool) [tiflash+24091189]\n                \tdbms/src/Common/MemoryTracker.cpp:163\n       0x16bc44e\tDB::HashPartitionWriter<std::__1::shared_ptr<DB::MPPTunnelSet> >::partitionAndEncodeThenWriteBlocks() [tiflash+23839822]\n                \tdbms/src/Flash/Mpp/HashPartitionWriter.cpp:94\n       0x6bff272\tDB::ExchangeSenderBlockInputStream::readImpl() [tiflash+113242738]\n                \tdbms/src/DataStreams/ExchangeSenderBlockInputStream.cpp:43\n       0x5fabab5\tDB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+100317877]\n                \tdbms/src/DataStreams/IProfilingBlockInputStream.cpp:75\n       0x5fab7a5\tDB::IProfilingBlockInputStream::read() [tiflash+100317093]\n                \tdbms/src/DataStreams/IProfilingBlockInputStream.cpp:43\n       0x6c05a1e\tDB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::work(unsigned long, DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, 
true>::Handler, (DB::StreamUnionMode)0>::WorkingInputs&) [tiflash+113269278]\n                \tdbms/src/DataStreams/ParallelInputsProcessor.h:270\n       0x6c05536\tstd::__1::__function::__func<DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::process()::'lambda'(), std::__1::allocator<DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::process()::'lambda'()>, void ()>::operator()() [tiflash+113268022]\n                \t/usr/local/bin/../include/c++/v1/__functional/function.h:345\n       0x17c893b\tDB::ExecutableTask<std::__1::packaged_task<void ()> >::execute() [tiflash+24938811]\n                \tdbms/src/Common/ExecutableTask.h:52\n       0x17cbe83\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+24952451]\n                \tdbms/src/Common/DynamicThreadPool.cpp:101\n       0x17cb4e0\tDB::DynamicThreadPool::fixedWork(unsigned long) [tiflash+24949984]\n                \tdbms/src/Common/DynamicThreadPool.cpp:115\n       0x17cc5d2\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(unsigned long), DB::DynamicThreadPool*, unsigned long&>(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(unsigned long), DB::DynamicThreadPool*&&, unsigned long&)::'lambda'(auto&&...), DB::DynamicThreadPool*, unsigned long> >(void*) [tiflash+24954322]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fbfccd44ea5\tstart_thread [libpthread.so.0+32421]\n  0x7fbfcc14996d\t__clone [libc.so.6+1042797]"] [source=MPP<query:437776192246120470,task:10>] [thread_id=457]
2022-12-02 20:20:59 (UTC+08:00)
TiFlash maincluster-tiflash-3.maincluster-tiflash-peer.stable-testbed-47l4r.svc:3930
[MPPTask.cpp:427] ["task running meets error: Code: 0, e.displayText() = DB::Exception: write to tunnel which is already closed,tunnel9+17: unexpectedWriteDone called, e.what() = DB::Exception, Stack trace:\n\n\n       0x16e6e3e\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+24014398]\n                \tdbms/src/Common/Exception.h:46\n       0x6c5b044\tDB::MPPTunnel::write(std::__1::shared_ptr<DB::TrackedMppDataPacket>&&) [tiflash+113619012]\n                \tdbms/src/Flash/Mpp/MPPTunnel.cpp:168\n       0x16bc6fc\tDB::HashPartitionWriter<std::__1::shared_ptr<DB::MPPTunnelSet> >::partitionAndEncodeThenWriteBlocks() [tiflash+23840508]\n                \tdbms/src/Flash/Mpp/HashPartitionWriter.cpp:103\n       0x6bff272\tDB::ExchangeSenderBlockInputStream::readImpl() [tiflash+113242738]\n                \tdbms/src/DataStreams/ExchangeSenderBlockInputStream.cpp:43\n       0x5fabab5\tDB::IProfilingBlockInputStream::read(DB::PODArray<unsigned char, 4096ul, Allocator<false>, 15ul, 16ul>*&, bool) [tiflash+100317877]\n                \tdbms/src/DataStreams/IProfilingBlockInputStream.cpp:75\n       0x5fab7a5\tDB::IProfilingBlockInputStream::read() [tiflash+100317093]\n                \tdbms/src/DataStreams/IProfilingBlockInputStream.cpp:43\n       0x6c05a1e\tDB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::work(unsigned long, DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::WorkingInputs&) [tiflash+113269278]\n                \tdbms/src/DataStreams/ParallelInputsProcessor.h:270\n       0x6c05536\tstd::__1::__function::__func<DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::process()::'lambda'(), 
std::__1::allocator<DB::ParallelInputsProcessor<DB::UnionBlockInputStream<(DB::StreamUnionMode)0, true>::Handler, (DB::StreamUnionMode)0>::process()::'lambda'()>, void ()>::operator()() [tiflash+113268022]\n                \t/usr/local/bin/../include/c++/v1/__functional/function.h:345\n       0x17c893b\tDB::ExecutableTask<std::__1::packaged_task<void ()> >::execute() [tiflash+24938811]\n                \tdbms/src/Common/ExecutableTask.h:52\n       0x17cbe83\tDB::DynamicThreadPool::executeTask(std::__1::unique_ptr<DB::IExecutableTask, std::__1::default_delete<DB::IExecutableTask> >&) [tiflash+24952451]\n                \tdbms/src/Common/DynamicThreadPool.cpp:101\n       0x17cb4e0\tDB::DynamicThreadPool::fixedWork(unsigned long) [tiflash+24949984]\n                \tdbms/src/Common/DynamicThreadPool.cpp:115\n       0x17cc5d2\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::thread DB::ThreadFactory::newThread<void (DB::DynamicThreadPool::*)(unsigned long), DB::DynamicThreadPool*, unsigned long&>(bool, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, void (DB::DynamicThreadPool::*&&)(unsigned long), DB::DynamicThreadPool*&&, unsigned long&)::'lambda'(auto&&...), DB::DynamicThreadPool*, unsigned long> >(void*) [tiflash+24954322]\n                \t/usr/local/bin/../include/c++/v1/thread:291\n  0x7fdff0274ea5\tstart_thread [libpthread.so.0+32421]\n  0x7fdfef67996d\t__clone [libc.so.6+1042797]"] [source=MPP<query:437776192246120470,task:9>] [thread_id=183]
zanmato1984 commented 1 year ago

Fixing this issue requires a thorough refinement of error passing between TiFlash and TiDB, which takes significant effort. We'll address this in the future.
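
As an illustration of the problem only, and not of how TiFlash or TiDB actually passes errors, the sketch below takes the two task errors from the logs above and prefers the one that is not a consequence of another task's failure. The names `TaskError`, `SECONDARY_MARKERS`, `is_secondary`, and `pick_root_cause` are hypothetical, and matching on message text is an assumption made for brevity; a real fix would more likely rely on structured error codes.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical marker for errors that are only fallout from another task
# failing; the text is copied from the task 9 log in this issue.
SECONDARY_MARKERS = ("write to tunnel which is already closed",)


@dataclass
class TaskError:
    task_id: int
    message: str


def is_secondary(err: TaskError) -> bool:
    """Return True if the error looks like fallout from another task's failure."""
    return any(marker in err.message for marker in SECONDARY_MARKERS)


def pick_root_cause(errors: List[TaskError]) -> Optional[TaskError]:
    """Prefer the first non-secondary error; fall back to the first error."""
    if not errors:
        return None
    for err in errors:
        if not is_secondary(err):
            return err
    return errors[0]


# Abbreviated messages from the two MPP tasks of query 437776192246120470.
errors = [
    TaskError(9, "write to tunnel which is already closed,tunnel9+17: unexpectedWriteDone called"),
    TaskError(10, "Memory limit (total) exceeded caused by 'RSS(Resident Set Size) much larger than limit'"),
]

root = pick_root_cause(errors)
print(root.task_id, root.message)
```

With the two errors above, the sketch selects the memory-limit error from task 10 even though the tunnel error from task 9 is listed first, which is the behavior the report expects TiDB to show.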

JaySon-Huang commented 1 year ago

Similar issue: https://github.com/pingcap/tidb/issues/37792