Closed hehechen closed 5 months ago
Weirdly, there is no stack frame output to the log file...
It is quite strange that there are no logs showing what thread_1024 and thread_1025 were doing before thread_1024 crashed.
I think it's better to write thread_name here. https://github.com/pingcap/tiflash/blob/a17c0bb2cd8c47e4ef48b881548dafd234f5ed42/libs/libdaemon/src/BaseDaemon.cpp#L229
pthread_getname_np is not async-signal-safe, so we can't use it in a signal handler.
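One possible workaround, as a rough sketch only: each thread caches its own name in a thread-local buffer from normal context, and the signal handler only reads that buffer and calls write(2), which is async-signal-safe. The names below (sketch::cacheCurrentThreadName, sketch::writeCachedThreadName, cached_thread_name) are hypothetical and are not part of BaseDaemon. It also assumes the thread-local buffer has already been touched by the thread before any signal arrives, since lazily allocated TLS may not be safe to access for the first time inside a handler.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

namespace sketch // hypothetical namespace, for illustration only
{
// Linux limits thread names to 16 bytes including the trailing NUL.
constexpr std::size_t kMaxThreadName = 16;

// Per-thread cache, filled outside of signal context.
thread_local char cached_thread_name[kMaxThreadName] = "unknown";

// Call from normal (non-signal) context, e.g. right after the thread
// has set its name. pthread_getname_np is fine to call here.
inline void cacheCurrentThreadName()
{
    char buf[kMaxThreadName] = {};
    if (pthread_getname_np(pthread_self(), buf, sizeof(buf)) == 0)
        std::memcpy(cached_thread_name, buf, sizeof(buf));
}

// Intended for use inside a signal handler: it only reads the cached
// buffer and calls write(2); no allocation, no locks, no stdio.
inline void writeCachedThreadName(int fd)
{
    std::size_t len = 0;
    while (len < kMaxThreadName && cached_thread_name[len] != '\0')
        ++len;
    ssize_t ignored = ::write(fd, cached_thread_name, len);
    (void)ignored; // nothing sensible to do on failure inside a handler
}
} // namespace sketch
```

With something like this, the handler could pass the cached name along with the thread id, assuming every thread calls cacheCurrentThreadName() once after it has been named.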
I've got a coredump file with the same error.
[2023/01/19 02:27:07.847 +00:00] [ERROR] [BaseDaemon.cpp:377] [########################################] [source=BaseDaemon] [thread_id=1006]
[2023/01/19 02:27:07.848 +00:00] [ERROR] [BaseDaemon.cpp:378] ["(from thread 1005) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=1006]
[2023/01/19 02:27:07.848 +00:00] [ERROR] [BaseDaemon.cpp:408] ["Address: 0x8"] [source=BaseDaemon] [thread_id=1006]
[2023/01/19 02:27:07.848 +00:00] [ERROR] [BaseDaemon.cpp:414] ["Access: read."] [source=BaseDaemon] [thread_id=1006]
[2023/01/19 02:27:07.848 +00:00] [ERROR] [BaseDaemon.cpp:423] ["Address not mapped to object."] [source=BaseDaemon] [thread_id=1006]
It seems the crash is caused by an error that occurs while libunwind is tracing the stack.
> 5-85 /data1/jaysonhuang/qa/flash_debug
> LD_LIBRARY_PATH=. gdb ./tiflash ./core.1
GNU gdb (GDB) 8.2
...
Reading symbols from ./tiflash...done.
BFD: warning: /data1/jaysonhuang/qa/flash_debug/./core.1 is truncated: expected core file size >= 359614476288, found: 1075888128
warning: core file may not match specified executable file.
...
Core was generated by `/tiflash/tiflash server --config-file /data0/config.toml'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 access_mem (as=<optimized out>, addr=8, val=0x7fe902a408c8, write=<optimized out>, arg=<optimized out>)
at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/libunwind/src/x86_64/Ginit.c:330
330 /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/libunwind/src/x86_64/Ginit.c: No such file or directory.
[Current thread is 1 (LWP 7)]
(gdb) bt
#0 access_mem (as=<optimized out>, addr=8, val=0x7fe902a408c8, write=<optimized out>, arg=<optimized out>)
at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/libunwind/src/x86_64/Ginit.c:330
Backtrace stopped: Cannot access memory at address 0x7fe902a405f8
(gdb) info threads
Id Target Id Frame
* 1 LWP 7 access_mem (as=<optimized out>, addr=8, val=0x7fe902a408c8, write=<optimized out>, arg=<optimized out>)
at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/libunwind/src/x86_64/Ginit.c:330
2 LWP 25 0x00007fe903b40c89 in ?? ()
3 LWP 29 0x00007fe903b40c89 in ?? ()
4 LWP 28 0x00007fe908607f00 in ?? ()
5 LWP 37 0x00007fe904745de2 in ?? ()
6 LWP 22 0x00007fe903b46f43 in ?? ()
7 LWP 61 0x00007fe904745de2 in ?? ()
8 LWP 160 0x0000000006f727a0 in LZ4_decompress_generic (src=<optimized out>, dst=<optimized out>, srcSize=<optimized out>, outputSize=<optimized out>, partialDecoding=decode_full_block,
dict=noDict, lowPrefix=<optimized out>, dictStart=0x0, dictSize=0) at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/lz4/lib/lz4.c:2060
9 LWP 156 LZ4_decompress_generic (src=<optimized out>, dst=<optimized out>, srcSize=<optimized out>, outputSize=<optimized out>, partialDecoding=decode_full_block, dict=noDict,
lowPrefix=<optimized out>, dictStart=0x0, dictSize=0) at /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tics/contrib/lz4/lib/lz4.c:2000
> ./tiflash version
TiFlash
Release Version: v6.5.0
Edition: Community
Git Commit Hash: 41c08dbe20901f6cfd28ce642b39ce53f35ef48a
Git Branch: heads/refs/tags/v6.5.0
UTC Build Time: 2022-12-21 12:03:40
Enable Features: jemalloc sm4(GmSSL) avx2 avx512 unwind thinlto
Profile: RELWITHDEBINFO
Raft Proxy
Git Commit Hash: ea48821d77b57a276ce3a1363de8875c07d96756
Git Commit Branch: HEAD
UTC Build Time: 2022-12-21 12:08:30
Rust Version: rustc 1.67.0-nightly (96ddd32c4 2022-11-14)
Storage Engine: tiflash
Prometheus Prefix: tiflash_proxy_
Profile: release
Yet another error log, again without valid stack info from the coredump file:
[2023/01/19 06:55:08.127 +00:00] [ERROR] [BaseDaemon.cpp:377] [########################################] [source=BaseDaemon] [thread_id=996]
[2023/01/19 06:55:08.127 +00:00] [ERROR] [BaseDaemon.cpp:378] ["(from thread 968) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=996]
[2023/01/19 06:55:08.127 +00:00] [ERROR] [BaseDaemon.cpp:408] ["Address: 0x8"] [source=BaseDaemon] [thread_id=996]
[2023/01/19 06:55:08.127 +00:00] [ERROR] [BaseDaemon.cpp:414] ["Access: read."] [source=BaseDaemon] [thread_id=996]
[2023/01/19 06:55:08.127 +00:00] [ERROR] [BaseDaemon.cpp:423] ["Address not mapped to object."] [source=BaseDaemon] [thread_id=996]
Could it be caused by continuous profiling?
It did not reproduce after disabling continuous profiling.
Any updates?
Closed because it could not be reproduced for a long time.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
Scale out 10 TiFlash nodes and run TPCH queries at the same time.
2. What did you expect to see? (Required)
Don't crash.
3. What did you see instead (Required)
Some TiFlash nodes crashed.
[2023/01/16 22:13:09.443 +00:00] [ERROR] [BaseDaemon.cpp:377] [########################################] [source=BaseDaemon] [thread_id=1025]
[2023/01/16 22:13:09.443 +00:00] [ERROR] [BaseDaemon.cpp:378] ["(from thread 1024) Received signal Segmentation fault(11)."] [source=BaseDaemon] [thread_id=1025]
[2023/01/16 22:13:09.443 +00:00] [ERROR] [BaseDaemon.cpp:408] ["Address: 0x8"] [source=BaseDaemon] [thread_id=1025]
[2023/01/16 22:13:09.443 +00:00] [ERROR] [BaseDaemon.cpp:414] ["Access: read."] [source=BaseDaemon] [thread_id=1025]
[2023/01/16 22:13:09.443 +00:00] [ERROR] [BaseDaemon.cpp:423] ["Address not mapped to object."] [source=BaseDaemon] [thread_id=1025]
4. What is your TiFlash version? (Required)
v6.5.0 41c08dbe20901f6cfd28ce642b39ce53f35ef48a