llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Clang++ C++ optimization crashes ClickHouse on PowerPC64le platform #102311

Closed: HarryLeeIBM closed this issue 2 months ago

HarryLeeIBM commented 3 months ago

I used Ubuntu clang version 18.1.4 to cross-compile ClickHouse (v24.7.x) for the PowerPC64le platform and found that ClickHouse crashes. When I build ClickHouse with the -O0 option it doesn't crash, so this could be a miscompilation issue.

When I turn off optimization for the file src/Processors/Executors/PipelineExecutor.cpp by adding #pragma clang optimize off after the #include lines and #pragma clang optimize on at the end of the file, the issue disappears.
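For reference, the placement looks like the minimal sketch below. The file contents are illustrative, not the real PipelineExecutor.cpp; #pragma clang optimize off/on is Clang's documented pragma for disabling optimization over a region of a file:

// Minimal sketch: compile one region of a translation unit as if -O0.
#include <cstdio>

#pragma clang optimize off   // functions below this point are not optimized

static int work(int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += i;
    return sum;
}

#pragma clang optimize on    // restore the command-line optimization level

int main()
{
    std::printf("%d\n", work(10));
    return 0;
}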

To reproduce the issue, use the following steps:

  1. git clone ClickHouse and install the necessary tools (see details: https://github.com/ClickHouse/ClickHouse/blob/master/docs/en/development/build.md).
  2. Use the following commands to cross-compile under the ClickHouse folder:
     mkdir build
     cd build
     cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_OPTIM=ON -DGLIBC_COMPATIBILITY=OFF -DENABLE_RDKAFKA=ON -DBUILD_SHARED_LIBS=OFF -DITK_DYNAMIC_LOADING=OFF -DENABLE_AMQPCPP=OFF -DENABLE_AVRO=ON -DENABLE_CAPNP=OFF -DENABLE_CASSANDRA=OFF -DENABLE_H3=OFF -DENABLE_HDFS=OFF -DENABLE_MSGPACK=OFF -DENABLE_MYSQL=OFF -DENABLE_NLP=OFF -DENABLE_ODBC=OFF -DENABLE_ORC=OFF -DENABLE_PARQUET=OFF -DENABLE_ROCKSDB=OFF -DENABLE_SQLITE=OFF -DENABLE_SHELL_COMMANDS=0 -DENABLE_PLAY=0 -DADD_GDB_INDEX_FOR_GOLD=1 -DCLICKHOUSE_OFFICIAL_BUILD=1 -DCMAKE_TOOLCHAIN_FILE=cmake/linux/toolchain-ppc64le.cmake ..
  3. Copy the clickhouse executable under build/programs to a PowerPC64le machine.
  4. Run "clickhouse server" to start the ClickHouse server.
  5. Use another terminal to run "clickhouse client" to start the ClickHouse client. At the prompt, run this SQL:

select if(in(dummy, tuple(0, 1)), 'ok', 'ok') from remote('localhost', system.one) settings legacy_column_name_of_tuple_literal=1, prefer_localhost_replica=0;

Then the server crashes and a core dump is created. Analyzing the core dump gives the following stack trace:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  std::__1::construct_at[abi:v15000]<DB::ExecutingGraph::Node*, DB::ExecutingGraph::Node*, DB::ExecutingGraph::Node**>(DB::ExecutingGraph::Node**, DB::ExecutingGraph::Node*&&) (__location=0x7f43d378418304bc, __args=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/__memory/construct_at.h:35

warning: 35 ./contrib/llvm-project/libcxx/include/__memory/construct_at.h: No such file or directory
[Current thread is 1 (Thread 0x75a79cf69110 (LWP 262772))]
(gdb) bt
#0  std::__1::construct_at[abi:v15000]<DB::ExecutingGraph::Node*, DB::ExecutingGraph::Node*, DB::ExecutingGraph::Node**>(DB::ExecutingGraph::Node**, DB::ExecutingGraph::Node*&&) (__location=0x7f43d378418304bc, __args=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/__memory/construct_at.h:35
#1  std::__1::allocator_traits<AllocatorWithMemoryTracking<DB::ExecutingGraph::Node*> >::construct[abi:v15000]<DB::ExecutingGraph::Node*, DB::ExecutingGraph::Node*, void, void>(AllocatorWithMemoryTracking<DB::ExecutingGraph::Node*>&, DB::ExecutingGraph::Node**, DB::ExecutingGraph::Node*&&) (__p=0x7f43d378418304bc, 
    __args=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/__memory/allocator_traits.h:298
#2  std::__1::deque<DB::ExecutingGraph::Node*, AllocatorWithMemoryTracking<DB::ExecutingGraph::Node*> >::push_back (this=0x75a79cf67ee0, __v=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/deque:1967
#3  std::__1::queue<DB::ExecutingGraph::Node*, std::__1::deque<DB::ExecutingGraph::Node*, AllocatorWithMemoryTracking<DB::ExecutingGraph::Node*> > >::push[abi:v15000](DB::ExecutingGraph::Node*&&) (this=0x75a79cf67ee0, __v=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/queue:365
#4  DB::ExecutingGraph::updateNode (this=0x75a795831900, pid=0, queue=..., 
    async_queue=...)
    at ./ppc18-rel/./src/Processors/Executors/ExecutingGraph.cpp:344
#5  0x0000000022a67c74 in DB::PipelineExecutor::executeStepImpl (
    this=0x75a7958b2a18, thread_num=<optimized out>, yield_flag=0x0)
    at ./ppc18-rel/./src/Processors/Executors/PipelineExecutor.cpp:291
--Type <RET> for more, q to quit, c to continue without paging--c
#6  0x0000000022a6728c in DB::PipelineExecutor::executeSingleThread (
    this=0x75a7958b2a18, thread_num=0)
    at ./ppc18-rel/./src/Processors/Executors/PipelineExecutor.cpp:238
#7  DB::PipelineExecutor::executeImpl (this=0x75a7958b2a18, 
    num_threads=<optimized out>, concurrency_control=<optimized out>)
    at ./ppc18-rel/./src/Processors/Executors/PipelineExecutor.cpp:410
#8  0x0000000022a66f74 in DB::PipelineExecutor::execute (this=0x75a7958b2a18, 
    num_threads=1, concurrency_control=<optimized out>)
    at ./ppc18-rel/./src/Processors/Executors/PipelineExecutor.cpp:110
#9  0x0000000022a77a64 in DB::threadFunction (data=..., thread_group=..., 
    num_threads=1, 
    concurrency_control=<error reading variable: Unable to access DWARF register number 73>)
    at ./ppc18-rel/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:83
#10 DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0::operator()() const (this=<optimized out>)
    at ./ppc18-rel/./src/Processors/Executors/PullingAsyncPipelineExecutor.cpp:109
#11 std::__1::__invoke[abi:v15000]<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&) (__f=...)
    at ./contrib/llvm-project/libcxx/include/__functional/invoke.h:394
#12 _ZNSt3__118__apply_tuple_implB6v15000IRZN2DB28PullingAsyncPipelineExecutor4pullERNS1_5ChunkEmE3$_0RNS_5tupleIJEEETpTnmJEEEDcOT_OT0_NS_15__tuple_indicesIJXspT1_EEEE (__f=..., __t=...) at ./contrib/llvm-project/libcxx/include/tuple:1789
#13 std::__1::apply[abi:v15000]<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&, std::__1::tuple<>&) (__f=..., __t=...)
    at ./contrib/llvm-project/libcxx/include/tuple:1798
#14 ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}::operator()() (this=<optimized out>) at ./src/Common/ThreadPool.h:251
#15 std::__1::__invoke[abi:v15000]<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&) (__f=...)
    at ./contrib/llvm-project/libcxx/include/__functional/invoke.h:394
#16 std::__1::__invoke_void_return_wrapper<void, true>::__call<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&>(ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}&) (__args=...)
    at ./contrib/llvm-project/libcxx/include/__functional/invoke.h:479
#17 std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}, void ()>::operator()[abi:v15000]() (
    this=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/__functional/function.h:235
#18 std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::{lambda()#1}, void ()> >(std::__1::__function::__policy_storage const*) (
    __buf=<optimized out>)
    at ./contrib/llvm-project/libcxx/include/__functional/function.h:716
#19 0x000000001ad690cc in std::__1::__function::__policy_func<void ()>::operator()[abi:v15000]() const (this=0x75a79cf68570)
    at ./contrib/llvm-project/libcxx/include/__functional/function.h:848
#20 std::__1::function<void()>::operator() (this=0x75a79cf68570)
    at ./contrib/llvm-project/libcxx/include/__functional/function.h:1187
#21 ThreadPoolImpl<std::__1::thread>::worker (this=0x75a97c042e40, 
    thread_it=...) at ./ppc18-rel/./src/Common/ThreadPool.cpp:462
#22 0x000000001ad6e274 in ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::{lambda()#2}::operator()() const (this=0x75a7ab9f27c8)
    at ./ppc18-rel/./src/Common/ThreadPool.cpp:219
#23 std::__1::__invoke[abi:v15000]<ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::{lambda()#2}>(ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::{lambda()#2}&&) (__f=...)
    at ./contrib/llvm-project/libcxx/include/__functional/invoke.h:394
#24 _ZNSt3__116__thread_executeB6v15000INS_10unique_ptrINS_15__thread_structENS_14default_deleteIS2_EEEEZN14ThreadPoolImplINS_6threadEE12scheduleImplIvEET_NS_8functionIFvvEEE8PriorityNS_8optionalImEEbEUlvE0_JETpTnmJEEEvRNS_5tupleIJSA_T0_DpT1_EEENS_15__tuple_indicesIJXspT2_EEEE (__t=...)
    at ./contrib/llvm-project/libcxx/include/thread:284
#25 std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, Priority, std::__1::optional<unsigned long>, bool)::{lambda()#2}> >(void*) (
    __vp=0x75a7ab9f27c0) at ./contrib/llvm-project/libcxx/include/thread:295
#26 0x000075a97cdf8838 in start_thread ()
   from /lib/powerpc64le-linux-gnu/libpthread.so.0
#27 0x000075a97ccfba44 in clone () from /lib/powerpc64le-linux-gnu/libc.so.6
EugeneZelenko commented 3 months ago

Could you please try 19 or the main branch?

HarryLeeIBM commented 3 months ago

@EugeneZelenko, I tried clang-19; after fixing some compilation errors it compiles, but the issue still happens. Also tried clang-18, no luck either.

HarryLeeIBM commented 3 months ago

Updates: Turning off optimization for the file src/Processors/Executors/PipelineExecutor.cpp by adding #pragma clang optimize off after the #include lines and #pragma clang optimize on at the end of the file makes the issue disappear. So the misoptimization happens in this file.

llvmbot commented 3 months ago

@llvm/issue-subscribers-backend-powerpc

Author: Harry Lee (HarryLeeIBM)

EsmeYi commented 3 months ago

Hi Harry, I'm looking into this issue. I built ClickHouse with Clang 18 on PPC64LE, but I didn't see the expected segmentation fault. Instead, I got the following NETWORK_ERROR. What am I missing? I don't know much about ClickHouse; could you please help me figure out what I'm doing wrong? Thanks!

$ ./clickhouse client
ClickHouse client version 24.8.1.1 (official build).
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.8.1.

Warnings:
 * Linux is not using a fast clock source. Performance can be degraded. Check /sys/devices/system/clocksource/clocksource0/current_clocksource

ilum.aus.stglabs.ibm.com :) 
ilum.aus.stglabs.ibm.com :) select if(in(dummy, tuple(0, 1)), 'ok', 'ok') from remote('localhost', system.one) settings legacy_column_name_of_tuple_literal=1, prefer_localhost_replica=0;

SELECT if(dummy IN (0, 1), 'ok', 'ok')
FROM remote('localhost', system.one)
SETTINGS legacy_column_name_of_tuple_literal = 1, prefer_localhost_replica = 0

Query id: 215ce427-7c26-42cb-a202-84946251ecb9

Error on processing query: Code: 32. DB::Exception: Attempt to read after eof: while receiving packet from localhost:9000. (ATTEMPT_TO_READ_AFTER_EOF) (version 24.8.1.1 (official build))

Connecting to localhost:9000 as user default.
Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)
EsmeYi commented 3 months ago

Oh well, ignore my previous comment. I just noticed the segmentation fault on the server side.

chenzheng1030 commented 3 months ago

Confirmed: the case reproduces.

Will check further...

chenzheng1030 commented 3 months ago

The error is caused by a wrong MachineLICM hoist of an xxlxor instruction.

chenzheng1030 commented 3 months ago

MachineLICM is not wrong. It is just a trigger for the error.

.LBB35_9:                               # %if.end
                                        #   in Loop: Header=BB35_8 Depth=2
    mr  3, 27
    # xxlxor 63, 63, 63 #bad
    bl _ZN2DB22ExecutionThreadContext11executeTaskEv
    nop
    #xxlxor 63, 63, 63 #good
    andi. 3, 3, 1
    bc 12, 1, .LBB35_14
    b .LBB35_10

xxlxor 63, 63, 63 is the instruction MachineLICM hoists. It is hoisted to the entry block of the function _ZN2DB16PipelineExecutor15executeStepImplEmPNSt3__16atomicIbEE.
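At the source level this is ordinary loop-invariant code motion; a hedged C++ analogy is below (executeTask and consume are hypothetical stand-ins, not ClickHouse code). The hoist keeps the materialized constant in one register across every call in the loop, which is sound only because the ABI promises that callee-saved registers survive calls:

#include <cstdint>

bool executeTask();     // hypothetical stand-in for the real
                        // DB::ExecutionThreadContext::executeTask
void consume(int64_t);  // keeps 'zero' live across the call

void executeStepImpl()
{
    bool yield = false;
    while (!yield)
    {
        int64_t zero = 0;       // loop-invariant: a MachineLICM-style hoist
                                // moves the materializing instruction (here,
                                // the xxlxor) to the function entry block
        yield = executeTask();  // valid only if this call preserves the
                                // callee-saved register now holding 'zero'
        consume(zero);
    }
}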

I narrowed it down to the code just before/after the bl _ZN2DB22ExecutionThreadContext11executeTaskEv call inside _ZN2DB16PipelineExecutor15executeStepImplEmPNSt3__16atomicIbEE.

If xxlxor 63, 63, 63 is placed before bl _ZN2DB22ExecutionThreadContext11executeTaskEv, the binary crashes. I confirmed that vs63 is changed across the call to _ZN2DB22ExecutionThreadContext11executeTaskEv, which is wrong since vs63 is a callee-saved register. I further narrowed it down to some function, but have not yet found which instruction changes vs63 without restoring it in the function's epilogue.
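For illustration, the kind of ABI violation being hunted looks like this hedged sketch (PPC64LE-only; the function name is hypothetical). Under the ELFv2 ABI, v20-v31 (vs52-vs63) are callee-saved, so any function that writes one of them must restore it before returning:

// Sketch only: this function deliberately breaks the ELFv2 ABI by zeroing
// the callee-saved register vs63 (= v31) and returning without restoring it.
// Any caller keeping a live value in v31 across the call is silently
// corrupted.
void bad_callee()
{
    asm volatile("xxlxor 63, 63, 63");  // clobbers vs63 with no save/restore
}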

I need more time to investigate.

@HarryLeeIBM I may be away for one or two days for some downstream work; I will get back to this after that.

chenzheng1030 commented 3 months ago

@HarryLeeIBM Hi, this turns out to be a source code issue: the PPC context-switch assembly in ClickHouse/contrib/boost/libs/context/src/asm needs to save and restore the vector callee-saved registers (v20-v31 under the ELFv2 ABI), and possibly the floating-point CSRs as well. Otherwise the vector CSRs are not restored after calls to the assembly functions in that directory, such as jump_fcontext and ontop_fcontext.

The hack below (it only handles vs63, the register the compiler happened to allocate; a full solution would handle all vector CSR registers) makes the case pass:

$ pwd
/ClickHouse/contrib/boost
$ git diff
diff --git a/libs/context/src/asm/jump_ppc64_sysv_elf_gas.S b/libs/context/src/asm/jump_ppc64_sysv_elf_gas.S
index 28907db32..f3b7a230b 100644
--- a/libs/context/src/asm/jump_ppc64_sysv_elf_gas.S
+++ b/libs/context/src/asm/jump_ppc64_sysv_elf_gas.S
@@ -97,7 +97,7 @@ jump_fcontext:
 # endif
 #endif
     # reserve space on stack
-    subi  %r1, %r1, 184
+    subi  %r1, %r1, 200

 #if _CALL_ELF != 2
     std  %r2,  0(%r1)  # save TOC
@@ -133,6 +133,10 @@ jump_fcontext:
     # save LR as PC
     std   %r0, 176(%r1)

+    # save VS63
+    li %r31, 184
+    stvx %v31, %r1, %r31
+
     # store RSP (pointing to context-data) in R6
     mr  %r6, %r1

@@ -145,6 +149,11 @@ jump_fcontext:

     ld  %r2,  0(%r1)  # restore TOC
 #endif
+
+    # restore VS63
+    li %r31, 184
+    lvx %v31, %r1, %r31
+
     ld  %r14, 8(%r1)  # restore R14
     ld  %r15, 16(%r1)  # restore R15
     ld  %r16, 24(%r1)  # restore R16
@@ -180,7 +189,7 @@ jump_fcontext:
     mtctr  %r12

     # adjust stack
-    addi  %r1, %r1, 184
+    addi  %r1, %r1, 200

 #if _CALL_ELF == 2
     # copy transfer_t into transfer_fn arg registers
diff --git a/libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S b/libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S
index cd97f4567..f8954edcf 100644
--- a/libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S
+++ b/libs/context/src/asm/ontop_ppc64_sysv_elf_gas.S
@@ -97,7 +97,7 @@ ontop_fcontext:
 # endif
 #endif
     # reserve space on stack
-    subi  %r1, %r1, 184
+    subi  %r1, %r1, 200

 #if _CALL_ELF != 2
     std  %r2,  0(%r1)  # save TOC
@@ -133,6 +133,10 @@ ontop_fcontext:
     # save LR as PC
     std   %r0, 176(%r1)

+    # save VS63
+    li %r31, 184
+    stvx %v31, %r1, %r31
+
     # store RSP (pointing to context-data) in R7
     mr  %r7, %r1

@@ -144,6 +148,10 @@ ontop_fcontext:
     mr  %r1, %r4
 #endif

+    # restore VS63
+    li %r31, 184
+    lvx %v31, %r1, %r31
+
     ld  %r14, 8(%r1)  # restore R14
     ld  %r15, 16(%r1)  # restore R15
     ld  %r16, 24(%r1)  # restore R16
@@ -203,7 +211,7 @@ return_to_ctx:
     mtlr  %r0

     # adjust stack
-    addi  %r1, %r1, 184
+    addi  %r1, %r1, 200

     # jump to context
     bctr
chenzheng1030 commented 2 months ago

If there are no objections, I am going to close this issue, as it is a source code issue. Feel free to reopen if more info is needed.