irods / irods_rule_engine_plugin_python

BSD 3-Clause "New" or "Revised" License

Agent with PREP configured as the only REP segfaults when `iput` uses legacy parallel transfer #218

Closed by korydraughn 2 weeks ago

korydraughn commented 3 weeks ago

This report only applies to the following:

The following stacktrace was captured by the iRODS server during testing of 4.3.3. The test that triggers it is `test_icp.Test_Icp.test_multithreaded_icp__issue_5478`, and it reproduces on every run.

 0# stacktrace_signal_handler in /lib/libirods_server.so.4.3.3
 1# 0x00007F47CE4AA320 in /lib/x86_64-linux-gnu/libc.so.6
 2# 0x00007F47C9302C51 in /lib/x86_64-linux-gnu/libpython3.12.so.1.0
 3# PyUnicode_New in /lib/x86_64-linux-gnu/libpython3.12.so.1.0
 4# 0x00007F47C935F0DF in /lib/x86_64-linux-gnu/libpython3.12.so.1.0
 5# boost::python::detail::str_base::str_base(char const*) in /opt/irods-externals/boost-libcxx1.81.0-1/lib/libboost_python312.so.1.81.0
 6# 0x00007F47C9CC2095 in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-python.so
 7# std::__1::__function::__func<irods::error (*)(std::__1::tuple<> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&), std::__1::allocator<irods::error (*)(std::__1::tuple<> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)>, irods::error (std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)>::operator()(std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&) in /usr/lib/irods/plugins/rule_engines/libirods_rule_engine_plugin-python.so
 8# irods::pluggable_rule_engine<std::__1::tuple<> >::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::tuple<>&, bool&) in /lib/libirods_server.so.4.3.3
 9# irods::error irods::control<irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(irods::re_pack_inp<std::__1::tuple<> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::tuple<> >(std::__1::list<irods::re_pack_inp<std::__1::tuple<> >, std::__1::allocator<irods::re_pack_inp<std::__1::tuple<> > > >&, irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(irods::re_pack_inp<std::__1::tuple<> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /lib/libirods_server.so.4.3.3
10# irods::error irods::plugin_base::invoke_policy_enforcement_point<long long, int>(irods::rule_engine_context_manager<std::__1::tuple<>, RuleExecInfo*, (irods::rule_execution_manager_pack)0>, irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, long long, int) in /lib/libirods_server.so.4.3.3
11# irods::error irods::plugin_base::call<long long const, int const>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, long long const, int const) in /lib/libirods_server.so.4.3.3
12# fileLseek(RsComm*, boost::shared_ptr<irods::first_class_object>, long long, int) in /lib/libirods_server.so.4.3.3
13# _rsFileLseek(RsComm*, FileLseekInp*, FileLseekOut**) in /lib/libirods_server.so.4.3.3
14# rsFileLseek(RsComm*, FileLseekInp*, FileLseekOut**) in /lib/libirods_server.so.4.3.3
15# _l3Lseek(RsComm*, int, long long, int) in /lib/libirods_server.so.4.3.3
16# partialDataPut(PortalTransferInp*) in /lib/libirods_server.so.4.3.3
17# 0x00007F47CEC8FD7D in /opt/irods-externals/boost-libcxx1.81.0-1/lib/libboost_thread.so.1.81.0
18# 0x00007F47CE501A94 in /lib/x86_64-linux-gnu/libc.so.6
19# __clone in /lib/x86_64-linux-gnu/libc.so.6

After more investigation, the segfault points to the construction of a boost::python::str, which matches frames 3 through 5 of the stacktrace above. The segfault happens here: https://github.com/irods/irods_rule_engine_plugin_python/blob/6c040ef60494e3a1ab8775f4135bbd701e78e647/src/main.cpp#L467

On that line, the string literal "core" results in the implicit construction of a boost::python::str.

After even more experiments, it turns out that the segfault only occurs when multiple threads are in use (i.e. legacy parallel transfer). If threading is disabled (i.e. iput -N0), everything works - no segfault occurs.

The segfault can be reproduced outside of iRODS using the following program.

// file: main.cpp

#include <boost/python.hpp>
#include <thread>
#include <mutex>
#include <chrono>
#include <vector>

std::recursive_mutex m;

int main()
{
        Py_InitializeEx(0);

        std::vector<std::thread> v;
        v.reserve(4);

        // Each worker constructs a boost::python::str while holding the mutex.
        for (int i = 0; i < 4; ++i) {
                v.emplace_back([] {
                        std::scoped_lock lk{m};
                        boost::python::str s{"core"};
                });
        }

        for (int i = 0; i < 4; ++i)
                v[i].join();
}

Here is the script I used to compile it.

#! /bin/bash

export PATH=/opt/irods-externals/clang13.0.1-0/bin:$PATH

clang++ main.cpp \
        -g \
        -std=c++20 \
        -stdlib=libc++ \
        -nostdinc++ \
        -I /usr/include/python3.12 \
        -I /opt/irods-externals/clang13.0.1-0/include/c++/v1 \
        -I /opt/irods-externals/boost-libcxx1.81.0-1/include \
        -L /opt/irods-externals/boost-libcxx1.81.0-1/lib \
        -Wl,-rpath /opt/irods-externals/boost-libcxx1.81.0-1/lib \
        -Wl,-rpath /opt/irods-externals/clang13.0.1-0/lib \
        -lboost_python312 \
        -lpython3.12 \
        -pthread

korydraughn commented 3 weeks ago

From @SwooshyCueb.

... the string-related code in Boost.Python hasn't changed in eight years, so if 1.81 is incompatible, all boost versions are incompatible. ...

korydraughn commented 3 weeks ago

Here's the line in the test which triggers the segfault.

korydraughn commented 2 weeks ago

@SwooshyCueb Please close if completed.

SwooshyCueb commented 2 weeks ago

I'm keeping this open for now since I'm going to improve on the solution in my virtual environment PR.

korydraughn commented 2 weeks ago

The work for this issue is complete and is going to be part of the 4.3.3 PREP release.

Please open a new issue that is specific to the virtual environment improvements.