irods / irods_capability_storage_tiering

Large migration jobs are crashing iRODS agents #193

Open alanking opened 2 years ago

alanking commented 2 years ago

When running decently sized migrations (>100), it is common to see stack traces like this in the log:

 0# stacktrace_signal_handler in /usr/lib/libirods_server.so.4.3.0
 1# 0x00007FB580031980 in /lib/x86_64-linux-gnu/libpthread.so.0
 2# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__is_long() const at /opt/irods-externals/clang13.0.0-0/include/c++/v1/string:1456
 3# std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::size() const at /opt/irods-externals/clang13.0.0-0/include/c++/v1/string:975
 4# std::__1::_MetaBase<__can_be_converted_to_string_view<char, std::__1::char_traits<char>, std::__1::basic_string_view<char, std::__1::char_traits<char> > >::value>::_EnableIfImpl<int> std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::compare<std::__1::basic_string_view<char, std::__1::char_traits<char> > >(std::__1::basic_string_view<char, std::__1::char_traits<char> > const&) const at /opt/irods-externals/clang13.0.0-0/include/c++/v1/string:3903
 5# 0x00007FB575497D06
 6# 0x00007FB575497CAD
 7# 0x00007FB57554C351
 8# 0x00007FB57554C2F5
 9# 0x00007FB57554EB38
10# 0x00007FB57554EA36
11# 0x00007FB575491CBD
12# 0x00007FB57548ABBC
13# 0x00007FB57556406B
14# 0x00007FB575563FFF
15# 0x00007FB575563F7F
16# 0x00007FB57556306E
17# 0x00007FB58412C177 in /usr/lib/libirods_server.so.4.3.0
18# std::__1::function<irods::error (std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)>::operator()(std::__1::tuple<>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&) const in /usr/lib/libirods_server.so.4.3.0
19# irods::pluggable_rule_engine<std::__1::tuple<> >::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::tuple<>&, bool&) in /usr/lib/libirods_server.so.4.3.0
20# irods::error irods::control<irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(irods::re_pack_inp<std::__1::tuple<> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::tuple<> >(std::__1::list<irods::re_pack_inp<std::__1::tuple<> >, std::__1::allocator<irods::re_pack_inp<std::__1::tuple<> > > >&, irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(irods::re_pack_inp<std::__1::tuple<> >&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&)::'lambda'(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&), std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) in /usr/lib/libirods_server.so.4.3.0
21# irods::rule_exists_manager<std::__1::tuple<>, RuleExecInfo*>::rule_exists(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool&) in /usr/lib/libirods_server.so.4.3.0
22# irods::error irods::plugin_base::invoke_policy_enforcement_point<char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol>(irods::rule_engine_context_manager<std::__1::tuple<>, RuleExecInfo*, (irods::rule_execution_manager_pack)0>, irods::plugin_context&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol) in /usr/lib/libirods_server.so.4.3.0
23# irods::error irods::plugin_base::call<char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol)::'lambda'()::operator()() const in /usr/lib/libirods_server.so.4.3.0
24# irods::at_scope_exit<irods::error irods::plugin_base::call<char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol)::'lambda'()>::~at_scope_exit() in /usr/lib/libirods_server.so.4.3.0
25# irods::error irods::plugin_base::call<char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol>(RsComm*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, boost::shared_ptr<irods::first_class_object>, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol) in /usr/lib/libirods_server.so.4.3.0
26# sendRodsMsg(boost::shared_ptr<irods::network_object>, char const*, BytesBuf const*, BytesBuf const*, BytesBuf const*, int, iRODSProtocol) in /usr/lib/libirods_server.so.4.3.0
27# sendStartupPack in /usr/lib/libirods_server.so.4.3.0
28# connectToRhost in /usr/lib/libirods_server.so.4.3.0
29# _rcConnect in /usr/lib/libirods_server.so.4.3.0
30# 0x00007FB575E86BDE
31# 0x00007FB575E91712
32# 0x00007FB575E921C7
33# boost::asio::detail::scheduler_operation::complete(void*, boost::system::error_code const&, unsigned long) in /usr/lib/libirods_server.so.4.3.0
34# boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) in /usr/lib/libirods_server.so.4.3.0
35# boost::asio::detail::scheduler::run(boost::system::error_code&) in /usr/lib/libirods_server.so.4.3.0
36# boost::asio::thread_pool::thread_function::operator()() in /usr/lib/libirods_server.so.4.3.0
37# boost::asio::detail::posix_thread::func<boost::asio::thread_pool::thread_function>::run() in /usr/lib/libirods_server.so.4.3.0 
38# boost_asio_detail_posix_thread_function in /usr/lib/libirods_server.so.4.3.0
39# 0x00007FB5800266DB in /lib/x86_64-linux-gnu/libpthread.so.0
40# clone in /lib/x86_64-linux-gnu/libc.so.6

The crash (SIGSEGV, signal 11) appears to occur when checking for the existence of a rule in the plugin. I have also observed a crash in the jobs submitted to the query processor.

My immediate guess is a dangling reference or two, but this needs to be investigated.
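
To illustrate the kind of bug I have in mind (this is not code from the plugin, just a minimal sketch using the standard library): a task handed to another thread must not hold a reference to a string that lives on the submitting function's stack. `migrate_object` and `schedule_migration` are hypothetical names, and `std::async` stands in for whatever thread pool actually runs the jobs.

```cpp
#include <cstddef>
#include <future>
#include <string>

// Hypothetical stand-in for the per-object work; not an iRODS API.
// It only reads the string, which is enough to crash if the string is gone.
std::size_t migrate_object(const std::string& logical_path)
{
    return logical_path.size();
}

// Hypothetical scheduling function. `path` is a local that is destroyed
// when this function returns, possibly before the task has started.
std::future<std::size_t> schedule_migration(const std::string& collection,
                                            const std::string& name)
{
    const std::string path = collection + "/" + name;

    // Capturing [&path] here would leave the task holding a dangling
    // reference once this function returns. Capturing by value gives the
    // task its own copy, which lives as long as the task does.
    return std::async(std::launch::async, [path] { return migrate_object(path); });
}
```

If the real cause turns out to be something like this, reading a destroyed `std::string` from the worker thread would fit frames 2 to 4 of the trace, where the fault lands inside `basic_string::compare`. That is still a guess until the unsymbolized frames are resolved.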