Open Gabrielcarvfer opened 1 year ago
I have the same problem.
Is there a reproducer?
I cannot guarantee that it's the same issue, but I see a lot of "Failed to materialize symbols" errors in cppyy's test suite. For example, I get similar errors when I run
pytest test_stltypes.py -k 'test04_from_cpp or test02_deque_cpp17_style or test06_initialize_from_dict'
The "Failed to materialize symbols" errors always seem to be preceded by
symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__cf_5'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
On the total test suite I get 125 failures and 10 errors, so I definitely have some issue. This is in a clean Debian 11.7 Docker container with its default GCC 10.2.1-6, following the build instructions at https://cppyy.readthedocs.io/en/latest/repositories.html. I needed to make two minor modifications to make things build. For completeness, these were:
Commenting out the coroutine header in cppyy-backend, because GCC complains that it requires -fcoroutines:
diff --git a/cling/src/build/unix/makepchinput.py b/cling/src/build/unix/makepchinput.py
index 2fc9a56..f8602c7 100755
--- a/cling/src/build/unix/makepchinput.py
+++ b/cling/src/build/unix/makepchinput.py
@@ -168,7 +168,7 @@ def getSTLIncludes():
"bit",
"compare",
"concepts",
- "coroutine",
+# "coroutine",
"format",
"latch",
"numbers",
Removing the template specifier in cppyy tests:
diff --git a/test/stltypes.cxx b/test/stltypes.cxx
index 33260ae..ee882c4 100644
--- a/test/stltypes.cxx
+++ b/test/stltypes.cxx
@@ -12,9 +12,9 @@ namespace std {
namespace __gnu_cxx {
#define ns_prefix
#endif
-template bool ns_prefix operator==(const std::vector<int>::iterator&,
+bool ns_prefix operator==(const std::vector<int>::iterator&,
const std::vector<int>::iterator&);
-template bool ns_prefix operator!=(const std::vector<int>::iterator&,
+bool ns_prefix operator!=(const std::vector<int>::iterator&,
const std::vector<int>::iterator&);
}
#endif
I hope this can help you replicate these problems.
Edit: forgot to mention that I exported STDCXX=20 in the container prior to the build.
(unrelated: I did not manage to get cppyy-backend to build on Ubuntu 22.04 with GCC 11)
I haven't managed to isolate the source of the issue yet. Fun fact: the error seems to have gone away for some reason, but the "Failed to materialize symbols" messages persisted.
To reproduce the example I mentioned, you can use the following:
git clone https://gitlab.com/nsnam/ns-3-dev
cd ns-3-dev
./ns3 configure --enable-modules=core,network,applications,point-to-point --enable-python-bindings
./ns3 run first.py
And it should output:
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __cxx_global_var_initcling_module_138_, _GLOBAL__sub_I_cling_module_138, _ZN3ns3L16g_TimeStaticInitE, $.cling-module-138.__inits.0, __orc_init_func.cling-module-138 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-138 }) }
At time +2s client sent 1024 bytes to 10.1.1.2 port 9
At time +2.00369s server received 1024 bytes from 10.1.1.1 port 49153
At time +2.00369s server sent 1024 bytes to 10.1.1.1 port 49153
At time +2.00737s client received 1024 bytes from 10.1.1.2 port 9
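For triage, the symbol lists in these messages can be pulled out with a small script. This is only a sketch that assumes the log format shown above; the function names are made up for illustration, and treating names that start with `_Z` as "user" C++ symbols is a heuristic, not something cling guarantees.

```python
import re

# Matches the symbol list inside
# "Failed to materialize symbols: { (main, { a, b, c }) }".
_MATERIALIZE_RE = re.compile(
    r"Failed to materialize symbols: "
    r"\{ \((?P<dylib>[^,]+), \{ (?P<symbols>.*?) \}\) \}"
)


def failed_symbols(log_text):
    """Return the set of symbol names reported as failing to materialize."""
    symbols = set()
    for match in _MATERIALIZE_RE.finditer(log_text):
        for name in match.group("symbols").split(", "):
            symbols.add(name.strip())
    return symbols


def mangled_symbols(symbols):
    """Keep Itanium-mangled C++ names; drop JIT bookkeeping symbols (heuristic)."""
    return sorted(s for s in symbols if s.startswith("_Z"))


# The two messages from the output above.
sample_log = r"""
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __cxx_global_var_initcling_module_138_, _GLOBAL__sub_I_cling_module_138, _ZN3ns3L16g_TimeStaticInitE, $.cling-module-138.__inits.0, __orc_init_func.cling-module-138 }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-138 }) }
"""

all_failed = failed_symbols(sample_log)
print(mangled_symbols(all_failed))
```

On the sample, the only mangled name left is the one for `ns3::g_TimeStaticInit`, which points at the static initializer rather than at the JIT's own init functions.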
Thanks! I'll start with the assumption that it's a problem with STDCXX=20 not propagating. There were changes made to that chain after the release of 3.0.0, so a build from source may well behave differently than the latest release. I also asked upstream and they said something about fixes on their end for an issue that looked similar (but they were working off memory and so weren't sure).
@wlav figured out what the devil causes the runStaticInitializersOnce failure.
I've always loaded headers first, then their respective libraries, following your Zlib example.
However, some variables in some of these headers are initialized by calling static functions, defined in the source files. Since their libraries haven't been loaded yet, they fail.
It would be the equivalent of
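import cppyy

# Step 1: the "header" declares call_func() and uses it in a static initializer.
# The initializer is evaluated right away, before call_func() has a definition.
cppyy.cppdef("""
bool call_func();
static bool trap = call_func();
""")

# Step 2: the "library" supplies the definition, too late for the initializer above.
cppyy.cppdef("""
bool call_func()
{
    return !trap;
}
""")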
Not sure if it is possible to defer these initializations until the symbols become available via load_library (which I believe makes more sense than evaluating them right away, since we are dealing with C++ headers). Reversing the order of include and load_library seems to work, but I started getting a malloc_consolidate issue at the end. Still checking with sanitizers.
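The reversed order can be sketched as a small helper that loads every library before including any header. The helper name, the recording stand-in class, and the library and header names below are all hypothetical; the only cppyy calls assumed are `load_library` and `include`, and the backend is passed in so the ordering can be checked without cppyy installed.

```python
def load_libraries_then_headers(backend, libraries, headers):
    """Load every shared library first, then include the headers.

    With the libraries already loaded, static initializers in the headers
    can resolve the functions they call.
    """
    for library in libraries:
        backend.load_library(library)
    for header in headers:
        backend.include(header)


class RecordingBackend:
    """Hypothetical stand-in for the cppyy module; it only records calls."""

    def __init__(self):
        self.calls = []

    def load_library(self, name):
        self.calls.append(("load_library", name))

    def include(self, header):
        self.calls.append(("include", header))


# Illustrative names only; substitute the real libraries and headers.
backend = RecordingBackend()
load_libraries_then_headers(
    backend,
    libraries=["libexample-core.so", "libexample-network.so"],
    headers=["example/core.h", "example/network.h"],
)
for call in backend.calls:
    print(call)
```

With the real module this would be called as `load_libraries_then_headers(cppyy, ...)` after `import cppyy`.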
~~Yeah, simulations run to completion and then when the program is closing I get the following. Accepting suggestions on how to debug this (external libraries built with -g -O0 -fsanitize=address,leak,undefined -fno-sanitize-recover=all, cppyy.set_debug(True), and EXTRA_CLING_ARGS="-std=c++20 -g").~~
==161596==ERROR: AddressSanitizer: attempting double-free on 0x6060003785c0 in thread T0:
#0 0x7f3f4a4b6d07 in operator delete(void*) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:160
#1 0x7f3f4336c4b4 (<unknown module>)
#2 0x7f3f3d605186 in cling::IncrementalExecutor::runAtExitFuncs() (/home/gabriel/.local/lib/python3.10/site-packages/cppyy_backend/lib/libCling.so+0xc05186)
#3 0x7f3f3d48e0f3 in CppyyLegacy::TCling::~TCling() (/home/gabriel/.local/lib/python3.10/site-packages/cppyy_backend/lib/libCling.so+0xa8e0f3)
#4 0x7f3f3d48e118 in CppyyLegacy::TCling::~TCling() (/home/gabriel/.local/lib/python3.10/site-packages/cppyy_backend/lib/libCling.so+0xa8e118)
#5 0x7f3f42eefa6a in CppyyLegacy::TROOT::~TROOT() (/home/gabriel/.local/lib/python3.10/site-packages/cppyy_backend/lib/libCoreLegacy.so+0xefa6a)
#6 0x7f3f4a045494 in __run_exit_handlers stdlib/exit.c:113
#7 0x7f3f4a04560f in __GI_exit stdlib/exit.c:143
#8 0x7f3f4a029d96 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:74
#9 0x7f3f4a029e3f in __libc_start_main_impl ../csu/libc-start.c:392
#10 0x55d4d35176f4 in _start (/usr/bin/python3.10+0x2276f4)
Update: apparently someone initialized a variable with a dumb value and caused the memory corruption...
Interesting finding, and the platform-specific behavior is then probably that something similar exists in the g++ headers as you pointed out here: https://github.com/wlav/cppyy/issues/175
> Not sure if it is possible to defer these initializations until the symbols
Actually yes, that's "the future." For the longest while I've been hoping to replace the PCH with user modules, but there are still issues with those and I never moved. Now upstream is saying that they can parse code faster by only taking in the declarations and deferring parsing the implementations. This is supposedly even faster than modules w/o the problems of baking in compiler flags/settings and other portability problems that modules suffer from (so does the PCH, to be sure, but it's less messy b/c there's only one). I suspect it could also solve/pre-empt this problem.
Actually, what I said above is already the case in the current release, meaning that the above code will succeed if trap is part of e.g. a class, so that won't be a solution. (That said, the deferred parsing is only used with cppyy.cppdef, not with cppyy.include, which is why I didn't notice any performance difference yet in startup of large codes.)
Is there a way to preprocess a given header file with cppyy? I guess I could filter out these known problematic definitions and leave them for later, when I know it is safe to call them. In this specific example, we use that static variable to ensure the initialization of a library before doing actual work (which is a disaster on Windows, seems completely random), but I'm already loading the libraries in order via cppyy, so that isn't a problem there.
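As far as I know cppyy has no such filter built in, but the idea can be sketched in plain Python before the text is handed to `cppyy.cppdef`. This is a rough heuristic under stated assumptions: it only recognizes single-line definitions of the form `static <type> <name> = <callee>(...);`, it will miss multi-line definitions and may flag harmless ones, and the function name is made up for illustration.

```python
import re

# Heuristic: a single-line definition of the form
#   static <type> <name> = <callee>(...);
_STATIC_CALL_INIT = re.compile(
    r"^\s*static\s+[\w:<>&*\s]+?\b\w+\s*=\s*[\w:]+\s*\(.*\)\s*;\s*$"
)


def split_static_initializers(header_text):
    """Split header text into (kept, deferred) parts.

    'deferred' collects lines that look like static variables initialized
    by a function call; 'kept' is everything else, in the original order.
    """
    kept, deferred = [], []
    for line in header_text.splitlines():
        if _STATIC_CALL_INIT.match(line):
            deferred.append(line)
        else:
            kept.append(line)
    return "\n".join(kept), "\n".join(deferred)


# The two-line "header" from the reproducer earlier in the thread.
header = "bool call_func();\nstatic bool trap = call_func();"
kept, deferred = split_static_initializers(header)
print("declare now:", kept)
print("define later:", deferred)
```

The kept part could be declared first and the deferred part defined once the libraries are loaded. A real header would need an actual parser rather than a regular expression.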
Just to add to my own optimistic message: likely the reason that it was not reproducible with some docker images, is that the two offending methods were introduced only with gcc12.
Comment above was meant for https://github.com/wlav/cppyy/issues/175 .
Should all be good now with release 3.1.0. Feel free to reopen if you find otherwise.
@wlav 3.1.0 fixed some of the issues, but it is still failing at static initializers:
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-148, _ZN3ns3L16g_TimeStaticInitE, _GLOBAL__sub_I_cling_module_148, $.cling-module-148.__inits.0, __cxx_global_var_initcling_module_148_ }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZN3ns3L15g_batteryPresetE, __cxx_global_array_dtor, _GLOBAL__sub_I_cling_module_158, __orc_init_func.cling-module-158, __cxx_global_var_initcling_module_158_, $.cling-module-158.__inits.0, _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEC2IS3_EEPKcRKS3_ }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZGVN3ns3L21AwgnErrorTableBcc1458E, __cxx_global_array_dtor.3, __cxx_global_var_initcling_module_178_.1, __orc_init_func.cling-module-178, _ZNSt8_Rb_treeIN3ns312WifiStandardESt4pairIKS1_NSt7__cxx114listINS0_11WifiPhyBandESaIS6_EEEESt10_Select1stIS9_ESt4lessIS1_ESaIS9_EE8_M_eraseEPSt13_Rb_tree_nodeIS9_E, __cxx_global_var_initcling_module_178_, _ZNSt3mapIN3ns312WifiStandardENSt7__cxx114listINS0_11WifiPhyBandESaIS4_EEESt4lessIS1_ESaISt4pairIKS1_S6_EEED1Ev, _ZN3ns3L22AwgnErrorTableLdpc1458E, _ZN3ns3L21AwgnErrorTableBcc1458E, _ZNSt3mapIN3ns312WifiStandardENSt7__cxx114listINS0_11WifiPhyBandESaIS4_EEESt4lessIS1_ESaISt4pairIKS1_S6_EEEC2ESt16initializer_listISB_ERKS8_RKSC_, __cxx_global_var_initcling_module_178_.4, _ZNSt8_Rb_treeIN3ns312WifiStandardESt4pairIKS1_NSt7__cxx114listINS0_11WifiPhyBandESaIS6_EEEESt10_Select1stIS9_ESt4lessIS1_ESaIS9_EE17_M_construct_nodeIJRKS9_EEEvPSt13_Rb_tree_nodeIS9_EDpOT_, __cxx_global_var_initcling_module_178_.2, _ZGVN3ns3L22AwgnErrorTableLdpc1458E, $.cling-module-178.__inits.0, _ZGVN3ns3L13wifiStandardsB5cxx11E, __cxx_global_array_dtor.5, _ZN3ns3L19AwgnErrorTableBcc32E, _ZN3ns3L13wifiStandardsB5cxx11E, __clang_call_terminate, _ZGVN3ns3L19AwgnErrorTableBcc32E }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { _ZGVZN3ns34Time14PeekResolutionEvE10resolution, __cxx_global_var_initcling_module_179_, _ZGVN3ns3L27UE_MEASUREMENT_REPORT_DELAYE, __orc_init_func.cling-module-179, _ZZN3ns34Time14PeekResolutionEvE10resolution, _ZN3ns3L27UE_MEASUREMENT_REPORT_DELAYE, $.cling-module-179.__inits.0, _ZN3ns34TimeD1Ev }) }
[runStaticInitializersOnce]: Failed to materialize symbols: { (main, { __orc_init_func.cling-module-148 }) }
IncrementalExecutor::executeFunction: symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__cf_12'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
IncrementalExecutor::executeFunction: symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__dtor_13'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
IncrementalExecutor::executeFunction: symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__cf_14'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
IncrementalExecutor::executeFunction: symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__dtor_15'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
cling JIT session error: Failed to materialize symbols: { (main, { __clang_call_terminate }) }
Traceback (most recent call last):
File "/ns-3-dev/examples/tutorial/first.py", line 35, in <module>
devices = pointToPoint.Install(nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: none of the 5 overloaded methods succeeded. Full details:
ns3::NetDeviceContainer ns3::PointToPointHelper::Install(ns3::NodeContainer c) =>
ValueError: nullptr result where temporary expected
ns3::NetDeviceContainer ns3::PointToPointHelper::Install(ns3::Ptr<ns3::Node> a, ns3::Ptr<ns3::Node> b) =>
TypeError: takes at least 2 arguments (1 given)
ns3::NetDeviceContainer ns3::PointToPointHelper::Install(ns3::Ptr<ns3::Node> a, std::string bName) =>
TypeError: takes at least 2 arguments (1 given)
ns3::NetDeviceContainer ns3::PointToPointHelper::Install(std::string aName, ns3::Ptr<ns3::Node> b) =>
TypeError: takes at least 2 arguments (1 given)
ns3::NetDeviceContainer ns3::PointToPointHelper::Install(std::string aNode, std::string bNode) =>
TypeError: takes at least 2 arguments (1 given)
cling JIT session error: Failed to materialize symbols: { (main, { __clang_call_terminate }) }
IncrementalExecutor::executeFunction: symbol '_ZNSaIcE10deallocateEPcm' unresolved while linking symbol '__dtor_18'!
You are probably missing the definition of std::allocator<char>::deallocate(char*, unsigned long)
Maybe you need to load the corresponding shared library?
For #175, there's a workaround for the string that now seems to cover more systems. It may not solve this one, then (I was hopeful it was the same thing), as there's also this issue that upstream is working on: https://github.com/cms-sw/cmssw/issues/43077#issuecomment-1781310510
I believe the static initialization issue is the same as https://github.com/root-project/root/issues/12988. Based on the cmssw issue, I would actually expect cling to load all the libraries in the path to look up the symbol. Did the cling version change much from cppyy 2.4.2 to 3.0.0? Maybe I can help find the bug, I just need some general directions. :)
Yes, 2.4.2 to 3.0.0 was a massive update: llvm9 -> llvm13. JITLink was also introduced along the way. (Starting with cppyy 4.0, these backend updates will be simpler, as that will be based on clang-repl, which is from the same folks. It should also allow usage of stock llvm.)
hello all,
I have the same problem with static initializers. Have you guys found any solution yet?
Thanks!
A few cases appear to have been fixed (#175 as mentioned above), but there's no solution yet that solves everything.
Hmm, a workaround for this issue.
class Time
{
  public:
    static bool StaticInit(); // defined in .cpp
};

// current code: does not call Time::StaticInit due to symbol materialization issue
bool g_staticInit = Time::StaticInit();

// alternative code: calls Time::StaticInit in spite of symbol materialization issue
class TimeInitHelper
{
  public:
    TimeInitHelper()
    {
        Time::StaticInit();
    }
};
static TimeInitHelper g_staticInit; // g_staticInit fails to be materialized, but StaticInit is called anyway
That workaround may provide a clue, as the symbol lookup probably occurs at different times for the static initializer vs. the function call. I can dig into that.
This also breaks Boost Asio, which uses this same pattern over and over.
e.g. from https://github.com/boostorg/asio/blob/develop/include/boost/asio/error.hpp
static const boost::system::error_category&
system_category BOOST_ASIO_UNUSED_VARIABLE
= boost::asio::error::get_system_category();
static const boost::system::error_category&
netdb_category BOOST_ASIO_UNUSED_VARIABLE
= boost::asio::error::get_netdb_category();
static const boost::system::error_category&
addrinfo_category BOOST_ASIO_UNUSED_VARIABLE
= boost::asio::error::get_addrinfo_category();
static const boost::system::error_category&
misc_category BOOST_ASIO_UNUSED_VARIABLE
= boost::asio::error::get_misc_category();
Holding out hope that it is fixed in llvm16 (current upstream) or llvm18 (which upstream is working on).
I found out it's actually because of the ns-3 and cppyy versions.
I have the same problem with cppyy==3.1.2 and ns-3.38, but with cppyy==2.4.2 there is no problem at all. This is stated in this doc, and this doc only: https://www.nsnam.org/docs/installation/html/system.html#python-bindings-ns-3-37-and-newer
ns-3 Python support now uses cppyy. Version 3.1.2 is the most recent supported cppyy release since ns-3.42. Cppyy version 2.4.2 should be used from ns-3.37 up to 3.41. Due to an upstream limitation with cppyy, Python bindings do not work on macOS machines with Apple silicon (M1 and M2 processors).
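The pairing quoted above can be written down as a small lookup. This is only a sketch of the rule as stated in that paragraph; the function name is hypothetical, and returning `None` for releases before ns-3.37 is an assumption based on the section title of the linked page.

```python
def recommended_cppyy_version(ns3_release):
    """Map an ns-3 release such as "3.38" or "ns-3.42" to a cppyy version.

    Follows the quoted documentation: cppyy 2.4.2 for ns-3.37 up to 3.41,
    cppyy 3.1.2 from ns-3.42 on. Returns None for older releases (assumed
    not to use cppyy-based bindings).
    """
    if ns3_release.startswith("ns-"):
        ns3_release = ns3_release[3:]
    major, minor = (int(part) for part in ns3_release.split(".")[:2])
    if (major, minor) < (3, 37):
        return None
    if (major, minor) <= (3, 41):
        return "2.4.2"
    return "3.1.2"


for release in ("3.36", "3.38", "ns-3.41", "ns-3.42"):
    print(release, "->", recommended_cppyy_version(release))
```

For the ns-3.38 case reported here, this gives 2.4.2, which matches the observation that 3.1.2 fails and 2.4.2 works.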
Hi,
After updating to Cppyy 3.0 I started getting these messages while testing it with ns-3.
Basically, a static initializer for PeekResolution didn't get executed, which resulted in a nullptr instead of a TimeResolution value. This caused the overload of ns3::Seconds to fail with a ValueError.
I have a workaround for that specific case, but wanted to point out that it happened, since I haven't seen any similar report here. The C++ tests are working just fine on Clang14.
Not completely sure if it is just our implementation being wrong again (that incorrect regex type was pretty bad), or it is something new in cling/cppyy, since it was working just fine on Cppyy==2.4.2.
Maybe related to https://github.com/root-project/root/issues/12481 or https://github.com/root-project/root/issues/12294?