root-project / root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically
https://root.cern
Other
2.66k stars 1.27k forks source link

Bad error handling with pyroot template instantiations #11854

Open bendavid opened 1 year ago

bendavid commented 1 year ago

In some cases with template instantiation from pyroot together with RDF code which should fail to compile with clear error messages leads instead to cryptic output. Even worse, execution continues and gives undefined results.

Example

import ROOT

ret = ROOT.gInterpreter.Declare('#include "test.h"')

print("declare ret", ret)

print("creating helper")
helper = ROOT.helper[ROOT.std.vector["double"]]()

d = ROOT.ROOT.RDataFrame(1000)

print("Define with helper")
d = d.Define("fx", helper, [])
print("done define")

res = d.Sum("fx")

print(res.GetValue())
template <typename T>
class helper {

public:

  helper() {}

  std::size_t operator() () {
    const double res = 0.;
    res = T{0, 0}.size();
    return res;
  }

};

Note the assignment to const double which should fail to compile.

output (centos stream 8, root 6.26/10)

declare ret True
creating helper
Define with helper
IncrementalExecutor::executeFunction: symbol '_ZN6helperISt6vectorIdSaIdEEEclEv' unresolved while linking symbol '__cf_13'!
You are probably missing the definition of helper<std::vector<double, std::allocator<double> > >::operator()()
Maybe you need to load the corresponding shared library?
done define
9.408421539373414e+16

(if the spurious const is removed then instead one gets the correct output of 2000 and no error message)

bendavid commented 1 year ago

The issue seems to be triggered in particular when the invalid code is compiled only as part of the template instantiation of Define by pyroot.

bendavid commented 1 year ago

An even simpler example which doesn't depend on RDF:

import ROOT

ret = ROOT.gInterpreter.Declare('#include "test.h"')

print("declare ret", ret)

print("creating helper")
helper = ROOT.Helper[ROOT.std.vector["double"]]()

res = ROOT.call_helper(helper)
print(res)

test.h:

template <typename T>
class Helper {

public:

  Helper() {}

  std::size_t operator() () const {
    const std::size_t res = 0;
    res = T{0, 0}.size();
    return res;
  }

};

template <typename H>
std::size_t call_helper(const H &helper) {
  return helper();
}

Output (again with centos stream 8, root 6.26/10):

declare ret True
creating helper
IncrementalExecutor::executeFunction: symbol '_ZNK6HelperISt6vectorIdSaIdEEEclEv' unresolved while linking symbol '__cf_11'!
You are probably missing the definition of Helper<std::vector<double, std::allocator<double> > >::operator()() const
Maybe you need to load the corresponding shared library?
18446744073709551615
bendavid commented 1 year ago

The same behaviour happens on current nightlies.

eguiraud commented 1 year ago

Thank you for the report Josh! This seems to be a problem in upstream cppyy as well:

(cppyy-env) /t/cppyy-test python -c 'import cppyy; print(cppyy.__version__)'
2.4.1
(cppyy-env) /t/cppyy-test bat test.h test.py
───────┬───────────────────────────────────────────────────────────────────────────────────────
       │ File: test.h
───────┼───────────────────────────────────────────────────────────────────────────────────────
   1   │ template <typename T>
   2   │ class Helper {
   3   │
   4   │ public:
   5   │
   6   │   Helper() {}
   7   │
   8   │   std::size_t operator() () const {
   9   │     const std::size_t res = 0;
  10   │     res = T{0, 0}.size();
  11   │     return res;
  12   │   }
  13   │
  14   │ };
  15   │
  16   │ template <typename H>
  17   │ std::size_t call_helper(const H &helper) {
  18   │   return helper();
  19   │ }
───────┴───────────────────────────────────────────────────────────────────────────────────────
───────┬───────────────────────────────────────────────────────────────────────────────────────
       │ File: test.py
───────┼───────────────────────────────────────────────────────────────────────────────────────
   1   │ import cppyy
   2   │
   3   │ ret = cppyy.include('test.h')
   4   │
   5   │ print("declare ret", ret)
   6   │
   7   │ print("creating helper")
   8   │ helper = cppyy.gbl.Helper[cppyy.gbl.std.vector["double"]]()
   9   │
  10   │ res = cppyy.gbl.call_helper(helper)
  11   │ print(res)
───────┴───────────────────────────────────────────────────────────────────────────────────────
(cppyy-env) /t/cppyy-test python test.py
declare ret True
creating helper
IncrementalExecutor::executeFunction: symbol '_ZNK6HelperISt6vectorIdSaIdEEEclEv' unresolved while linking symbol '__cf_5'!
You are probably missing the definition of Helper<std::vector<double, std::allocator<double> > >::operator()() const
Maybe you need to load the corresponding shared library?
18446744073709551615
bendavid commented 1 year ago

Another very simple example, less serious because at least it fails, but still problematic because it makes debugging very difficult.

test.h:

template <typename T>
int some_template_function(const T &x) {
  return x.size();
}

test.py:

import ROOT

ret = ROOT.gInterpreter.Declare('#include "test.h"')

print("declare ret", ret)

res0 = ROOT.some_template_function(ROOT.std.vector["double"]())
print("res0", res0)

res1 = ROOT.some_template_function(0.0)
print("res1", res1)

output:

declare ret True
res0 0
Traceback (most recent call last):
  File "/home/b/bendavid/pyrootdebug2/test.py", line 10, in <module>
    res1 = ROOT.some_template_function(0.0)
TypeError: Template method resolution failed:
  int ::some_template_function(const vector<double>& x) =>
    TypeError: could not convert argument 1
  Failed to instantiate "some_template_function(double)"

Compare to the same in c++

  #include "test.h"

const int res = some_template_function(0.0);

Then the output of clang++ with nice error message is:

  In file included from test.cpp:1:
./test.h:3:11: error: member reference base type 'const double' is not a structure or union
  return x.size();
         ~^~~~~
test.cpp:3:17: note: in instantiation of function template specialization 'some_template_function<double>' requested here
const int res = some_template_function(0.0);
                ^
1 error generated.

So in the pyroot case all of the useful compiler errors are suppressed.

bendavid commented 1 year ago

Ok so there are few different issues here.

Leaving aside the question of why the template error isn't detected, the reason why it's not even printed to the console is because the problematic instantiation occurs via the LookupHelper which calls setSuppressAllDiagnostics by default.

The stack trace leading up to the instantiation looks as below*.

So as a workaround at least the user can call from python ROOT.gDebug = 6 to get the full output from clang.

Maybe the defaults need to be reconsidered here.

The optimal solution here for pyroot/cppyy in general is maybe a bit involved. What is probably desired is 1) Don't print the clang output, but store any errors or warnings during template instantiation attempts e.g. using cling::utils::DiagnosticsStore 2) If no template is successfully instantiated then print the stored warnings and errors for each case 3) If a template is successfully instantiated, but had warnings, at least print those for the successful case.

I have some idea what's going wrong with the error detection as well, more news on that soon.

*

Thread 1 "python" hit Breakpoint 1, 0x00007ffff2df28e0 in clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool) () from /home/b/bendavid/rootdev28/install/lib/libCling.so
(gdb) bt
#0  0x00007ffff2df28e0 in clang::Sema::InstantiateFunctionDefinition(clang::SourceLocation, clang::FunctionDecl*, bool, bool, bool) ()
   from /home/b/bendavid/rootdev28/install/lib/libCling.so
#1  0x00007ffff1a30ff7 in cling::overloadFunctionSelector(clang::DeclContext*, bool, llvm::SmallVectorImpl<clang::Expr*> const&, clang::LookupResult&, clang::DeclarationNameInfo&, clang::TemplateArgumentListInfo const*, clang::ASTContext&, clang::Parser&, clang::Sema&, cling::LookupHelper::DiagSetting) () from /home/b/bendavid/rootdev28/install/lib/libCling.so
#2  0x00007ffff1a387f7 in clang::FunctionDecl const* cling::findFunction<clang::FunctionDecl const*>(clang::DeclContext*, llvm::StringRef, llvm::SmallVectorImpl<clang::Expr*> const&, bool, clang::ASTContext&, cling::Interpreter*, clang::FunctionDecl const* (*)(clang::DeclContext*, bool, llvm::SmallVectorImpl<clang::Expr*> const&, clang::LookupResult&, clang::DeclarationNameInfo&, clang::TemplateArgumentListInfo const*, clang::ASTContext&, clang::Parser&, clang::Sema&, cling::LookupHelper::DiagSetting), cling::LookupHelper::DiagSetting) ()
   from /home/b/bendavid/rootdev28/install/lib/libCling.so
#3  0x00007ffff1a38c6d in clang::FunctionDecl const* cling::execFindFunction<cling::ParseProto, clang::FunctionDecl const*>(clang::Parser&, cling::Interpreter*, cling::LookupHelper&, clang::Decl const*, llvm::StringRef, cling::ParseProto::ArgsInput const&, bool, clang::FunctionDecl const* (*)(clang::DeclContext*, bool, llvm::SmallVectorImpl<clang::Expr*> const&, clang::LookupResult&, clang::DeclarationNameInfo&, clang::TemplateArgumentListInfo const*, clang::ASTContext&, clang::Parser&, clang::Sema&, cling::LookupHelper::DiagSetting), cling::LookupHelper::DiagSetting) () from /home/b/bendavid/rootdev28/install/lib/libCling.so
#4  0x00007ffff1a3571a in cling::LookupHelper::findFunctionProto(clang::Decl const*, llvm::StringRef, llvm::StringRef, cling::LookupHelper::DiagSetting, bool) const ()
   from /home/b/bendavid/rootdev28/install/lib/libCling.so
#5  0x00007ffff19b13aa in TClingClassInfo::GetMethod (this=this@entry=0x7fffffffbfe0, fname=fname@entry=0x7fffffffc530 "call_helper", 
    proto=proto@entry=0x55555c6b3560 "Helper<vector<double> >&", objectIsConst=objectIsConst@entry=false, poffset=poffset@entry=0x0, mode=ROOT::kConversionMatch, 
    imode=TClingClassInfo::kWithInheritance) at /home/b/bendavid/rootdev28/root/core/metacling/src/TClingDeclInfo.h:38
#6  0x00007ffff195a91a in TCling::GetFunctionWithPrototype (this=<optimized out>, opaque_cl=<optimized out>, method=0x7fffffffc530 "call_helper", 
    proto=0x55555c6b3560 "Helper<vector<double> >&", objectIsConst=<optimized out>, mode=<optimized out>) at /home/b/bendavid/rootdev28/root/core/metacling/src/TCling.cxx:5082
#7  0x00007ffff723799c in TROOT::GetGlobalFunctionWithPrototype (this=0x7ffff7499940 <ROOT::Internal::GetROOT1()::alloc>, function=0x7fffffffc530 "call_helper", 
    proto=0x55555c6b3560 "Helper<vector<double> >&", load=load@entry=false) at /home/b/bendavid/rootdev28/root/core/base/src/TROOT.cxx:1675
#8  0x00007ffff74e8fa7 in Cppyy::GetMethodTemplate (scope=scope@entry=1, name="call_helper", proto="Helper<vector<double> >&") at /usr/include/c++/12.2.0/bits/basic_string.h:233
#9  0x00007ffff75752d7 in CPyCppyy::TemplateProxy::Instantiate (this=0x7fffd804ca40, fname="call_helper", args=0x7ffff7736bf0, pref=CPyCppyy::Utility::kReference, pcnt=0x7fffffffc4e0)
    at /home/b/bendavid/rootdev28/root/bindings/pyroot/cppyy/CPyCppyy/src/TemplateProxy.cxx:189
#10 0x00007ffff7576b2a in CPyCppyy::tpp_call (pytmpl=0x7fffd804ca40, args=0x7ffff7736bf0, kwds=0x7fffd8041780) at /usr/include/c++/12.2.0/bits/basic_string.tcc:238
#11 0x00007ffff7d48b4b in _PyObject_MakeTpCall () from /usr/lib/libpython3.10.so.1.0
#12 0x00007ffff7d43f06 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.10.so.1.0
#13 0x00007ffff7d3d760 in ?? () from /usr/lib/libpython3.10.so.1.0
#14 0x00007ffff7deae04 in PyEval_EvalCode () from /usr/lib/libpython3.10.so.1.0
#15 0x00007ffff7dfb5b3 in ?? () from /usr/lib/libpython3.10.so.1.0
#16 0x00007ffff7df6d0a in ?? () from /usr/lib/libpython3.10.so.1.0
#17 0x00007ffff7c9b1cd in ?? () from /usr/lib/libpython3.10.so.1.0
#18 0x00007ffff7c9ae7e in _PyRun_SimpleFileObject () from /usr/lib/libpython3.10.so.1.0
#19 0x00007ffff7c9b831 in _PyRun_AnyFileObject () from /usr/lib/libpython3.10.so.1.0
#20 0x00007ffff7e07d9d in Py_RunMain () from /usr/lib/libpython3.10.so.1.0
#21 0x00007ffff7ddc5eb in Py_BytesMain () from /usr/lib/libpython3.10.so.1.0
#22 0x00007ffff7a36290 in __libc_start_call_main (main=main@entry=0x555555555120, argc=argc@entry=2, argv=argv@entry=0x7fffffffcbb8) at ../sysdeps/nptl/libc_start_call_main.h:58
#23 0x00007ffff7a3634a in __libc_start_main_impl (main=0x555555555120, argc=2, argv=0x7fffffffcbb8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffcba8) at ../csu/libc-start.c:381
#24 0x0000555555555045 in _start ()
Axel-Naumann commented 1 year ago

Excellent work, @bendavid and apologies for not doing this myself, in time.

Would it be an option to simply repeat the lookup without the RAII in case of failure to find an overload? It's only in the error case, where performance isn't as crucial anyway, and simplifies the interplay between PyROOT/cppyy and cling.

bendavid commented 1 year ago

So the current error handling is a bit messed up.

What happens for my current test case is as follows: -cppyy tries to instantiate the template from here: https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/bindings/pyroot/cppyy/CPyCppyy/src/TemplateProxy.cxx#L628 this call actually "succeeds" because the underlying instantiation call at https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/interpreter/cling/lib/Interpreter/LookupHelper.cpp#L1211-L1212 does not use the DefinitionRequired=true argument. -As a result of this cppyy tries to call the function anyways at https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/bindings/pyroot/cppyy/CPyCppyy/src/TemplateProxy.cxx#L632 which propagates through cling and eventually fails only because of the missing symbol, which is caught here: https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/interpreter/cling/lib/Interpreter/IncrementalExecutor.cpp#L279-L280 (this is also the source of the "IncrementalExecutor::executeFunction: symbol unresolved while linking" output on stderr) -Finally cppyy catches the error only here, where the call to WrapperCall returns false: https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx#L812-L814 But for functions which return a basic type or a pointer, the default value is actually just propagated through and the failure is never caught downstream. (ie for a pointer you get nullptr, for a signed integer or floating point type you get -1, and for an unsigned integer you get the max value because it wraps...

To make matters worse, because the error is never caught in LookupHelper, the decl is never cleaned up, so if you try to lookup the function again, it will again spuriously succeed, but this time won't even emit the clang errors.

In terms of solutions, it would be best if the lookup fails in this case, but aside from that, probably cppyy should throw an exception rather than returning a default value at https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx#L814 (and in several of the related functions)

The exception would be caught here and should properly translate into a python exception: https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx#L73-L75

As far as just failing the initial lookup is concerned (and properly cleaning up the decl), I tried using DefinitionRequired=true at https://github.com/root-project/root/blob/180c8099bfb6ad753b8a698445351662f8c5ef1f/interpreter/cling/lib/Interpreter/LookupHelper.cpp#L1211-L1212 but this creates other problems such that even the build of root fails...