wlav / cppyy

Error related to MPI and the default constructor when debug mode is enabled #163

Open keceli opened 1 year ago

keceli commented 1 year ago

I see an error that appears only if I enable the debug mode in cppyy.

import cppyy
cppyy.set_debug()
from parallelzone import parallelzone as pz
comm = pz.runtime.RuntimeView()

The error is:

/usr/lib/x86_64-linux-gnu/openmpi/include/mpi.h:419:9: error: incomplete type 'ompi_communicator_t' named in nested name specifier
typedef struct ompi_communicator_t *MPI_Comm;
        ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/x86_64-linux-gnu/openmpi/include/mpi.h:419:16: note: forward declaration of 'ompi_communicator_t'
typedef struct ompi_communicator_t *MPI_Comm;
               ^
Traceback (most recent call last):
  File "/home/keceli/soft/nwx/cli_nux/tests/debug.py", line 14, in <module>
    comm = pz.runtime.RuntimeView()
TypeError: none of the 7 overloaded methods succeeded. Full details:
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView&& other) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::pimpl_pointer pimpl) =>
    TypeError: takes at least 1 arguments (0 given)
  parallelzone::runtime::RuntimeView constructor failed
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::argc_type argc, parallelzone::runtime::RuntimeView::argv_type argv) =>
    TypeError: takes at least 2 arguments (0 given)
  RuntimeView::RuntimeView(const parallelzone::runtime::RuntimeView& other) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::mpi_comm_type comm) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::argc_type argc, parallelzone::runtime::RuntimeView::argv_type argv, parallelzone::runtime::RuntimeView::mpi_comm_type comm) =>
    TypeError: takes at least 3 arguments (0 given)

It looks like cppyy doesn't recognize the default constructor, which takes no arguments. Interestingly, the code runs if I comment out cppyy.set_debug(). ParallelZone is part of NWChemEx and the repo is located here. Any suggestions?
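To make the overload report above easier to read, here is a minimal sketch that splits the message into signature/reason pairs. The helper name `parse_overload_error` is hypothetical and not part of cppyy; it only assumes the text layout shown in the traceback. On this message it finds six signature/reason pairs against a header count of 7, plus one line ("constructor failed") that carries no signature. That line may be the failed default-constructor attempt, but nothing in the report confirms this.

```python
import re

# Overload report copied from the traceback above.
MESSAGE = """TypeError: none of the 7 overloaded methods succeeded. Full details:
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView&& other) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::pimpl_pointer pimpl) =>
    TypeError: takes at least 1 arguments (0 given)
  parallelzone::runtime::RuntimeView constructor failed
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::argc_type argc, parallelzone::runtime::RuntimeView::argv_type argv) =>
    TypeError: takes at least 2 arguments (0 given)
  RuntimeView::RuntimeView(const parallelzone::runtime::RuntimeView& other) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::mpi_comm_type comm) =>
    TypeError: takes at least 1 arguments (0 given)
  RuntimeView::RuntimeView(parallelzone::runtime::RuntimeView::argc_type argc, parallelzone::runtime::RuntimeView::argv_type argv, parallelzone::runtime::RuntimeView::mpi_comm_type comm) =>
    TypeError: takes at least 3 arguments (0 given)
"""


def parse_overload_error(message):
    """Hypothetical helper: return (expected_count, pairs, unattributed).

    pairs holds (signature, reason) tuples; unattributed holds lines that
    are not preceded by a 'signature =>' line.
    """
    lines = [ln.strip() for ln in message.splitlines() if ln.strip()]
    header = re.search(r"none of the (\d+) overloaded methods succeeded", lines[0])
    expected = int(header.group(1)) if header else None

    pairs, unattributed = [], []
    i = 1
    while i < len(lines):
        line = lines[i]
        if line.endswith("=>"):
            signature = line[:-2].strip()
            # The reason is the following line, unless that is another signature.
            has_reason = i + 1 < len(lines) and not lines[i + 1].endswith("=>")
            pairs.append((signature, lines[i + 1] if has_reason else ""))
            i += 2 if has_reason else 1
        else:
            unattributed.append(line)
            i += 1
    return expected, pairs, unattributed


expected, pairs, unattributed = parse_overload_error(MESSAGE)
print(f"header count: {expected}, signatures listed: {len(pairs)}")
for line in unattributed:
    print("no signature:", line)
```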

wlav commented 1 year ago

I can reproduce it, but haven't been able to formulate a hypothesis as to why it behaves this way. My best guess is that calling set_debug() also disables the switching off of diagnostics, so that some code deep in Cling that is allowed to fail gracefully during normal running now propagates that error. Another guess is that the problem could come from the diagnostics printing.

Maybe start with what you need set_debug() for?

wlav commented 1 year ago

Found the source ... it's a piece of code that explicitly checks whether classes are complete (and thus can be bound). Still not sure how the debugging level plays into it (or why with debugging on, an error propagates), but it is indeed something that is allowed to (silently) fail during normal running.

keceli commented 1 year ago

Thank you @wlav for looking into this problem.

> Maybe start with what you need set_debug() for?

I came across this issue while debugging an unrelated bug in my code. I used set_debug() because the original error message was not helpful. Anyhow, that problem is fixed, so I don't need to turn on debug mode until the next bug :) I just wanted to report the issue to see if I am missing something or if there is an easy fix.
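Until the cause is understood, one possible stopgap is to keep debug output on only around the code under investigation. The sketch below is not from the thread: `debug_scope` is a hypothetical helper, and it assumes that cppyy.set_debug accepts a boolean that switches debug output off again. Whether this avoids the error depends on when the completeness check runs, which has not been established here.

```python
from contextlib import contextmanager


@contextmanager
def debug_scope(set_debug):
    """Hypothetical helper: enable debug output for the duration of a with-block.

    set_debug is any callable taking a boolean, e.g. cppyy.set_debug
    (assumed here to accept False to turn debug output off).
    """
    set_debug(True)
    try:
        yield
    finally:
        # Always switch debug output back off, even if the body raises.
        set_debug(False)


# Stand-in for cppyy.set_debug so the sketch runs without cppyy installed:
# it simply records each call.
calls = []
with debug_scope(calls.append):
    calls.append("body")
print(calls)  # [True, 'body', False]

# Hypothetical usage with cppyy (not run here):
#   import cppyy
#   with debug_scope(cppyy.set_debug):
#       ...  # the code being debugged
#   comm = pz.runtime.RuntimeView()  # constructed with debug output off
```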