pybind / pybind11


[BUG]: OpenMP loops hang with Python virtual function overrides #5117

Open jmikeowen opened 2 months ago


Required prerequisites

What version (or hash if on master) of pybind11 are you using?

9b4f71d1

Problem description

I'm having a problem with pybind11-wrapped code that calls virtual function overrides inside OpenMP sections of C++ code. An OpenMP loop hangs when run with more than one thread if the loop body calls Python overrides. Since the PYBIND11_OVERRIDE macro acquires the GIL, I thought these calls should be safe, but clearly I'm missing something.
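One possible mechanism, offered only as a sketch and not confirmed in this report: a function called from Python is entered with the GIL held by the calling thread, so OpenMP worker threads that try to acquire the GIL inside PYBIND11_OVERRIDE could block while the calling thread waits for them at the end of the parallel loop. The model below uses only the C++ standard library. `fake_gil`, `call_override`, and `run_parallel_region` are hypothetical names, a `std::timed_mutex` stands in for the real GIL, `std::thread` stands in for the OpenMP team, and a timeout replaces the indefinite wait so that the demo cannot hang. It ignores the fact that in real OpenMP the calling thread also runs some loop iterations itself.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the Python GIL (assumption: a plain timed mutex, not the real GIL).
std::timed_mutex fake_gil;

// Models a trampoline override: take the "GIL", run the Python body, release it.
// The real acquire waits forever; a 200 ms timeout is used here so the demo ends.
bool call_override(std::atomic<int>& completed) {
  std::unique_lock<std::timed_mutex> lock(fake_gil, std::chrono::milliseconds(200));
  if (!lock.owns_lock()) return false;  // would be a hang with the real GIL
  ++completed;                          // the "Python override" ran
  return true;
}

// Models a bound C++ function called from Python: the caller holds the "GIL"
// on entry. If release_first is true, the lock is dropped before the workers
// start and retaken afterwards. Returns how many override calls completed.
int run_parallel_region(int n_workers, bool release_first) {
  assert(n_workers > 0);
  std::atomic<int> completed{0};
  std::unique_lock<std::timed_mutex> caller_lock(fake_gil);  // held on entry
  if (release_first) caller_lock.unlock();

  std::vector<std::thread> workers;
  for (int i = 0; i < n_workers; ++i)
    workers.emplace_back([&completed] { call_override(completed); });
  for (auto& w : workers) w.join();  // plays the role of the loop's implicit barrier

  if (release_first) caller_lock.lock();  // retake before returning to "Python"
  return completed.load();
}
```

In this model, `run_parallel_region(4, false)` returns 0 because every worker times out waiting for a lock the caller never gives up, while `run_parallel_region(4, true)` returns 4. If the same reasoning applies to the real code, the analogous change in pybind11 would be to release the GIL around the parallel region, for example with `py::gil_scoped_release` inside the function or `py::call_guard<py::gil_scoped_release>()` on its binding. Whether that resolves this particular hang has not been tested here.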

I've looked through the issues and discussions but don't see a solution to this problem. I've also raised it as a Discussion at https://github.com/pybind/pybind11/discussions/5102, in case that's a more appropriate place to examine it (it's certainly possible I'm making a mistake I don't see).

The C++ code below reproduces the problem when run with the Python code that follows.

Reproducible example code

#include "pybind11/pybind11.h"
#include "pybind11/functional.h"

namespace py = pybind11;
using namespace pybind11::literals;

#include <cstdio>

class A {
public:
  A()                               { printf("A::A()\n"); }
  virtual ~A()                      { printf("A::~A()\n"); }
  virtual void void_func() const    { printf("A::void_func()\n"); }
  virtual int int_func(int x) const { printf("A::int_func(%d)\n", x); return x + 1; }
};

void do_threaded_stuff(const A& a) {
  int sum = 0;
#pragma omp parallel for
  for (auto i = 0u; i < 10u; ++i) {
    a.void_func();
#pragma omp critical
    {
      sum = a.int_func(sum);
    }
  }
  printf("Final sum: %d\n", sum);
}

//------------------------------------------------------------------------------
// Trampoline class for A
//------------------------------------------------------------------------------
class PYB11TrampolineA: public A {
public:
  using A::A;
  virtual void void_func() const override {
    PYBIND11_OVERRIDE(void, A, void_func);
  }
  virtual int int_func(int x) const override {
    PYBIND11_OVERRIDE(int, A, int_func, x);
  }
};

//------------------------------------------------------------------------------
// Make the module
//------------------------------------------------------------------------------
PYBIND11_MODULE(virtual_override_thread, m) {
  py::class_<A, PYB11TrampolineA> obj(m, "A");
  obj.def(py::init<>());
  obj.def("void_func", (void (A::*)() const) &A::void_func);
  obj.def("int_func", (int (A::*)(int) const) &A::int_func);

  m.def("do_threaded_stuff", (void (*)(const A&)) &do_threaded_stuff, "a"_a);
}

# Python reproducer
from virtual_override_thread import *

class B(A):
    def __init__(self):
        A.__init__(self)

    def void_func(self):
        print("B::void_func")

    def int_func(self, x):
        print("B::int_func({})".format(x))
        return x + 10

a = A()
do_threaded_stuff(a)    # OK

b = B()
do_threaded_stuff(b)    # Hang with OMP_NUM_THREADS > 1

Is this a regression? Put the last known working version here if it is.

Not a regression