iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0
2.86k stars 625 forks source link

Suspected MSVC bug in python interop #14371

Open stellaraccident opened 1 year ago

stellaraccident commented 1 year ago

Leaving here as a tombstone. I have disabled the compiler api_test.py on Windows and need to fix it.

Something in here is causing a word to be corrupted, which always seems to be the high (memory order) word of the MlirContext.ptr. It makes no sense. I can step through in the debugger, and all looks fine until stepping into an inlined getContext() function (can be triggered through multiple paths). Then the value flips with the low order 32bits of the pointer being corrupted. Nothing else is going on and significant bisecting has been done to see that it is a problem at this point. I suspect a compiler bug.

py::object PyMlirContext::createFromCapsule(py::object capsule) {
  MlirContext rawContext = mlirPythonCapsuleToContext(capsule.ptr());
  if (mlirContextIsNull(rawContext))
    throw py::error_already_set();
  return forContext(rawContext).releaseObject();
}

Also, during the investigation, it was noted that this method is leaking MLIRContext references:

PyMlirContextRef PyMlirContext::forContext(MlirContext context) {
  py::gil_scoped_acquire acquire;
  auto &liveContexts = getLiveContexts();
  auto it = liveContexts.find(context.ptr);
  if (it == liveContexts.end()) {
    // Create.
    PyMlirContext *unownedContextWrapper = new PyMlirContext(context);
    py::object pyRef = py::cast(unownedContextWrapper);
    assert(pyRef && "cast to py::object failed");
    liveContexts[context.ptr] = unownedContextWrapper;
    return PyMlirContextRef(unownedContextWrapper, std::move(pyRef));
  }
  // Use existing.
  py::object pyRef = py::cast(it->second);
  return PyMlirContextRef(it->second, std::move(pyRef));
}

The py::cast needs a return value policy to take ownership.

allieculp commented 1 year ago

Tombstone hmmmm, P2..?

stellaraccident commented 1 year ago

No, pretty sure it is broken -- just left this here to pick up.

jpienaar commented 1 year ago

Which version of MSVC? (pretty sure we are hitting at least one other one there https://github.com/google/jax/issues/16394 but that's with bazel too, so may be completely different)