Open setu4993 opened 1 month ago
I think we're running into the same issue in this run at widgetti/solara. We also see an error on Windows, "Windows fatal exception: access violation", resulting in a segfault.
The relevant part of the logs:
self = <vaex.hash.HashMapUnique object at 0x7fe128022ee0>

    def flatten(self):
        if self.dtype == object:
            return self  # already flat
>       keys = self._internal.key_array()
E       RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/site-packages/vaex/hash.py:78: RuntimeError
----------------------------- Captured stderr call -----------------------------
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
The failing pybind11::handle::inc_ref() call was triggered on a numpy.ndarray object.
I think this is a bug in vaex-core which only gets exposed now due to upgrading pybind11. We should write a test to expose this, and see if we can reproduce this in CI. I think we should yank the vaex-core release to avoid other people hitting this.
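Since the failure shows up as a segfault on Windows, one option for such a test is to run the suspect code in a child interpreter, so a hard crash turns into a non-zero exit code instead of taking down the whole test run. This is only a rough sketch: `run_isolated`, `check_vaex_unique_in_subprocess` and `VAEX_SNIPPET` are made-up names, and it is an assumption that the snippet reaches `HashMapUnique.flatten()` at all.

```python
import subprocess
import sys

# Hypothetical reproduction snippet. Whether this code path actually ends up in
# HashMapUnique.flatten() is an assumption, not something confirmed in this thread.
VAEX_SNIPPET = """
import numpy as np
import vaex
df = vaex.from_arrays(x=np.array([1, 2, 2, 3]))
print(df.x.nunique())
"""


def run_isolated(code, timeout=120):
    """Run `code` in a fresh interpreter with faulthandler enabled.

    A segfault or access violation in the child shows up as a non-zero
    return code here, and faulthandler writes a traceback to stderr.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def check_vaex_unique_in_subprocess():
    """Fail with the child's stderr if the snippet crashes or raises."""
    result = run_isolated(VAEX_SNIPPET)
    assert result.returncode == 0, result.stderr
```

Calling `check_vaex_unique_in_subprocess()` from a regular test function would make the captured stderr (including the pybind11 GIL message, if it appears) part of the assertion output.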
+1, yanking seems like a good idea if this is arising from vaex-core.
It's yanked.
I could not reproduce this on a local Ubuntu machine with Python 3.11, but I'll keep trying to reproduce it (help is welcome).
I googled the error message (RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure) and traced it to the following pybind11 release from 2022 (3rd bullet point):
https://github.com/pybind/pybind11/releases/tag/v2.10.2
It looks like the end user can disable these checks using environment variables, although the PR description does make a valid point.
Update:
We do not build with -DNDEBUG when we use cibuildwheel, but in a regular build we do pass this flag to the compiler. As a result, our release ships with this check enabled, while our CI runs without it.
This is the exact opposite of what we want.
I'm not sure why this is, or who sets this flag (layers of abstraction are not always great...)
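One quick way to see whether a given interpreter's build configuration carries -DNDEBUG is to look at its sysconfig variables. On Unix-like systems setuptools usually takes its default compiler flags from there, but whether that explains the cibuildwheel difference is an assumption. A minimal sketch (`has_ndebug` and `ndebug_report` are made-up helper names):

```python
import sysconfig


def has_ndebug(flags):
    """Return True if a compiler flag string defines NDEBUG."""
    if not isinstance(flags, str):
        # sysconfig returns None for variables that do not exist (e.g. on Windows)
        return False
    return any(
        token == "-DNDEBUG" or token.startswith("-DNDEBUG=")
        for token in flags.split()
    )


def ndebug_report(names=("CFLAGS", "OPT", "PY_CFLAGS")):
    """Map each sysconfig variable name to whether it contains -DNDEBUG."""
    return {name: has_ndebug(sysconfig.get_config_var(name)) for name in names}


if __name__ == "__main__":
    for name, found in ndebug_report().items():
        print(f"{name}: {'has' if found else 'no'} -DNDEBUG")
```

Running this both inside the cibuildwheel build environment and in the regular CI environment would show whether the two interpreters disagree on the flag.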
Thank you!
Description
All our unit tests and validations which worked fine on v4.17.1 started failing upon updating to v4.18.0. Curiously, this is occurring only on GitHub Actions so far (ubuntu-latest images) and is not reproducible on M-series MacBooks. I suspect this is limited to some instruction sets. I don't have a great sample to reproduce this right now (but can get one tomorrow), but the CI run here might be helpful to compare the diff. This is occurring across repos, but this public one was the easiest to link to.
Software information
Vaex version (import vaex; vaex.__version__): 4.18.0
Vaex was installed via: pip
OS: ubuntu-latest