Elvynzs opened this issue 3 days ago.
The example also works in 1.9.0. I think we need a bisect on this one.
@ritchie46 It seems to be the PyO3 upgrade in https://github.com/pola-rs/polars/pull/19199
@itamarst can you take a look at what happened here? Thanks for the bisect @cmdlineluser
Will try to take a look, yes.
The issue, as Ritchie explained, is that older PyO3 versions deferred decrefs on Python objects (in this case polars_python::conversion::ObjectValue) to a pool and then dropped them later.
In the latest version there is no pool, so the current code acquires the GIL when such an object is dropped... which then deadlocks with code that holds the GIL while calling into Rayon.
Ritchie's idea was to reintroduce that kind of pool in Polars itself.
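To make the mechanism concrete, here is a toy model in plain Rust (standard library only). It is not PyO3's or Polars' real code: a global Mutex stands in for the GIL, std::thread::scope stands in for Rayon, and PyHandle is a hypothetical stand-in for an object wrapper like ObjectValue. Its Drop defers the decref to a pending pool when the dropping thread does not hold the GIL, roughly the older PyO3 behaviour described above, and the pool is flushed on the next GIL acquisition.

```rust
use std::cell::Cell;
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Toy model only: this Mutex stands in for the GIL.
static GIL: Mutex<()> = Mutex::new(());
// Decrefs deferred because the dropping thread did not hold the GIL
// (modelled loosely on the pool older PyO3 versions kept).
static PENDING: Mutex<Vec<u64>> = Mutex::new(Vec::new());
// Decrefs actually applied, recorded so they can be inspected.
static APPLIED: Mutex<Vec<u64>> = Mutex::new(Vec::new());

thread_local! {
    static HOLDS_GIL: Cell<bool> = Cell::new(false);
}

pub struct GilGuard {
    _lock: MutexGuard<'static, ()>,
}

pub fn acquire_gil() -> GilGuard {
    let lock = GIL.lock().unwrap();
    HOLDS_GIL.with(|h| h.set(true));
    // Flush everything that was deferred while the GIL was unavailable.
    let deferred = std::mem::take(&mut *PENDING.lock().unwrap());
    APPLIED.lock().unwrap().extend(deferred);
    GilGuard { _lock: lock }
}

impl Drop for GilGuard {
    fn drop(&mut self) {
        HOLDS_GIL.with(|h| h.set(false));
        // `_lock` is dropped after this body runs, releasing the toy GIL.
    }
}

/// Hypothetical stand-in for a Python object wrapper such as ObjectValue.
pub struct PyHandle(pub u64);

impl Drop for PyHandle {
    fn drop(&mut self) {
        if HOLDS_GIL.with(|h| h.get()) {
            APPLIED.lock().unwrap().push(self.0);
        } else {
            // Calling acquire_gil() here instead would block forever if another
            // thread holds the GIL while waiting on this one: the deadlock.
            PENDING.lock().unwrap().push(self.0);
        }
    }
}

/// Drops `handles` on worker threads while the caller holds the toy GIL.
/// Returns how many decrefs ended up deferred to the pool.
pub fn drop_on_workers_while_holding_gil(handles: Vec<PyHandle>) -> usize {
    let _gil = acquire_gil();
    thread::scope(|s| {
        for h in handles {
            s.spawn(move || drop(h));
        }
    }); // joins every worker; this finishes because Drop never waits for the GIL
    PENDING.lock().unwrap().len()
}
```

For example, drop_on_workers_while_holding_gil((0..4).map(PyHandle).collect()) defers all four decrefs, and the next acquire_gil() applies them. If PyHandle's Drop called acquire_gil() directly, the same call would never return: each worker waits for the GIL while the caller holds it and waits for the workers.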
My proposal: anything that calls into Rayon should release the GIL. In fact, adding Python::allow_threads() to the key method in this case (PyDataFrame::gather_with_series() in dataframe/general.rs in polars-python) fixes the deadlock.
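A toy model of the proposed fix, again in plain Rust with only the standard library rather than PyO3 or Rayon. Gil::allow_threads mimics the shape of Python::allow_threads by releasing a global Mutex that stands in for the GIL, running the closure, and reacquiring it. PyHandle's Drop eagerly takes that lock, as the thread says the newer PyO3 does, and gather_like is a hypothetical analogue of a method such as PyDataFrame::gather_with_series that ends up dropping objects on worker threads (std::thread::scope standing in for Rayon).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;

// Toy model only: this Mutex stands in for the GIL.
static GIL: Mutex<()> = Mutex::new(());
static DECREFS: AtomicUsize = AtomicUsize::new(0);

/// Holds the toy GIL, loosely like holding a Python<'py> token in PyO3.
pub struct Gil {
    guard: Option<MutexGuard<'static, ()>>,
}

impl Gil {
    pub fn acquire() -> Gil {
        Gil { guard: Some(GIL.lock().unwrap()) }
    }

    /// Toy analogue of Python::allow_threads: release, run `f`, reacquire.
    /// Unlike PyO3, this sketch does not reacquire the lock if `f` panics.
    pub fn allow_threads<T>(&mut self, f: impl FnOnce() -> T) -> T {
        self.guard = None; // release the toy GIL
        let out = f();
        self.guard = Some(GIL.lock().unwrap()); // take it back
        out
    }

    pub fn is_held(&self) -> bool {
        self.guard.is_some()
    }
}

/// Hypothetical object wrapper whose Drop eagerly takes the GIL to decref.
pub struct PyHandle(pub u64);

impl Drop for PyHandle {
    fn drop(&mut self) {
        let _gil = GIL.lock().unwrap();
        DECREFS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical analogue of a method like gather_with_series: parallel work
/// that drops Python objects on worker threads. Returns the decref count.
pub fn gather_like(gil: &mut Gil, handles: Vec<PyHandle>) -> usize {
    // Without allow_threads, the workers would block on GIL.lock() while the
    // caller holds it and waits for them inside thread::scope: a deadlock.
    gil.allow_threads(|| {
        thread::scope(|s| {
            for h in handles {
                s.spawn(move || drop(h));
            }
        });
        DECREFS.load(Ordering::SeqCst)
    })
}
```

Removing the allow_threads call from gather_like reproduces the hang in this model, which is the same shape as the deadlock in the issue.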
To expand: almost all methods of PyDataFrame, PySeries, etc. should plausibly be releasing the GIL, and currently aren't, e.g. all the methods that are just passthroughs calling self.df.dosomething(). Presumably Polars already has concurrency controls at that level, so holding the GIL there is unnecessary. This could perhaps be made less verbose with a macro.
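One possible shape for such a macro, sketched as a toy in plain Rust: every name here (Gil, DataFrame, gil_released_passthrough!) is hypothetical, and a global Mutex stands in for the GIL. The macro generates passthrough methods that wrap the call to the inner frame in allow_threads. In real PyO3 code the generated method would presumably take a py: Python<'_> argument and call py.allow_threads instead, subject to PyO3's Send-like bound on the closure.

```rust
use std::sync::{Mutex, MutexGuard};

// Toy model only: this Mutex stands in for the GIL.
static GIL: Mutex<()> = Mutex::new(());

pub struct Gil {
    guard: Option<MutexGuard<'static, ()>>,
}

impl Gil {
    pub fn acquire() -> Gil {
        Gil { guard: Some(GIL.lock().unwrap()) }
    }

    /// Toy analogue of Python::allow_threads: release, run `f`, reacquire.
    pub fn allow_threads<T>(&mut self, f: impl FnOnce() -> T) -> T {
        self.guard = None;
        let out = f();
        self.guard = Some(GIL.lock().unwrap());
        out
    }

    pub fn is_held(&self) -> bool {
        self.guard.is_some()
    }
}

/// Toy stand-in for the Rust-side frame type that PyDataFrame wraps.
#[derive(Clone, Debug, PartialEq)]
pub struct DataFrame {
    pub rows: Vec<i64>,
}

impl DataFrame {
    pub fn height(&self) -> usize {
        self.rows.len()
    }

    pub fn head(&self, n: usize) -> DataFrame {
        DataFrame { rows: self.rows.iter().take(n).copied().collect() }
    }
}

pub struct PyDataFrame {
    pub df: DataFrame,
}

/// Hypothetical macro: each listed signature becomes a method that releases
/// the toy GIL around the call to the same-named method on `self.df`.
macro_rules! gil_released_passthrough {
    ($(fn $name:ident(&self $(, $arg:ident: $ty:ty)*) -> $ret:ty;)*) => {
        $(
            pub fn $name(&self, gil: &mut Gil $(, $arg: $ty)*) -> $ret {
                gil.allow_threads(|| self.df.$name($($arg),*))
            }
        )*
    };
}

impl PyDataFrame {
    gil_released_passthrough! {
        fn height(&self) -> usize;
        fn head(&self, n: usize) -> DataFrame;
    }
}
```

The design choice is to keep the list of passthroughs declarative, so releasing the GIL becomes the default for a whole group of methods rather than something each method has to remember.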
Reproducible example
Log output
Issue description
See the reproducible example. This used to work in 1.8.1.
This test comes from our unit tests, where we try to be robust to different kinds of input that may not always be well structured.
Expected behavior
Here is what I used to get in 1.8.1:
Installed versions