pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.95k stars 1.93k forks source link

Regression in 1.10.0 : Selection of rows hangs indefinitely for a frame contaning a `pl.Object` series #19358

Open Elvynzs opened 3 days ago

Elvynzs commented 3 days ago

Checks

Reproducible example

import polars as pl
pl.Series(["text"] * 100 + [1] * 100, dtype=pl.Object)[[0, -1]] # OK !
pl.Series(["text"] * 100 + [1] * 100, dtype=pl.Object).to_frame()[[0, -1]] # Hangs indefinitely

Log output

N/A

Issue description

See example. Used to work in 1.8.1.

This test comes from our unittests, where we try to be robust for different kind of inputs that may not always be well structured.

Expected behavior

Here is what I used to get in 1.8.1 :

object
------
text
1

Installed versions

``` --------Version info--------- Polars: 1.10.0 Index type: UInt32 Platform: Windows-10-10.0.22000-SP0 Python: 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:40:50) [MSC v.1937 64 bit (AMD64)] LTS CPU: False ----Optional dependencies---- adbc_driver_manager altair 5.3.0 cloudpickle connectorx deltalake fastexcel fsspec gevent great_tables matplotlib 3.8.4 nest_asyncio 1.6.0 numpy 1.26.4 openpyxl 3.1.2 pandas 2.2.2 pyarrow 15.0.2 pydantic pyiceberg sqlalchemy torch 2.4.0 xlsx2csv xlsxwriter ```
ritchie46 commented 3 days ago

Also works in 1.9.0. I think we need a bisect on this one.

cmdlineluser commented 3 days ago

@ritchie46 Seems to be the upgrade in PyO3 https://github.com/pola-rs/polars/pull/19199

ritchie46 commented 2 days ago

@itamarst can you take look what happened here? Thanks for the bisect @cmdlineluser

itamarst commented 2 days ago

Will try to take a look, yes.

itamarst commented 2 days ago

The issue as explained by Ritchie is that older PyO3 would defer decrefs on Python objects (in this case polars_python::conversion::ObjectValue) to a pool, and then drop them later.

In latest version, there is no pool, so current code will acquire GIL... which then deadlocks with code holding GIL that is calling into Rayon.

Ritchie's thought was to restore usage of the pool to Polars.

My proposal: anything calling Rayon should release the GIL. And in fact adding Python::allow_threads() to the key method in this case (PyDataFrame::gather_with_series() in dataframe/general.rs in polars-python) fixes the deadlock.

itamarst commented 2 days ago

To expand: Almost all methods of PyDataFrame and PySeries etc should plausibly be releasing the GIL, and currently aren't. E.g. all the methods that are just passthroughs that call self.df.dosomething(). Presumably Polars already has concurrency controls on that level so the GIL part is unnecessary. This could perhaps be made less verbose with a macro.