pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Stack overflow on join on high-CPU-count machine #14637

Open · DylanZA opened this issue 7 months ago

Reproducible example

```python
import polars as pl

NIDS = 500
NAXISA = 1000
NVALS = 100
NAXISB = 5000

ids = pl.Series("id", range(NIDS), pl.Int64)
vals = pl.Series("val", range(NVALS), pl.Int64)
extraa = pl.Series("axis", range(NAXISA), pl.Int64)
extrab = pl.Series("axis", range(NAXISB), pl.Int64)

# a: 500 * 1000 * 100 = 50,000,000 rows; b: 500 * 5000 = 2,500,000 rows
a = pl.DataFrame(ids).join(pl.DataFrame(extraa), how="cross").join(pl.DataFrame(vals), how="cross")
b = pl.DataFrame(ids).join(pl.DataFrame(extrab), how="cross")

print("a={:,} b={:,}".format(len(a), len(b)))

# Repeatedly inner join on (id, axis); the crash appears after a few iterations.
for i in range(30):
    print(i)
    a.join(b, on=["id", "axis"])
```

Log output

```
join parallel: true
CROSS join dataframes finished
join parallel: true
CROSS join dataframes finished
join parallel: true
CROSS join dataframes finished
'a=50,000,000 b=2,500,000'
0
join parallel: true
INNER join dataframes finished
1
join parallel: true
INNER join dataframes finished
2
join parallel: true
INNER join dataframes finished
3
join parallel: true
INNER join dataframes finished
4
join parallel: true
INNER join dataframes finished
5
join parallel: true
INNER join dataframes finished
6
join parallel: true
INNER join dataframes finished
7
join parallel: true
INNER join dataframes finished
8
join parallel: true
Segmentation fault (core dumped)
```

Issue description

Joining two fairly large dataframes (50 million and 2.5 million rows) occasionally hits a stack overflow somewhere deep in rayon.

Note this seems to happen only on my machine with 128 logical cores, not on my smaller machine.
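
The core-count dependence can be probed by pinning the thread pool size down before import; a minimal sketch, assuming the pool size (rather than the hardware itself) is the trigger, with 16 chosen arbitrarily to mimic the smaller machine:

```python
import os

# Must be set before importing polars; the rayon thread pool is sized
# once at startup and cannot be resized afterwards.
os.environ["POLARS_MAX_THREADS"] = "16"

import polars as pl

print(pl.thread_pool_size())  # confirm the cap actually took effect
```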

Expected behavior

It does not crash

Installed versions

```
--------Version info---------
Polars:              0.20.10
Index type:          UInt32
Platform:            Linux-6.5.0-14-generic-x86_64-with-glibc2.35
Python:              3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:
gevent:
hvplot:
matplotlib:          3.8.1
numpy:               1.26.1
openpyxl:
pandas:              2.1.2
pyarrow:             14.0.0
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

ritchie46 commented 7 months ago

Hmm.. Could you see in GDB where you are at that point? Maybe make a debug compilation. (I am on vacation and have poor internet, let alone a 128 core machine. :))

DylanZA commented 7 months ago

> Hmm.. Could you see in GDB where you are at that point? Maybe make a debug compilation. (I am on vacation and have poor internet, let alone a 128 core machine. :))

Here is a full GDB thread backtrace, compressed since it's 15 MB uncompressed:

gdb.txt.gz
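
A GDB session along these lines captures that kind of per-thread dump (a sketch; the `<pid>` placeholder is whatever Python process is running the repro, and `gdb.txt` is an arbitrary log path):

```
$ gdb -p <pid>
(gdb) set pagination off
(gdb) set logging file gdb.txt
(gdb) set logging on
(gdb) thread apply all bt
(gdb) set logging off
```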

DylanZA commented 6 months ago

FWIW, you can replicate the behaviour by setting RUST_MIN_STACK. On my 16-core machine I need to use 100000.
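
A minimal sketch of that reproduction, assuming RUST_MIN_STACK must be in the environment before polars spawns its worker threads (i.e. before the import):

```python
import os

# Shrink the stacks of the Rust worker threads to ~100 kB (the value
# quoted above). Must be set before polars creates its thread pool.
os.environ["RUST_MIN_STACK"] = "100000"

import polars as pl

# ...then run the reproducible example from the top of this issue; the
# overflow should now be reachable with far fewer cores.
```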