pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Importing pyarrow after polars causes `SIGSEGV` #16028

Closed cgbur closed 1 week ago

cgbur commented 2 weeks ago

Reproducible example

import polars as pl
import pyarrow

Log output

fish: Job 1, 'python3 test.py' terminated by signal SIGSEGV (Address boundary error)

Issue description

With this import order, I get an address boundary error. If I switch the order and import pyarrow first, the error goes away.

I'm using Python version 3.12.
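
For anyone trying to reproduce this, a minimal sketch along these lines may help compare the two import orders without crashing the interactive shell. Each order runs in a fresh interpreter, and a negative return code is translated into the signal that killed the child. The helper names `run_snippet` and `describe_exit` are hypothetical and not part of Polars or PyArrow.

```python
import signal
import subprocess
import sys


def run_snippet(code: str) -> int:
    """Run `code` in a fresh interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code], capture_output=True)
    return result.returncode


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The two import orders described in this issue.
    orders = {
        "polars first": "import polars; import pyarrow",
        "pyarrow first": "import pyarrow; import polars",
    }
    for label, code in orders.items():
        print(f"{label}: {describe_exit(run_snippet(code))}")
```

If only one order reports `killed by SIGSEGV`, that points at library load order rather than at either package alone.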

Expected behavior

Should not error?

Installed versions

```
--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             Linux-5.10.215-181.850.amzn2int.x86_64-x86_64-with-glibc2.39
Python:               3.12.2 (main, Feb 6 2024, 20:19:44) [GCC 13.2.0]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fastexcel:
fsspec:
gevent:
hvplot:
matplotlib:           3.8.3
numpy:                1.26.4
openpyxl:
pandas:               2.2.1
pyarrow:              15.0.0
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:           None
```

ritchie46 commented 2 weeks ago

I cannot reproduce. What does valgrind say? And have you tried with latest Polars?

deanm0000 commented 2 weeks ago

I just downgraded to polars==0.20.15 and pyarrow==15.0.0 and I still can't reproduce, although this environment uses Python 3.11.9.

--------Version info---------
Polars:               0.20.15
Index type:           UInt32
Platform:             Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python:               3.11.9 (main, Apr  6 2024, 17:59:24) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           0.3.2
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.4
numpy:                1.26.4
openpyxl:             <not installed>
pandas:               2.2.2
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             <not installed>
xlsxwriter:           <not installed>
cgbur commented 2 weeks ago

OK, thanks for checking on this. Something may be far more wrong with my system, then.

❯ python3
Python 3.12.3 (main, Apr  9 2024, 08:09:14) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import polars as pl
>>> pl.DataFrame({"a": [1,2,3], "b":[4,5,6]})
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
>>> pl.show_versions
<function show_versions at 0x7f343ed68cc0>
>>> pl.show_versions()
fish: Job 1, 'python3' terminated by signal SIGSEGV (Address boundary error)

I went ahead and upgraded my versions and hit the same issue. It seemingly works fine from a Jupyter notebook, however. I fear something is really wrong with my system. I'll go ahead and close this.

cgbur commented 2 weeks ago

I have narrowed it down to my Nix configuration: when PyArrow is enabled, the error happens as soon as I use Polars for certain operations. Oddly, I do not get any of these errors when running in a Jupyter notebook.

(python312.withPackages (ppkgs: with ppkgs; [
      polars
      pyarrow
      numpy
      pandas
      scipy
      matplotlib
      seaborn
      boto3
      tqdm
      pyyaml
      requests
      ipython
      ipykernel
      humanize
    ]))
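
Since the terminal and the Jupyter kernel behave differently, one way to compare them is to list which shared libraries each process has actually mapped. The sketch below is Linux-only and assumes `/proc/self/maps` is readable. The function name `shared_objects` and the keyword list are hypothetical choices, and nothing here confirms that a library mismatch is the cause.

```python
from pathlib import Path


def shared_objects(maps_text, keywords=("arrow", "polars", "stdc++", "jemalloc")):
    """Return sorted, de-duplicated .so paths from /proc/<pid>/maps text
    whose path contains any of `keywords`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if ".so" in path and any(key in path for key in keywords):
            paths.add(path)
    return sorted(paths)


if __name__ == "__main__":
    import importlib

    # Import whatever is available; a crash during import will not be reported here.
    for name in ("pyarrow", "polars"):
        try:
            importlib.import_module(name)
        except ImportError:
            print(f"{name} is not installed")

    maps = Path("/proc/self/maps")
    if maps.exists():
        for path in shared_objects(maps.read_text()):
            print(path)
```

Running the same script from the terminal and from a notebook cell, then diffing the two lists, would show whether the two environments resolve different copies of the same libraries.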

Is there a certain configuration I need to pass to valgrind to get this to report something useful?

❯ valgrind --leak-check=yes python3 -c "import polars as pl; pl.show_versions()"
==39677== Memcheck, a memory error detector
==39677== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==39677== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==39677== Command: python3 -c import\ polars\ as\ pl;\ pl.show_versions()
==39677== 
fish: Job 1, 'valgrind --leak-check=yes pytho…' terminated by signal SIGSEGV (Address boundary error)
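
When valgrind itself dies without a report, Python's standard-library `faulthandler` may still give a starting point: with `-X faulthandler`, the interpreter dumps the Python-level traceback to stderr when it receives `SIGSEGV`. The sketch below runs the failing command in a child process and captures that output. The wrapper name `run_with_faulthandler` is hypothetical, and the traceback shows only Python frames, not the native frame that faulted.

```python
import subprocess
import sys


def run_with_faulthandler(code: str):
    """Run `code` in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On a fatal signal, stderr contains the
    Python traceback that faulthandler wrote before the process died.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # The command from this issue that crashes in the affected environment.
    status, stderr = run_with_faulthandler("import polars as pl; pl.show_versions()")
    print("exit status:", status)
    print(stderr)
```

The same effect is available without a wrapper by running `python3 -X faulthandler -c "import polars as pl; pl.show_versions()"` directly.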