pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUG: A trivial C++ module crashes if you import it after pandas #50994

Open kapacuk opened 1 year ago

kapacuk commented 1 year ago

Reproducible Example

cat >spam.cpp <<END
#include <Python.h>
#include <filesystem>
PyMODINIT_FUNC PyInit_spam(void)
{
    std::filesystem::path p1("/usr/share/icons");
    std::filesystem::path p2("hicolor");
    auto x = p1 / p2;
    return NULL;
}
END
g++-12 -I/usr/include/python3.9 -g -fPIC -std=c++20 -fsanitize=address -shared -o spam.cpython-39-x86_64-linux-gnu.so spam.cpp
python -c "import pandas; import spam"

Issue Description

The example above is a trivial Python module with no classes or methods. It does not even try to create a PyModule object, so importing it on its own raises the expected exception: SystemError: initialization of spam failed without raising an exception

However, if you import this module after pandas, the interpreter crashes and prints a single error message: munmap_chunk(): invalid pointer

I tried it with Python 3.9, 3.10, and 3.11, and with gcc-11 and gcc-12. It crashes consistently, although sometimes I get a different error message.

Expected Behavior

That example should throw the SystemError exception regardless of whether you import it before or after pandas.
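To compare the two cases side by side, a small driver script can run each import sequence in a fresh interpreter and report how the process ended. This is only a sketch: the helper names `run_imports` and `describe` are made up for illustration, and it assumes the `spam` extension built above is importable from the current directory. On POSIX, a negative return code from `subprocess` means the child was killed by a signal, which is what an abort from glibc's heap checks would look like.

```python
import signal
import subprocess
import sys


def run_imports(modules, python=sys.executable):
    """Import the given modules, in order, in a fresh interpreter."""
    code = "; ".join(f"import {name}" for name in modules)
    return subprocess.run([python, "-c", code], capture_output=True, text=True)


def describe(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode < 0:
        # POSIX only: a negative code means the child died from a signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # "spam" is the extension module from the reproducible example (assumed built).
    for order in (["spam"], ["pandas", "spam"]):
        result = run_imports(order)
        stderr_lines = result.stderr.strip().splitlines()
        last_line = stderr_lines[-1] if stderr_lines else "(no stderr output)"
        print(f"{' -> '.join(order)}: {describe(result.returncode)}")
        print(f"    last stderr line: {last_line}")
```

With the expected behavior, both runs would exit with a nonzero status and end in the SystemError line; a run that reports being killed by a signal is the crash described in this issue.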

Installed Versions

INSTALLED VERSIONS
------------------
commit : 2e218d10984e9919f0296931d92ea851c6a6faf5
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-13-amd64
Version : #1 SMP Debian 5.10.106-1 (2022-03-17)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 1.5.3
numpy : 1.24.1
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 66.1.1
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None
None

Jython1415 commented 1 year ago

I'm struggling to reproduce this, but I suspect it's due to problems on my end...

Would you mind elaborating on the other error you get sometimes, and under what conditions you get the other error message?

kapacuk commented 1 year ago

> Would you mind elaborating on the other error you get sometimes, and under what conditions you get the other error message?

I've just checked: I still get the munmap_chunk(): invalid pointer error with gcc-12 and gcc-13, but with gcc-10 the error is different:

$ python -c "import pandas; import spam"
double free or corruption (out)
Aborted
$

Please let me know if you still have problems reproducing it; I can build a Docker container for you.