pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

BUILD: Pandas 1.2.5 build no longer works #59038

Closed adrianonobre closed 2 weeks ago

adrianonobre commented 2 weeks ago

Installation check

Platform

macOS-14.5-arm64-arm-64bit

Installation Method

pip install

pandas Version

1.2.5

Python Version

Python 3.11.7

Installation Logs

pip install --no-cache six==1.16.0
pip install --no-cache pytz==2024.1
pip install --no-cache python-dateutil==2.9.0.post0
pip install --no-cache numpy==1.26.4
pip install --no-cache cython==0.29.21
# blows up
pip install --no-cache pandas==1.2.5 
clang -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DNPY_NO_DEPRECATED_API=0 -Ipandas/_libs/src/ujson/python -Ipandas/_libs/src/ujson/lib -Ipandas/_libs/src/datetime -I/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include -I/Users/adriano.nobre/dev/test/sample-venv/include -I/Users/adriano.nobre/.pyenv/versions/3.11.7/include/python3.11 -c pandas/_libs/src/ujson/lib/ultrajsonenc.c -o build/temp.macosx-13.6-arm64-cpython-311/pandas/_libs/src/ujson/lib/ultrajsonenc.o -D_GNU_SOURCE -Wno-error=unreachable-code
clang -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -DNPY_NO_DEPRECATED_API=0 -Ipandas/_libs/src/ujson/python -Ipandas/_libs/src/ujson/lib -Ipandas/_libs/src/datetime -I/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include -I/Users/adriano.nobre/dev/test/sample-venv/include -I/Users/adriano.nobre/.pyenv/versions/3.11.7/include/python3.11 -c pandas/_libs/src/ujson/python/JSONtoObj.c -o build/temp.macosx-13.6-arm64-cpython-311/pandas/_libs/src/ujson/python/JSONtoObj.o -D_GNU_SOURCE -Wno-error=unreachable-code
pandas/_libs/src/ujson/python/JSONtoObj.c:195:49: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'const PyArrayObject *' (aka 'const struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
        new_data = PyDataMem_RENEW(PyArray_DATA(ret), i * npyarr->elsize);
                                                ^~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarraytypes.h:1508:35: note: passing argument to parameter 'arr' here
PyArray_DATA(const PyArrayObject *arr)
                                  ^
pandas/_libs/src/ujson/python/JSONtoObj.c:260:33: error: no member named 'elsize' in 'struct _PyArray_Descr'
        npyarr->elsize = dtype->elsize;
                         ~~~~~  ^
pandas/_libs/src/ujson/python/JSONtoObj.c:305:53: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'const PyArrayObject *' (aka 'const struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
            new_data = PyDataMem_RENEW(PyArray_DATA(npyarr->ret),
                                                    ^~~~~~~~~~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarraytypes.h:1508:35: note: passing argument to parameter 'arr' here
PyArray_DATA(const PyArrayObject *arr)
                                  ^
pandas/_libs/src/ujson/python/JSONtoObj.c:316:18: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'const PyArrayObject *' (aka 'const struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
    PyArray_DIMS(npyarr->ret)[0] = i + 1;
                 ^~~~~~~~~~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarraytypes.h:1520:35: note: passing argument to parameter 'arr' here
PyArray_DIMS(const PyArrayObject *arr)
                                  ^
pandas/_libs/src/ujson/python/JSONtoObj.c:318:33: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'const PyArrayObject *' (aka 'const struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
    if ((item = PyArray_GETPTR1(npyarr->ret, i)) == NULL ||
                                ^~~~~~~~~~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarrayobject.h:138:57: note: expanded from macro 'PyArray_GETPTR1'
#define PyArray_GETPTR1(obj, i) ((void *)(PyArray_BYTES(obj) + \
                                                        ^~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarraytypes.h:1514:36: note: passing argument to parameter 'arr' here
PyArray_BYTES(const PyArrayObject *arr)
                                   ^
pandas/_libs/src/ujson/python/JSONtoObj.c:318:33: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'const PyArrayObject *' (aka 'const struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
    if ((item = PyArray_GETPTR1(npyarr->ret, i)) == NULL ||
                                ^~~~~~~~~~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarrayobject.h:139:62: note: expanded from macro 'PyArray_GETPTR1'
                                         (i)*PyArray_STRIDES(obj)[0]))
                                                             ^~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarraytypes.h:1526:38: note: passing argument to parameter 'arr' here
PyArray_STRIDES(const PyArrayObject *arr)
                                     ^
pandas/_libs/src/ujson/python/JSONtoObj.c:319:25: warning: incompatible pointer types passing 'PyObject *' (aka 'struct _object *') to parameter of type 'PyArrayObject *' (aka 'struct tagPyArrayObject_fields *') [-Wincompatible-pointer-types]
        PyArray_SETITEM(npyarr->ret, item, value) == -1) {
                        ^~~~~~~~~~~
/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include/numpy/ndarrayobject.h:292:32: note: passing argument to parameter 'arr' here
PyArray_SETITEM(PyArrayObject *arr, char *itemptr, PyObject *v)
                               ^
6 warnings and 1 error generated.
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for pandas
Failed to build pandas
ERROR: Could not build wheels for pandas, which is required to install pyproject.toml-based projects

adrianonobre commented 2 weeks ago

potentially caused by this: https://github.com/numpy/numpy/commit/b3f9fc0ffed50ac437a2f09ecffeb6709a2487c8

myriky commented 2 weeks ago

same issue

lithomas1 commented 2 weeks ago

We are not planning on adding numpy 2 support to pandas 1.2.5 since numpy 2 compat is a pretty involved process and pandas 1.2.5 is already 3 years old at this point.

Please upgrade to pandas 2.2.2 as that is the first pandas version to support numpy 2.

adrianonobre commented 2 weeks ago

Hi @lithomas1, I think there's a misunderstanding (I'll remove the numpy 2.0 mention from the title): I'm not trying to use numpy 2. The issue is rather that the pandas 1.2.5 build no longer works (with numpy 1.26, see the notes in the "Installation Logs" section).

I was just pointing out that a change made for numpy 2 seems to have bled into numpy 1.26 as a breaking change, potentially affecting projects that depend on numpy 1.26.

Thanks for looking into this

lithomas1 commented 2 weeks ago

Sorry for the misunderstanding.

Did you mention that this started failing after numpy 2.0 was released? (If so, this might be because the pandas build is pulling the newest numpy, not the one you have installed.)
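
One way to check this from the log itself: the failing clang command includes headers from a pip-build-env-... directory ending in numpy/_core/include. As far as I know, numpy/_core/include is the NumPy 2.x header location, while NumPy 1.26 ships its headers under numpy/core/include, so that path suggests the isolated build environment picked up NumPy 2 rather than the installed 1.26.4. A minimal sketch of that check follows; numpy_headers_in_log is a hypothetical helper name, not part of pip or pandas.

```shell
#!/bin/sh
# Hypothetical helper (not part of pip or pandas): read build output on stdin
# and report which NumPy header directory appears in it.
numpy_headers_in_log() {
    log=$(cat)
    case "$log" in
        *site-packages/numpy/_core/include*)
            echo "numpy/_core/include (NumPy 2.x layout)" ;;
        *site-packages/numpy/core/include*)
            echo "numpy/core/include (NumPy 1.x layout)" ;;
        *)
            echo "no NumPy include path found" ;;
    esac
}

# Example: one -I flag copied from the failing clang command in the log above.
printf '%s\n' '-I/private/var/folders/gx/t9hqn79x4wdgbqy8n09y0bww0000gp/T/pip-build-env-74z_mupr/overlay/lib/python3.11/site-packages/numpy/_core/include' \
    | numpy_headers_in_log
```

To run it on a real build, the pip output can be piped into the function, for example pip install --no-cache pandas==1.2.5 2>&1 | numpy_headers_in_log.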

adrianonobre commented 2 weeks ago

No worries! (and thanks again for your time, @lithomas1 )

Correct. We noticed the pandas build started failing yesterday "out of the blue". We have dependency versions pinned in our requirements file and didn't make any changes to them. Here are a couple of them:

numpy==1.26.4
pandas==1.2.5

Investigating a bit, we noticed that the NumPy project did a release this week (2.0), and we found this change, which seems to align with the error message we're getting in the pandas build (i.e. a missing struct attribute elsize):

      pandas/_libs/src/ujson/python/JSONtoObj.c:260:33: error: no member named 'elsize' in 'struct _PyArray_Descr'
              npyarr->elsize = dtype->elsize;
                               ~~~~~  ^

The easiest way a colleague found to repro this is as follows:

# make virtual env (make sure python is 3.11)
python -m venv ./sample-venv
# activate virtual env
. ./sample-venv/bin/activate
# install same packages we do in the project (relevant to pandas/numpy)
pip install --no-cache six==1.16.0
pip install --no-cache pytz==2024.1
pip install --no-cache python-dateutil==2.9.0.post0
pip install --no-cache numpy==1.26.4
pip install --no-cache cython==0.29.21
# blows up
pip install --no-cache pandas==1.2.5

lithomas1 commented 2 weeks ago

Can you try installing pandas with --no-build-isolation?

(You'll need all dependencies pre-installed, but this should force pip to use your numpy, and not pull its own numpy)

mttr commented 2 weeks ago

For what it's worth, this problem seems to be due to this line in pyproject.toml and can be fixed by changing it to "numpy<2; python_version>='3.9'"

lithomas1 commented 2 weeks ago

Yeah, this is maybe something to consider in the future, but for now the --no-build-isolation step should fix it.
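
For anyone who would rather keep build isolation on, a possible alternative to editing pyproject.toml is a pip constraints file. This is only a sketch and untested against this exact setup: it assumes pip passes the PIP_CONSTRAINT environment variable through to the pip process that populates the isolated build environment, so the numpy<2 cap would apply to the build requirements as well. The file name build-constraints.txt is made up for illustration.

```shell
#!/bin/sh
# Assumption: pip forwards PIP_CONSTRAINT to the pip process that fills the
# isolated build environment, so the cap also applies to build requirements.
printf 'numpy<2\n' > build-constraints.txt   # hypothetical file name
PIP_CONSTRAINT="$PWD/build-constraints.txt"
export PIP_CONSTRAINT

# The install command itself is unchanged. It is left commented out here so
# the sketch has no network side effects.
# pip install --no-cache pandas==1.2.5
```

If this does not take effect with your pip version, the --no-build-isolation route is the one confirmed in this thread.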

adrianonobre commented 2 weeks ago

"Can you try installing pandas with --no-build-isolation?"

NEWEST EDIT:

Workaround: We were able to get it working by using "--no-build-isolation" and bumping the Cython version to 0.29.37. So:

# make virtual env (make sure python is 3.11)
python -m venv ./sample-venv
# activate virtual env
. ./sample-venv/bin/activate
# install same packages we do in the project (relevant to pandas/numpy)
pip install --no-cache six==1.16.0
pip install --no-cache pytz==2024.1
pip install --no-cache python-dateutil==2.9.0.post0
pip install --no-cache numpy==1.26.4
pip install --no-cache cython==0.29.37  # IMPORTANT: note the version
pip install --no-cache pandas==1.2.5 --no-build-isolation  # IMPORTANT: --no-build-isolation flag
# WORKS NOW!

OLD:

I tried pip install pandas==1.2.5 --no-build-isolation and got a different error:

(EDIT: fwiw I was on Python 3.11 when I got the error below; someone reported that it worked for them on Python 3.9.)

      pandas/_libs/algos.c:235:12: fatal error: 'longintrepr.h' file not found
        #include "longintrepr.h"
                 ^~~~~~~~~~~~~~~

lithomas1 commented 2 weeks ago

Can you try upgrading your Cython?

This looks like https://github.com/aio-libs/aiohttp/issues/6600, which someone reports was fixed in Cython 0.29.5
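
Some background on why the Python version matters here: Python 3.11 moved longintrepr.h into the cpython/ subdirectory of its include directory, and C files generated by older Cython releases still include it under its old top-level name. A small sketch for checking where the header sits for a given interpreter follows; locate_longintrepr is a hypothetical helper name and the function only inspects the directory it is given.

```shell
#!/bin/sh
# Hypothetical helper: report where longintrepr.h sits under the CPython
# include directory passed as $1.
locate_longintrepr() {
    if [ -f "$1/longintrepr.h" ]; then
        echo "top-level"
    elif [ -f "$1/cpython/longintrepr.h" ]; then
        echo "cpython/ subdirectory"
    else
        echo "not found"
    fi
}

# Usage against the interpreter from the log (path taken from its -I flag):
# locate_longintrepr /Users/adriano.nobre/.pyenv/versions/3.11.7/include/python3.11
```

On a Python 3.11 install I would expect this to report the cpython/ subdirectory, which matches the "file not found" error for the bare include.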

adrianonobre commented 2 weeks ago

Thanks for your help @lithomas1 (and @mttr ) ! 🙏

Sandhya-J commented 2 weeks ago

Hi, @lithomas1 and @mttr,

I am facing the same issue: our pandas build started failing a couple of days ago. I tried the solution mentioned above, but using --no-build-isolation gives me the error ModuleNotFoundError: No module named 'numpy'.

Some more context on our problem: we are trying to build pandas 1.2.4 with Python 3.11. We locked the versions as below:

cython==0.29.37
numpy==1.26.4
pandas==1.2.4

We install it inside a Docker container with pip install --no-build-isolation $REQUIREMENTS, where $REQUIREMENTS is the path to the requirements.txt file, which contains the name of our internal package that uses pandas and numpy.

Any help is much appreciated. Please advise.
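
For reference, the ModuleNotFoundError fits the earlier note that --no-build-isolation needs all build dependencies pre-installed: with isolation off, numpy has to be importable before pip starts building pandas, and listing it in the same requirements file is probably not enough. A minimal sketch of a two-step install under that assumption follows; install_without_isolation is a hypothetical helper name, the pins are the ones listed above, and it assumes setuptools and wheel are already present in the image.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the path to the requirements file.
install_without_isolation() {
    # Step 1: build-time dependencies must already be importable when the
    # pandas build runs, so install them on their own first.
    pip install --no-cache numpy==1.26.4 cython==0.29.37 || return 1
    # Step 2: only now install the rest with build isolation disabled.
    pip install --no-cache --no-build-isolation -r "$1"
}

# Example call (commented out because it needs network access):
# install_without_isolation "$REQUIREMENTS"
```

This mirrors the order used in the working repro earlier in the thread, where numpy and cython are installed before pandas.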

lithomas1 commented 1 week ago

Can you try a newer pandas?

pandas 1.2.4 does not officially support Python 3.11 IIRC, so if something non-trivial is going wrong, I can't help you too much (as the version of pandas you are using is very old).

pandas 1.5 should have official wheels for Python 3.11 (and also should be API compatible with pandas 1.2.4)