Open scinteeb opened 1 year ago
Hey all! Any feedback will be appreciated. Thanks.
Hi scinteeb,
I am facing the exact same issue with an embedded Linux system built by Yocto 4.0 (kirkstone) running on an armv7 32-bit target. I have set up a test environment on a Raspberry Pi 4 (32-bit).
My test is the Spearman correlation test; see also:
When installing 64bit it works fine. When using 32bit it fails.
OS details:
root@raspberrypi4:~# uname -a
Linux raspberrypi4 5.15.34-v7l #1 SMP Tue Apr 19 19:21:26 UTC 2022 armv7l GNU/Linux
root@raspberrypi4:~# python3 --version
Python 3.10.13
root@raspberrypi4:~# ls -lsa /usr/lib/liblapack.so.3
/usr/lib/liblapack.so.3 -> liblapack.so.3.10.0
root@raspberrypi4:~# python3
Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 11.4.0] on linux
>>> import numpy as np
>>> np.__version__
'1.22.3'
>>> import scipy as sc
>>> sc.__version__
'1.8.1'
>>> import pandas as pd
>>> pd.__version__
'1.4.2'
import pandas as pd
d = pd.DataFrame([1.0, 2.0])
d.corr(method='spearman')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.10/site-packages/pandas/core/frame.py", line 1011, in __repr__
return self.to_string(**repr_params)
File "/usr/lib/python3.10/site-packages/pandas/core/frame.py", line 1192, in to_string
return fmt.DataFrameRenderer(formatter).to_string(
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1128, in to_string
string = string_formatter.to_string()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 25, in to_string
text = self._get_string_representation()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 40, in _get_string_representation
strcols = self._get_strcols()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 31, in _get_strcols
strcols = self.fmt.get_strcols()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 611, in get_strcols
strcols = self._get_strcols_without_index()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 864, in _get_strcols_without_index
str_columns = self._get_formatted_column_labels(self.tr_frame)
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 943, in _get_formatted_column_labels
dtypes = self.frame.dtypes
File "/usr/lib/python3.10/site-packages/pandas/core/generic.py", line 5746, in dtypes
data = self._mgr.get_dtypes()
File "/usr/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 228, in get_dtypes
return dtypes.take(self.blknos)
File "/usr/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 168, in blknos
self._rebuild_blknos_and_blklocs()
File "pandas/_libs/internals.pyx", line 711, in pandas._libs.internals.BlockManager._rebuild_blknos_and_blklocs
ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long long'
>>> d = pd.DataFrame([1.0, 2.0])
>>> d.corr(method='spearman')
0
0 1.0
I will try to dive further into this issue ... and let you know about any news... As I am quite new to the pandas project, I appreciate any help. Michael
Hello Michael,
My target device is an embedded unit with scarce storage resources. That's why I cannot do much testing on it. What I did was to create a virtual ARMv7 32bit system using qemu and in there I did the following experiment:
Hello Bogdan, I have the same problem as you and would like to ask: has your problem been resolved? How was it ultimately resolved? Thank you very much for reading my message and for any assistance. Have a nice day...
My records are as follows:
1. Errors:
Traceback (most recent call last):
File "pyabcs", line 952, in _thread_loop
if func(loopcnt=cnt, looptime=t - t0, loopintv=t - tlst) if fargs else func():
File "mdc_equip", line 140, in loop_mdcboard
self.mdc_board()
File "npm_tpj", line 450, in mdc_board
if self.parse_data(data_html, lane, folder):
File "npm_tpj", line 589, in parse_data
platform_p = platform_pickup[platform_pickup['Machine Order']==str(seq)].sum()
File "generic", line 10709, in sum
self, axis, skipna, level, numeric_only, min_count, kwargs
File "generic", line 10447, in sum
"sum", nanops.nansum, axis, skipna, level, numeric_only, min_count, kwargs
File "generic", line 10434, in _min_count_stat_function
min_count=min_count,
File "frame", line 9852, in reduce
res, = df._mgr.reduce(blk_func, ignore_failures=ignore_failures)
File "managers", line 1290, in reduce
new_mgr = self._combine(res_blocks, copy=False, index=index)
File "managers", line 555, in _combine
inv_indexer = lib.get_reverse_indexer(indexer, self.shape[0])
File "lib", line 484, in pandas._libs.lib.get_reverse_indexer
ValueError: Buffer dtype mismatch, expected 'const intp_t' but got 'long long'
2. Installed versions:
python : 3.7.2
OS : Ubuntu 16.04.4 LTS
machine : armv7l
pandas : 1.3.3
numpy : 1.21.6
3. Cross compiler: arm-linux-gnueabihf-gcc
4. Examples of the cross-compiled target library files:
algos.cpython-37m-arm-linux-gnueabihf.so
lib.cpython-37m-arm-linux-gnueabihf.so
......... etc.
Hi Bogdan, just did the verification on Raspberry 4 (32bit) official Debian Bullseye OS (32bit):
pi@sheep:~$ getconf LONG_BIT
32
pi@sheep:~$ file /lib/systemd/systemd
/lib/systemd/systemd: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), \
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, \
BuildID[sha1]=c8e472c9a12568fbde5035980497ffc8e4a857cd, for GNU/Linux 3.2.0, \
stripped
The test was successful:
>>> d = pd.DataFrame([1.0, 2.0])
>>> d.corr(method='spearman')
0
0 1.0
Installed Versions:
(venv) pi@sheep:~ $ python
Python 3.10.13 (main, Oct 9 2023, 15:41:20) [GCC 10.2.1 20210110] on linux
>>> import pytz as pt
>>> pt.__version__
'2023.3.post1'
>>> import dateutil as dt
>>> dt.__version__
'2.8.2'
>>> import six as x
>>> x.__version__
'1.16.0'
>>> import numpy as np
>>> np.__version__
'1.22.3'
>>> import pandas as pd
>>> pd.__version__
'1.4.2'
So it seems Yocto kirkstone might not compile numpy, pandas, ... properly for 32-bit... By the way, I have backported down to gatesgarth, where I did not find a working setup either: either numpy and pandas are not compatible, or the 32-bit error occurs.
That means we might need to find the magic settings (options) for cross-compiling numpy, pandas...
@avan051 it seems your cross compiler or options on Ubuntu (32-bit) are also missing some magic settings.
@mweitner OK, thanks for your reply! BTW, the same configuration runs fine for me on Ubuntu for the x86 platform, but cross-compiled for ARM it does not.
Hello Michael,
There are two more tests we can do to narrow the issue down further. I am planning to run them tomorrow or the day after, once I have built a qemu image. If you have time, the tests are quite simple:
We might see that only the module compiled by Yocto is causing the problem, so we'll focus on that one.
Hi Bogdan, it seems the Yocto build of pandas is the bad one, as Test 1 ran successfully:
pandas==1.4.2 installed by pip3 on target
root@raspberrypi4:~# pip3 install pandas==1.4.2 --no-use-pep517 -vvv
...
Running setup.py install for pandas ... done
Successfully installed pandas-1.4.2 python-dateutil-2.8.2 pytz-2023.3.post1 six-1.16.0
Verification:
>>> import numpy as np
>>> np.__version__
'1.22.3'
>>> import pandas as pd
>>> pd.__version__
'1.4.2'
>>> d = pd.DataFrame([1.0, 2.0])
>>> d.corr(method='spearman')
0
0 1.0
I will dive further into how the Yocto build compiles the python3-pandas package and what the difference is between the on-target pip build and the Yocto cross-compile build.
So far I have identified these options as missing in the Yocto build:
-pipe -feliminate-unused-debug-types \
-DHAVE_BROKEN_POSIX_SEMAPHORES \
-feliminate-unused-debug-types \
-fPIC -DNPY_NO_DEPRECATED_API=0
However, adding those to TOOLCHAIN_OPTIONS in order to have them on the cross compiler CC did not help.
Does anyone know the pip build process of pandas well enough to identify the missing/wrong option, parameter, library link, etc.?
While building the qemu image, I kept looking at the source code. I believe the issue comes from mismatched definitions of the intp_t type across pandas, numpy and cython, definitely caused by some compile parameter in Yocto. My guess is that in one (or more) recipes the architecture is not detected properly, so a 64-bit platform is assumed instead of 32-bit. Also, checking the WHEEL file in the dist-info folder of the packages (e.g. /usr/lib/python3.10/site-packages/pandas-1.4.2.dist-info/WHEEL), I found that numpy, pandas and cython carry the tag Tag: cp310-cp310-linux_x86_64, which means the compatibility platform is x86_64. For the rest of the packages the tag is py2-none-any or py3-none-any. Can you please check on your unit what that WHEEL file contains after pandas has been installed with pip?
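The WHEEL check described above can be scripted so it does not have to be repeated by hand for every package. This is only a minimal sketch: the helper names (wheel_tags, installed_wheel_tags, platform_of) are made up for illustration, and comparing the tag suffix against platform.machine() is a rough heuristic, not an official compatibility test.

```python
from email.parser import Parser      # WHEEL files use "Key: value" header lines
from importlib import metadata
import platform


def wheel_tags(wheel_text):
    """Return every Tag value found in the text of a WHEEL file."""
    return Parser().parsestr(wheel_text).get_all("Tag", [])


def installed_wheel_tags(dist_name):
    """Tags of an installed distribution; empty if it is missing or has no WHEEL file."""
    try:
        text = metadata.distribution(dist_name).read_text("WHEEL")
    except metadata.PackageNotFoundError:
        return []
    return wheel_tags(text) if text else []


def platform_of(tag):
    """Platform part of a 'python-abi-platform' tag."""
    return tag.split("-", 2)[2]


if __name__ == "__main__":
    machine = platform.machine()     # e.g. 'armv7l' on the 32-bit Raspberry Pi
    for name in ("numpy", "pandas", "Cython"):
        for tag in installed_wheel_tags(name):
            plat = platform_of(tag)
            # Heuristic (assumption): a native tag ends with the machine name.
            ok = plat == "any" or plat.endswith(machine)
            print(f"{name}: {tag} -> {'ok' if ok else 'does not match ' + machine}")
```

Packages installed from an egg-info directory have no WHEEL file, so they simply produce no output here.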
Hi Bogdan, I just stumbled over the issue labels ... it could make sense to tag this issue with the following labels:
This might help make our issue more visible. Anyway, I will come back with the pending verification of the WHEEL details; sorry, it takes some more time as I forgot to tell pip not to clean up the build ;-)
By the way, I am a bit worried about future support for arm 32-bit targets in pandas, or even in the Python 3 data science ecosystem as a whole, as there seems to be a lot going on to reduce project effort by removing 32-bit support in the future; see open issue:
Back to the technical issue... I might have found a lead: while forcing a rebuild of pandas with pip, I see there might be no wheel support for arm 32-bit "anymore", as it downloads a tar with egg-info. If you follow the Issue #44453 discussion, it mentions that the numpy project has dropped 32-bit wheel support...
Yocto build of our numpy version has wheel support but 64 bit tag:
root@raspberrypi4:~# cat /usr/lib/python3.10/site-packages/numpy-1.22.3.dist-info/WHEEL
Wheel-Version: 1.0
Generator: bdist_wheel (0.37.1)
Root-Is-Purelib: false
Tag: cp310-cp310-linux_x86_64
It looks like when pip installs pandas on the target, it compiles its dependency numpy again with 32-bit arm support...
pandas installed on-target by pip:
root@raspberrypi4:~# find / -iname "*pandas*"
/usr/lib/python3.10/site-packages/pandas
/usr/lib/python3.10/site-packages/pandas/tests/io/json/__pycache__/test_pandas.cpython-310.pyc
/usr/lib/python3.10/site-packages/pandas/tests/io/json/test_pandas.py
/usr/lib/python3.10/site-packages/pandas-1.4.2-py3.10.egg-info
Forced pip reinstall pending:
/tmp/pip-pip-egg-info-dhmxqt13/pandas.egg-info
/tmp/pip-unpack-a4zhymls/pandas-1.4.2.tar.gz
/tmp/pip-install-n6aoumqy/pandas_89a387b3cfdf47d993fe5881fcff294b
/tmp/pip-install-n6aoumqy/pandas_89a387b3cfdf47d993fe5881fcff294b/pandas.egg-info
/tmp/pip-install-n6aoumqy/pandas_89a387b3cfdf47d993fe5881fcff294b/pandas
/tmp/pip-install-n6aoumqy/pandas_89a387b3cfdf47d993fe5881fcff294b/pandas/tests/io/json/test_pandas.py
/tmp/pip-install-n6aoumqy/pandas_89a387b3cfdf47d993fe5881fcff294b/doc/source/_static/css/pandas.css
Interestingly, on a forced reinstall I get a build error on the target telling me it cannot build with the --no-use-pep517 option because the meson build backend is used... Without the --no-use-pep517 option it builds successfully. In addition, it creates a wheel file for numpy==1.21.6 for arm 32-bit.
This proves the guess above is right: pip install pandas on the target builds a wheel file for the latest supported numpy==1.21.6, compiled for arm 32-bit:
root@raspberrypi4:~# cat \
/usr/lib/python3.10/site-packages/numpy-1.22.3.dist-info/WHEEL
Wheel-Version: 1.0
Generator: bdist_wheel (0.37.1)
Root-Is-Purelib: false
Tag: cp310-cp310-linux_x86_64
root@raspberrypi4:~# cat \
/tmp/pip-build-env-h3yyacej/overlay/lib/python3.10/site-packages/numpy-1.21.6.dist-info/WHEEL
Wheel-Version: 1.0
Generator: bdist_wheel (0.37.0)
Root-Is-Purelib: false
Tag: cp310-cp310-linux_armv7l
The solution to our problem should be to adapt all relevant python3 package recipes in Yocto to build arm 32-bit wheel files.
Another example, comparing Cython (python3-cython) built and installed by Yocto with the on-target pip install:
root@raspberrypi4:~# cat /usr/lib/python3.10/site-packages/Cython-0.29.28.dist-info/WHEEL
Wheel-Version: 1.0
Generator: bdist_wheel (0.37.1)
Root-Is-Purelib: false
Tag: cp310-cp310-linux_x86_64
root@raspberrypi4:~# cat /tmp/pip-build-env-h3yyacej/overlay/lib/python3.10/site-packages/Cython-0.29.36.dist-info/WHEEL
Wheel-Version: 1.0
Generator: bdist_wheel (0.40.0)
Root-Is-Purelib: true
Tag: py2-none-any
Tag: py3-none-any
If you agree with my analysis, I am going to dive into fixing the Yocto recipes, hopefully coming up with patches and/or an upstream pull request... Can you support here as well?
Anyway, I will keep you informed...
Yes, definitely the solution is to adapt the Yocto recipes to compile the right version of these packages. And yes, I am willing to support the effort.
When cross-compiled for arm 32-bit under Yocto, the wheels that claim to be compiled for x86_64 in fact contain libraries with the correct signature "ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked". So there are two issues to be addressed:
Item #1 can easily be addressed by creating a bbappend file with the content:
SETUPTOOLS_BUILD_ARGS += "--plat-name ${MACHINE}"
This creates the wheel using the machine name instead of x86_64 (e.g. Cython-0.29.28-cp310-cp310-qemuarm.whl), and the WHEEL tag is set properly as well. ${MACHINE} can of course be replaced with TUNE_ARCH if we want the CPU type there instead. I think this should be part of setuptools3.bbclass, as it is good practice to have the target machine in the wheel name.
Yes, I just did the same, with SETUPTOOLS_BUILD_ARGS set to
SETUPTOOLS_BUILD_ARGS:append = " --plat-name linux-armv7l"
which works nicely... So I think we have the solution for #1. Great, thanks for your help.
So you think patching setuptools3.bbclass could be the right fix for it...
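To see what the two --plat-name values end up as, here is a small sketch of the naming rule. It assumes bdist_wheel builds the platform tag by replacing '-' and '.' in the supplied name with '_', which matches the file names and tags reported in this thread; the function names are illustrative and not part of setuptools or wheel.

```python
def wheel_platform_tag(plat_name):
    """Assumed normalisation of a --plat-name value into a wheel platform tag."""
    return plat_name.replace("-", "_").replace(".", "_")


def wheel_filename(name, version, py_tag, abi_tag, plat_name):
    """Compose a wheel file name from its parts (illustrative helper)."""
    return f"{name}-{version}-{py_tag}-{abi_tag}-{wheel_platform_tag(plat_name)}.whl"


if __name__ == "__main__":
    # ${MACHINE} = qemuarm, as in the bbappend variant
    print(wheel_filename("Cython", "0.29.28", "cp310", "cp310", "qemuarm"))
    # explicit platform name, as in the :append variant
    print(wheel_filename("numpy", "1.22.3", "cp310", "cp310", "linux-armv7l"))
```

Only the linux-armv7l variant yields the same tag that the on-target pip build produced (linux_armv7l).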
Next is #2, the pandas problem: the mismatch of the intp_t type between modules, which I will check next week on the Raspberry Pi.
From my perspective, having setuptools3.bbclass patched is the right approach. However, I cannot say that I see all the use cases and implications of this, so the maintainer may disagree. I'll try to get in contact with them. On #2, I still don't know whether the issue comes from numpy/pandas or from cython.
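One quick way to narrow down #2 on the target is to compare the interpreter's pointer size with the size numpy reports for its intp type. The sketch below is a diagnostic only. It assumes that a numpy build whose intp is wider than the interpreter's pointer would point at the numpy recipe, which has not been confirmed in this thread, and the function names are made up.

```python
import struct
import sys
import sysconfig


def interpreter_pointer_bits():
    """Width of a C pointer in the running interpreter (32 expected on armv7l)."""
    return struct.calcsize("P") * 8


def numpy_intp_bits():
    """Width of numpy's intp type, or None when numpy cannot be imported."""
    try:
        import numpy as np
    except ImportError:
        return None
    return np.dtype(np.intp).itemsize * 8


if __name__ == "__main__":
    print("platform     :", sysconfig.get_platform())
    print("pointer bits :", interpreter_pointer_bits())
    print("sys.maxsize  :", sys.maxsize)
    print("numpy intp   :", numpy_intp_bits())
```

Running it against both the Yocto-built and the pip-built packages on the same target should show whether the two numpy builds disagree.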
Hi Bogdan, yes, I agree our problem is not completely solved by the setuptools3.bbclass change and the fix to the wheel tag...
I think the content of the wheel file is not right, meaning the compiled numpy and pandas, especially their C extensions...
Unfortunately I do not have a lot of time for the rest of the week... What I have started with is building numpy and pandas in the context of the Yocto extSDK, trying to find the right build options for numpy and pandas ... and verifying on my 32-bit Raspberry Pi ...
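When comparing builds, the ELF class and machine of each compiled extension can be read directly from the file header, which helps on targets where the file utility is not installed. This is a minimal sketch that decodes only the fields needed here; the machine table is deliberately incomplete and the function names are illustrative.

```python
import struct

EI_CLASS = {1: "ELF32", 2: "ELF64"}
E_MACHINE = {3: "x86", 40: "ARM", 62: "x86-64", 183: "AArch64"}  # partial table


def elf_summary(header):
    """Decode (class, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byte_order = "<" if header[5] == 1 else ">"   # EI_DATA: 1 means little endian
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)  # e_machine
    return EI_CLASS.get(header[4], "unknown"), E_MACHINE.get(machine, str(machine))


def elf_summary_of_file(path):
    """Read just the header of a shared object and summarise it."""
    with open(path, "rb") as f:
        return elf_summary(f.read(20))
```

Calling elf_summary_of_file on one of the pandas or numpy .so files should return ("ELF32", "ARM") for a correct armv7 build. Note that this confirms only the target architecture, not which type sizes the build assumed.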
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
The simplest test on a dataframe object is failing on an embedded 32bit device with ARMv7 CPU. No matter how the dataframe is created, from an array or read from a csv file using read_csv, access on the object fails the same:
File "pandas/_libs/internals.pyx", line 711, in pandas._libs.internals.BlockManager._rebuild_blknos_and_blklocs
ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long long'
Unfortunately, on that platform I cannot compile a newer version of pandas, so I cannot verify if the issue is present in the latest version or main branch.
Expected Behavior
The expected output is:
    Name  Age
0  Alice   25
1    Bob   30
2  Carol   35
but running the script gives:
File "/home/root/test.py", line 10, in <module>
print(df)
File "/usr/lib/python3.10/site-packages/pandas/core/frame.py", line 1011, in __repr__
return self.to_string(**repr_params)
File "/usr/lib/python3.10/site-packages/pandas/core/frame.py", line 1192, in to_string
return fmt.DataFrameRenderer(formatter).to_string(
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 1128, in to_string
string = string_formatter.to_string()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 25, in to_string
text = self._get_string_representation()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 40, in _get_string_representation
strcols = self._get_strcols()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/string.py", line 31, in _get_strcols
strcols = self.fmt.get_strcols()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 611, in get_strcols
strcols = self._get_strcols_without_index()
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 864, in _get_strcols_without_index
str_columns = self._get_formatted_column_labels(self.tr_frame)
File "/usr/lib/python3.10/site-packages/pandas/io/formats/format.py", line 943, in _get_formatted_column_labels
dtypes = self.frame.dtypes
File "/usr/lib/python3.10/site-packages/pandas/core/generic.py", line 5746, in dtypes
data = self._mgr.get_dtypes()
File "/usr/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 228, in get_dtypes
return dtypes.take(self.blknos)
File "/usr/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 168, in blknos
self._rebuild_blknos_and_blklocs()
File "pandas/_libs/internals.pyx", line 711, in pandas._libs.internals.BlockManager._rebuild_blknos_and_blklocs
ValueError: Buffer dtype mismatch, expected 'intp_t' but got 'long long'
Installed Versions