piwheels / packages

Issue tracker for piwheels package issues
https://github.com/piwheels/packages/issues

Missing package: pyarrow #195

Open josuuribe opened 3 years ago

josuuribe commented 3 years ago

Package name: pyarrow
Issue type: Build failed
Link to PyPI page: https://pypi.org/project/pyarrow
Link to piwheels page: https://www.piwheels.org/project/pyarrow/
Version: All
Python version: 3.5+
I am the maintainer: No
More information:

Apache Arrow defines a language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. This library is used by vaex-core, which also fails to build.

Detailed instructions about the installation can be found here: https://arrow.apache.org/install/

Additional help: https://gist.github.com/heavyinfo/04e1326bb9bed9cecb19c2d603c8d521

I suppose the main reason is that the build needs the Apache Arrow libraries.

bennuttall commented 3 years ago

This has been raised before. We closed it as it didn't seem feasible to add to our automated build.

Can you follow the instructions and build it successfully on a Pi?

josuuribe commented 3 years ago

Not yet. I expected it would be easier on a specialized build machine like yours, but I have read that several people have managed it. The problem is that this library is used by several others, especially those that deal with big data. My idea is to create a specialized Docker container once I work out how to build it, as other open source projects like PyTorch or TensorFlow do.

josuuribe commented 3 years ago

FROM debian:latest

# Build-time settings. ARROW_TAG selects the Arrow release tag to check out.
ARG DEBIAN_FRONTEND=noninteractive
ARG REPO_HOME=/repos
ARG ARROW_HOME=$REPO_HOME/dist
ARG LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
ARG PYARROW_WITH_PARQUET=1
ARG PARQUET_TEST_DATA=$REPO_HOME/arrow/cpp/submodules/parquet-testing/data
ARG ARROW_TEST_DATA=$REPO_HOME/arrow/testing/data
ARG ARROW_BUILD_TYPE=release
ARG ARROW_TAG=apache-arrow-3.0.0

# System build dependencies.
RUN apt-get update -y && apt-get install -y libjemalloc-dev libboost-dev \
    libboost-filesystem-dev \
    libboost-system-dev \
    libboost-regex-dev \
    make \
    build-essential \
    g++ \
    libgflags-dev \
    rapidjson-dev \
    libre2-dev \
    python3-dev \
    libatlas-base-dev \
    autoconf \
    flex \
    bison \
    libgrpc-dev \
    git && \
    rm -rf /var/lib/apt/lists/ && \
    rm -rf /tmp/

# Install pip, add piwheels as an extra index, and install the Python build tools.
ADD https://bootstrap.pypa.io/get-pip.py get-pip.py
RUN python3 get-pip.py
RUN python3 -m pip config --global set global.extra-index-url https://www.piwheels.org/simple
RUN python3 -m pip install --upgrade \
    cmake \
    wheel \
    numpy

# Fetch the Arrow sources at the requested tag, including submodules.
WORKDIR $REPO_HOME
RUN git clone https://github.com/apache/arrow.git
WORKDIR $REPO_HOME/arrow
RUN git checkout tags/$ARROW_TAG -b build
RUN git submodule init
RUN git submodule update

WORKDIR $REPO_HOME
RUN python3 -m pip install -r arrow/python/requirements-build.txt -r arrow/python/requirements-test.txt

# Configure, build, and install the Arrow C++ libraries into $ARROW_HOME.
WORKDIR $REPO_HOME/arrow/cpp/build
RUN cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DPYTHON3_EXECUTABLE=$(which python3) \
    -DPYTHON_INCLUDE_DIR=$(python3 -c "from distutils.sysconfig import get_python_inc;print(get_python_inc())") \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DPYTHON_INCLUDE_DIR2=$(python3 -c "from os.path import dirname; from distutils.sysconfig import get_config_h_filename; print(dirname(get_config_h_filename()))") \
    -DARROW_WITH_BZ2=ON \
    -DPYTHON_LIBRARY=$(python3 -c "from distutils.sysconfig import get_config_var;from os.path import dirname,join ; print(join(dirname(get_config_var('LIBPC')),get_config_var('LDLIBRARY')))") \
    -DARROW_WITH_ZLIB=ON \
    -DPYTHON3_NUMPY_INCLUDE_DIRS=$(python3 -c "import numpy; print(numpy.get_include())") \
    -DARROW_WITH_ZSTD=ON \
    -DPYTHON3_PACKAGES_PATH=$(python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())") \
    -DARROW_WITH_LZ4=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_BROTLI=ON \
    -DARROW_PARQUET=ON \
    -DARROW_PYTHON=ON \
    -DARROW_BUILD_TESTS=ON \
    ..
RUN make -j$(nproc)
RUN make install

# Build pyarrow in place, run its tests (failures are reported but do not stop the build), then build the wheel.
WORKDIR $REPO_HOME/arrow/python
RUN python3 setup.py build_ext --inplace
RUN python3 -m pytest pyarrow 2>&1 || echo "Some unit tests have failed"
RUN python3 setup.py build_ext --build-type=$ARROW_BUILD_TYPE --bundle-arrow-cpp bdist_wheel

# Collect the wheel in /drop so it can be copied out of the container.
WORKDIR /drop
RUN cp $REPO_HOME/arrow/python/dist/*.whl .

CMD ["/bin/bash"]

josuuribe commented 3 years ago

Run the image with:
docker run -dit _imageid

Copy the wheel out of the container:
docker cp _containerid:/drop .

Now you can stop the container:
docker container stop _containerid
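The three steps above could be wrapped in a small shell function. This is only a sketch: the function name extract_wheel and the default image name pyarrow-build are hypothetical, and it assumes the image was built from the Dockerfile in the previous comment, so the wheel sits in /drop.

```shell
#!/bin/sh
# Sketch: start a container from the image, copy /drop to the host, stop the container.
# "pyarrow-build" is a hypothetical image name; pass your own image ID or tag instead.
extract_wheel() {
    image="${1:-pyarrow-build}"   # image to run (assumed name)
    dest="${2:-.}"                # host directory that receives the drop folder

    # Start a detached container; docker prints the new container ID.
    cid="$(docker run -dit "$image")" || return 1

    # Copy the /drop directory (where the Dockerfile leaves the wheel) to the host.
    docker cp "$cid:/drop" "$dest"
    status=$?

    # Stop the container whether or not the copy succeeded.
    docker container stop "$cid" >/dev/null
    return "$status"
}
```

Calling `extract_wheel _imageid .` would leave a drop directory containing the wheel in the current directory.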

It works for Apache Arrow 4.0.0 (master) and for the latest stable version (3.0.0). You can switch versions by setting ARROW_TAG at build time; use the same tag name that exists in the Arrow GitHub repository.
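As a rough illustration of switching versions, ARROW_TAG can be passed with docker build's --build-arg option. The sketch below assumes the Dockerfile is saved as Dockerfile.arrow in the current directory; the function name build_arrow_image and the image name pyarrow-build are hypothetical.

```shell
#!/bin/sh
# Sketch: build the image for a given Arrow release tag.
# Assumes Dockerfile.arrow is in the current directory; "pyarrow-build" is a made-up image name.
build_arrow_image() {
    tag="${1:-apache-arrow-3.0.0}"   # must match a tag in the Arrow GitHub repository

    # --build-arg overrides the ARROW_TAG default declared with ARG in the Dockerfile.
    docker build \
        --build-arg "ARROW_TAG=$tag" \
        -f Dockerfile.arrow \
        -t "pyarrow-build:$tag" \
        .
}
```

For instance, `build_arrow_image apache-arrow-3.0.0` would build the stable release mentioned above and tag the image with the same name.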

Original here: https://github.com/josuuribe/RaraAvis/blob/blog/Docker/build/Dockerfile.arrow

I hope this helps!!

Thanks for your effort with piwheels!

MarcelBeining commented 4 days ago

Is an automated build still not feasible in 2024?