vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Incompatibility of PyPI prebuilt wheel with arm mac #1903

Closed: yuhsak closed this issue 2 years ago

yuhsak commented 2 years ago

First of all, I'd like to say a big thank you to the vaex team. I'm using this library almost every day and it really is a life saver for me.

There seem to be plenty of options both for processing huge datasets on distributed clusters, given a sufficient budget, and for processing small datasets on a single machine. But it suddenly becomes much harder to find an efficient solution when you need to deal with datasets that are too big to process on a single machine, yet too small to justify a real distributed cluster.

I used to use Dask and standalone Spark in such situations, but unfortunately both have difficulties and inconveniences that I find hard to ignore, so I started using Vaex, and I have been amazed by its stability and efficiency.

By the way, I have a question about installation, described below.

Description: When I install vaex from PyPI on an M1 Mac with plain pip, it downloads a prebuilt wheel and the installation finishes successfully. However, importing vaex from a Python script then raises an ImportError related to superstrings.cpython-39-darwin.so. The same happens when I install from conda-forge through Miniconda.

It seems to be caused by an incompatible prebuilt superstrings shared object. Is this already a known issue? If there are any checks or tests I could help with, please let me know; and if it turns out to be specific to my environment, please just ignore this question.

I can still build vaex from source, or install it through pip with the --no-binary option using PCRE from Homebrew, and then the error goes away.

Software information

Additional information

Installation log when the error happens (using the prebuilt wheel)

$ pip install vaex
Collecting vaex==4.8.0
  Downloading vaex-4.8.0-py3-none-any.whl (4.7 kB)
Collecting vaex-ml<0.18,>=0.17.0
  Downloading vaex_ml-0.17.0-py3-none-any.whl (56 kB)
Collecting vaex-astro<0.10,>=0.9.0
  Downloading vaex_astro-0.9.0-py3-none-any.whl (20 kB)
Collecting vaex-viz<0.6,>=0.5.1
  Downloading vaex_viz-0.5.1-py3-none-any.whl (19 kB)
Collecting vaex-hdf5<0.13,>=0.12.0
  Downloading vaex_hdf5-0.12.0-py3-none-any.whl (16 kB)
Collecting vaex-core<4.9,>=4.8.0
  Downloading vaex_core-4.8.0-cp39-cp39-macosx_11_0_arm64.whl (4.1 MB)
Collecting vaex-server<0.9,>=0.8.1
  Downloading vaex_server-0.8.1-py3-none-any.whl (23 kB)
Collecting vaex-jupyter<0.8,>=0.7.0
  Downloading vaex_jupyter-0.7.0-py3-none-any.whl (43 kB)
(logs for other requirements omitted)
Installing collected packages: vaex-core, vaex-viz, vaex-server, vaex-ml, vaex-hdf5, vaex-astro, vaex-jupyter, vaex
Successfully installed vaex-4.8.0 vaex-astro-0.9.0 vaex-core-4.8.0 vaex-hdf5-0.12.0 vaex-jupyter-0.7.0 vaex-ml-0.17.0 vaex-server-0.8.1 vaex-viz-0.5.1
$ python
Python 3.9.9 (main, Jan  6 2022, 16:12:50)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vaex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/__init__.py", line 46, in <module>
    import vaex.dataframe
  File "/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/dataframe.py", line 31, in <module>
    import vaex.hash
  File "/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/hash.py", line 11, in <module>
    from vaex.column import _to_string_sequence
  File "/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/column.py", line 17, in <module>
    import vaex.strings
  File "/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/strings.py", line 10, in <module>
    from .superstrings import *
ImportError: dlopen(/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '__ZN7pcrecpp2RE6no_argE'
$ otool -L /Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so
/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so:
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

Installation log when the error doesn't happen (local build)

$ brew install pcre
$ pip install vaex --no-binary :all:
Collecting vaex==4.8.0
  Downloading vaex-4.8.0.tar.gz (4.8 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-core<4.9,>=4.8.0
  Downloading vaex-core-4.8.0.tar.gz (2.2 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting vaex-astro<0.10,>=0.9.0
  Downloading vaex-astro-0.9.0.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-hdf5<0.13,>=0.12.0
  Downloading vaex-hdf5-0.12.0.tar.gz (13 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-viz<0.6,>=0.5.1
  Downloading vaex-viz-0.5.1.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-server<0.9,>=0.8.1
  Downloading vaex-server-0.8.1.tar.gz (18 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-jupyter<0.8,>=0.7.0
  Downloading vaex-jupyter-0.7.0.tar.gz (35 kB)
  Preparing metadata (setup.py) ... done
Collecting vaex-ml<0.18,>=0.17.0
  Downloading vaex-ml-0.17.0.tar.gz (47 kB)
  Preparing metadata (setup.py) ... done
(logs for other requirements omitted)
Skipping wheel build for vaex, due to binaries being disabled for it.
Skipping wheel build for vaex-astro, due to binaries being disabled for it.
Skipping wheel build for vaex-hdf5, due to binaries being disabled for it.
Skipping wheel build for vaex-jupyter, due to binaries being disabled for it.
Skipping wheel build for vaex-ml, due to binaries being disabled for it.
Skipping wheel build for vaex-server, due to binaries being disabled for it.
Skipping wheel build for vaex-viz, due to binaries being disabled for it.
Building wheels for collected packages: vaex-core
  Building wheel for vaex-core (pyproject.toml) ... done
  Created wheel for vaex-core: filename=vaex_core-4.8.0-cp39-cp39-macosx_12_0_arm64.whl size=3986856 sha256=df74bc0cb2f94debf6baf36c6d3ed3b9393f65a2550a34af5d3f01e83c53ee85
  Stored in directory: /private/var/folders/_b/s8btn93d5_vfj9_2hq6ys89h0000gn/T/pip-ephem-wheel-cache-1wtf1pi3/wheels/11/6b/f9/ede4f70f9662dd51203f823681bdf60c34f016f340da593cb6
Successfully built vaex-core
Installing collected packages: vaex-core, vaex-viz, vaex-server, vaex-ml, vaex-hdf5, vaex-astro, vaex-jupyter, vaex
  Running setup.py install for vaex-viz ... done
  Running setup.py install for vaex-server ... done
  Running setup.py install for vaex-ml ... done
  Running setup.py install for vaex-hdf5 ... done
  Running setup.py install for vaex-astro ... done
  Running setup.py install for vaex-jupyter ... done
  Running setup.py install for vaex ... done
Successfully installed vaex-4.8.0 vaex-astro-0.9.0 vaex-core-4.8.0 vaex-hdf5-0.12.0 vaex-jupyter-0.7.0 vaex-ml-0.17.0 vaex-server-0.8.1 vaex-viz-0.5.1
$ python
Python 3.9.9 (main, Jan  6 2022, 16:12:50)
[Clang 13.0.0 (clang-1300.0.29.30)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import vaex
>>> vaex.__version__
{'vaex': '4.8.0', 'vaex-core': '4.8.0', 'vaex-viz': '0.5.1', 'vaex-hdf5': '0.12.0', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.0', 'vaex-jupyter': '0.7.0', 'vaex-ml': '0.17.0'}
$ otool -L /Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so
/Users/yuhsak/.pyenv/versions/3.9.9/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so:
    /opt/homebrew/opt/pcre/lib/libpcre.1.dylib (compatibility version 4.0.0, current version 4.13.0)
    /opt/homebrew/opt/pcre/lib/libpcrecpp.0.dylib (compatibility version 1.0.0, current version 1.2.0)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
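The two otool -L listings above differ in one telling way: the locally built superstrings extension links libpcre and libpcrecpp from Homebrew, while the prebuilt one links neither. A minimal Python sketch for checking this automatically, assuming macOS with the Xcode command-line tools (for otool) on PATH; the function names linked_libraries, links_pcre and inspect_extension are hypothetical helpers, not part of vaex.

```python
from __future__ import annotations

import subprocess


def linked_libraries(otool_output: str) -> list[str]:
    """Return the library paths listed in `otool -L` output.

    The first line names the inspected file itself; every following non-empty
    line looks like '<path> (compatibility version ..., current version ...)'.
    """
    libs = []
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if line:
            # Keep only the path, dropping the '(compatibility version ...)' part.
            libs.append(line.split(" (", 1)[0])
    return libs


def links_pcre(otool_output: str) -> bool:
    """True if any linked library file name starts with 'libpcre' (libpcre, libpcrecpp)."""
    return any(
        lib.rsplit("/", 1)[-1].startswith("libpcre")
        for lib in linked_libraries(otool_output)
    )


def inspect_extension(path: str) -> bool:
    """Run `otool -L` on a compiled extension (macOS only) and check for PCRE."""
    result = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    )
    return links_pcre(result.stdout)
```

Passing the superstrings.cpython-39-darwin.so path from the traceback to inspect_extension should, going by the two listings above, return False for the prebuilt wheel and True for the local build.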

maartenbreddels commented 2 years ago

> First of all, I'd like to say a big thank you to the vaex team. I'm using this library almost every day and it really is a life saver for me.

Thanks for letting us know!

> I used to use Dask and standalone Spark in such situations, but unfortunately both have difficulties and inconveniences that I find hard to ignore, so I started using Vaex, and I have been amazed by its stability and efficiency.

That is exactly where vaex shines: it lets you postpone having to work with more complex libraries.

Thank you for the report; that's really annoying. I'd like to hear whether other people have the same issue. @JovanVeljanoski could you try to reproduce this on your M1?

JovanVeljanoski commented 2 years ago

Hi,

Installing from conda/mamba works just fine with both Python 3.8 and 3.9. I've been using it almost daily for a couple of months.

Installing from pip does indeed fail for me, with both Python 3.8 and 3.9. The main problems seem to be

pikeas commented 2 years ago

Having trouble with this as well!

macOS: 12.2
Python: 3.9.10 and 3.10.2 virtualenvs
Install method: poetry add vaex, which uses pip 22.0.4 under the hood

This fails with RuntimeError: Could not find a llvm-config binary while installing llvmlite. It looks like they're working on it over at https://github.com/numba/llvmlite/issues/693; is there a good way to work around this in the meantime?

Raidus commented 2 years ago

As @JovanVeljanoski suggested, using conda/mamba works for me too with Python 3.9 👍

Steps

conda install mamba -n base -c conda-forge  # install mamba into the base environment
conda create --name myenv python=3.9  # create the environment
conda activate myenv  # activate the environment
mamba install vaex -c conda-forge  # install vaex

MaiHoangViet1809 commented 2 years ago

Got this bug on an M1 Mac too; I have no idea how to fix it:

    from .superstrings import *
ImportError: dlopen(/Users/****/Projects/****/.venv/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace '__ZN7pcrecpp2RE6no_argE'

alexander-beedie commented 2 years ago

Ditto; this is my first attempt to experiment with vaex, but I can't import it after pip install. Also on an Apple Silicon Mac (macOS 12.4, M1 Pro):

ImportError: dlopen(/.../python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so, 0x0002):
  symbol not found in flat namespace '__ZN7pcrecpp2RE6no_argE'

Also: I'm not in a position to use conda, as my environment is constrained.

Update: the only thing that worked was compiling locally with pip install vaex --no-binary :all: (vaex v4.9.2).

bigmike36c commented 2 years ago

I'm receiving the exact same error as @alexander-beedie on an Apple Silicon Mac (macOS 12.2.1, M1 Pro). Strangely enough, however, the issue does not occur on an identical machine with vaex installed in an identical virtual environment.

chriddyp commented 2 years ago

FYI, I've been "solving" this on my end by running a second terminal under Rosetta and pip installing vaex from that Rosetta terminal. Here are the instructions I used for setting up a Rosetta terminal:

https://www.byran.tech/html/how-to-make-a-rosetta-2-emulated-x86-terminal-on-arm-apple-silicon-chips.html
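Whether the Rosetta trick helps depends on the architecture the Python interpreter itself runs as, because pip picks macOS wheels (such as the macosx_11_0_arm64 one in the first log) to match it, roughly as reported by platform.machine(). A rough sketch for checking this; the helper names are hypothetical. Note the assumption that an arm64-only interpreter (for example one under /opt/homebrew) still runs natively when launched from a Rosetta terminal, so the workaround presumably needs an x86_64 or universal2 Python.

```python
import platform
import sys


def interpreter_arch() -> str:
    """Machine architecture of the running interpreter, e.g. 'arm64' or 'x86_64'.

    An x86_64 interpreter running under Rosetta 2 reports 'x86_64' here.
    """
    return platform.machine()


def wheel_hint() -> str:
    """Rough, human-readable hint about which wheel flavour pip would look for."""
    arch = interpreter_arch()
    if sys.platform != "darwin":
        return f"{sys.platform}/{arch}: not a macOS interpreter"
    if arch == "arm64":
        return "native arm64 interpreter: pip looks for arm64 or universal2 wheels"
    if arch == "x86_64":
        return "x86_64 interpreter (Intel Mac or Rosetta 2): pip looks for x86_64 or universal2 wheels"
    return f"macOS/{arch}: unexpected architecture"
```

Running print(wheel_hint()) in both terminals should show whether the Rosetta terminal really gives an x86_64 interpreter.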

arunpersaud commented 2 years ago

I'm getting the same error with Python 3.9 on an M1 Mac, using Homebrew, after doing a pip install.

It seems that /opt/homebrew/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so didn't get linked to the pcre library.

I have pcre installed via brew (brew install pcre) and can pre-load the library using export DYLD_INSERT_LIBRARIES=/opt/homebrew/Cellar/pcre/8.45/lib/libpcrecpp.dylib (assuming bash).

After this, importing vaex from python3.9 works for me, so this might serve as a workaround until it gets fixed.

The Homebrew path may differ between setups, but you can check yours with brew ls pcre.
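The same preload can be applied from Python. DYLD_INSERT_LIBRARIES is a colon-separated list that dyld reads when a process starts, so it has to be set for a new child process; changing os.environ inside an already running interpreter will not affect it. A minimal sketch under that assumption; env_with_preload and try_import_vaex are hypothetical names. It is also an assumption worth checking that DYLD_* variables may be stripped when launching SIP-protected system binaries such as /usr/bin/python3, so this is more likely to work with a Homebrew or pyenv Python.

```python
from __future__ import annotations

import os
import subprocess
import sys


def env_with_preload(lib_path: str, base_env: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment with `lib_path` appended to DYLD_INSERT_LIBRARIES."""
    env = dict(os.environ if base_env is None else base_env)  # never mutate the caller's dict
    existing = env.get("DYLD_INSERT_LIBRARIES")
    env["DYLD_INSERT_LIBRARIES"] = f"{existing}:{lib_path}" if existing else lib_path
    return env


def try_import_vaex(lib_path: str) -> bool:
    """Start a child interpreter with the preload set and report whether `import vaex` succeeds."""
    proc = subprocess.run(
        [sys.executable, "-c", "import vaex"],
        env=env_with_preload(lib_path),
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0
```

For example, try_import_vaex("/opt/homebrew/Cellar/pcre/8.45/lib/libpcrecpp.dylib") uses the path from the comment above; substitute whatever brew ls pcre reports on your machine.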

Checking which libraries superstrings is linked against shows that pcre is missing, although I think it should be there:

otool -L /opt/homebrew/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so
/opt/homebrew/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so:
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1200.3.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)

maartenbreddels commented 2 years ago

> It seems that /opt/homebrew/lib/python3.9/site-packages/vaex/superstrings.cpython-39-darwin.so didn't get linked to the pcre library.

This triggered the right thoughts for me! Thanks!

I now see this in the logs:

2022-07-25T10:56:21.4013800Z   [7/25] Linking CXX shared module superstrings.cpython-39-darwin.so
2022-07-25T10:56:21.4115880Z   ld: warning: ignoring file /usr/local/lib/libpcre.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
2022-07-25T10:56:21.4217910Z   ld: warning: ignoring file /usr/local/lib/libpcrecpp.dylib, building for macOS-arm64 but attempting to link with file built for macOS-x86_64
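These warnings explain the broken wheel: the arm64 build found PCRE in /usr/local/lib, but those files were built for x86_64 (presumably from an Intel Homebrew prefix), so ld ignored them. The link presumably still succeeded because Python extensions on macOS are commonly linked with -undefined dynamic_lookup, which defers unresolved symbols such as __ZN7pcrecpp2RE6no_argE to import time; that would match the "symbol not found in flat namespace" error. A hedged sketch for catching this in a CI build log before a wheel is published; the regex is based only on the two log lines above, and the helper names are hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Based on the ld warnings in the CI log above (each is a single line there).
_IGNORED_FILE = re.compile(
    r"ld: warning: ignoring file (?P<path>.+?), "
    r"building for (?P<target>\S+) "
    r"but attempting to link with file built for (?P<found>\S+)"
)


class ArchMismatch(NamedTuple):
    path: str    # library that ld skipped
    target: str  # architecture being built, e.g. 'macOS-arm64'
    found: str   # architecture of the skipped file, e.g. 'macOS-x86_64'


def find_arch_mismatches(build_log: str) -> list[ArchMismatch]:
    """Return every library ld ignored because its architecture did not match."""
    return [
        ArchMismatch(m["path"], m["target"], m["found"])
        for m in _IGNORED_FILE.finditer(build_log)
    ]


def check_build_log(build_log: str) -> None:
    """Raise instead of silently shipping a wheel that links nothing."""
    mismatches = find_arch_mismatches(build_log)
    if mismatches:
        names = ", ".join(m.path for m in mismatches)
        raise RuntimeError(f"ld ignored libraries with the wrong architecture: {names}")
```

Failing the build here trades a noisy CI job for not publishing a wheel that only breaks at import time on users' machines.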

For #2124

Really hope we can fix this soon!

maartenbreddels commented 2 years ago

Cost me some blood, sweat, and tears, but it's working now on Mac + ARM 🎉:

$ pip install vaex-core==4.11.0
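To confirm that an environment has picked up a vaex-core at least as new as the release mentioned here, one can query the installed distribution metadata. A minimal sketch; has_arm_fix and version_tuple are hypothetical helpers, and treating 4.11.0 as the minimum is an assumption taken from the command above (the thread only shows that 4.11.0 works). The numeric comparison matters because comparing version strings directly would rank "4.8.0" above "4.11.0".

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(v: str) -> tuple[int, ...]:
    """Turn '4.11.0' into (4, 11, 0) so versions compare numerically.

    Simplification: a non-numeric suffix such as 'rc1' in '0rc1' is ignored,
    so pre-releases compare equal to the final release.
    """
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def has_arm_fix(minimum: str = "4.11.0") -> bool:
    """True if the installed vaex-core is at least `minimum` (False if it is not installed)."""
    try:
        installed = version("vaex-core")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

For full PEP 440 handling (pre-releases, post-releases, local versions), the packaging library's Version class is the more robust choice.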

alexander-beedie commented 2 years ago

> Cost me some blood, sweat, and tears, but it's working now on Mac + ARM 🎉:

Fantastic; thanks very much for this - can confirm it's working! (And thanks to @arunpersaud for narrowing it down :)

yuhsak commented 2 years ago

@maartenbreddels Wow, I can see it now works with just pip install vaex-core on my M1 MBP. Thank you very much for the contribution! That's amazing.