holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License
308 stars 25 forks

Set python-snappy as optional dependency to work with Python 3.11 on pip install #116

Closed: weiji14 closed this issue 1 year ago

weiji14 commented 1 year ago

Is your feature request related to a problem? Please describe.

Trying to install spatialpandas in a Python 3.11 environment currently fails because of a hard dependency on python-snappy, which doesn't have wheels for Python 3.11 (see https://github.com/andrix/python-snappy/issues/124), so pip falls back to building it from source.

mamba create --name temp python=3.11
mamba activate temp
python -m pip install spatialpandas==0.4.7

produces this output:

Collecting spatialpandas==0.4.7
  Using cached spatialpandas-0.4.7-py2.py3-none-any.whl (120 kB)
Collecting dask (from spatialpandas==0.4.7)
  Using cached dask-2023.5.0-py3-none-any.whl (1.2 MB)
Collecting fsspec (from spatialpandas==0.4.7)
  Using cached fsspec-2023.5.0-py3-none-any.whl (160 kB)
Collecting numba (from spatialpandas==0.4.7)
  Downloading numba-0.57.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/3.6 MB 22.8 MB/s eta 0:00:00
Collecting pandas (from spatialpandas==0.4.7)
  Downloading pandas-2.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 19.9 MB/s eta 0:00:00
Collecting param (from spatialpandas==0.4.7)
  Using cached param-1.13.0-py2.py3-none-any.whl (87 kB)
Collecting pyarrow>=1.0 (from spatialpandas==0.4.7)
  Downloading pyarrow-12.0.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.9/38.9 MB 3.2 MB/s eta 0:00:00
Collecting python-snappy (from spatialpandas==0.4.7)
  Downloading python-snappy-0.6.1.tar.gz (24 kB)
  Preparing metadata (setup.py) ... done
Collecting retrying (from spatialpandas==0.4.7)
  Using cached retrying-1.3.4-py3-none-any.whl (11 kB)
Collecting numpy>=1.16.6 (from pyarrow>=1.0->spatialpandas==0.4.7)
  Downloading numpy-1.24.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 7.6 MB/s eta 0:00:00
Collecting click>=8.0 (from dask->spatialpandas==0.4.7)
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting cloudpickle>=1.5.0 (from dask->spatialpandas==0.4.7)
  Using cached cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting packaging>=20.0 (from dask->spatialpandas==0.4.7)
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting partd>=1.2.0 (from dask->spatialpandas==0.4.7)
  Using cached partd-1.4.0-py3-none-any.whl (18 kB)
Collecting pyyaml>=5.3.1 (from dask->spatialpandas==0.4.7)
  Downloading PyYAML-6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.9/757.9 kB 3.3 MB/s eta 0:00:00
Collecting toolz>=0.10.0 (from dask->spatialpandas==0.4.7)
  Using cached toolz-0.12.0-py3-none-any.whl (55 kB)
Collecting importlib-metadata>=4.13.0 (from dask->spatialpandas==0.4.7)
  Using cached importlib_metadata-6.6.0-py3-none-any.whl (22 kB)
Collecting llvmlite<0.41,>=0.40.0dev0 (from numba->spatialpandas==0.4.7)
  Downloading llvmlite-0.40.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.1/42.1 MB 9.5 MB/s eta 0:00:00
Collecting python-dateutil>=2.8.2 (from pandas->spatialpandas==0.4.7)
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas->spatialpandas==0.4.7)
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1 (from pandas->spatialpandas==0.4.7)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting six>=1.7.0 (from retrying->spatialpandas==0.4.7)
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting zipp>=0.5 (from importlib-metadata>=4.13.0->dask->spatialpandas==0.4.7)
  Using cached zipp-3.15.0-py3-none-any.whl (6.8 kB)
Collecting locket (from partd>=1.2.0->dask->spatialpandas==0.4.7)
  Using cached locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Building wheels for collected packages: python-snappy
  Building wheel for python-snappy (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [27 lines of output]
      /home/user/mambaforge/envs/temp/lib/python3.11/site-packages/setuptools/_distutils/dist.py:265: UserWarning: Unknown distribution option: 'cffi_modules'
        warnings.warn(msg)
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/__main__.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/hadoop_snappy.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_cffi_builder.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_cffi.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/snappy_formats.py -> build/lib.linux-x86_64-cpython-311/snappy
      copying src/snappy/__init__.py -> build/lib.linux-x86_64-cpython-311/snappy
      running build_ext
      building 'snappy._snappy' extension
      creating build/temp.linux-x86_64-cpython-311
      creating build/temp.linux-x86_64-cpython-311/src
      creating build/temp.linux-x86_64-cpython-311/src/snappy
      gcc -pthread -B /home/user/mambaforge/envs/temp/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -I/home/user/mambaforge/envs/temp/include/python3.11 -c src/snappy/crc32c.c -o build/temp.linux-x86_64-cpython-311/src/snappy/crc32c.o
      gcc -pthread -B /home/user/mambaforge/envs/temp/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -O2 -isystem /home/user/mambaforge/envs/temp/include -fPIC -I/home/user/mambaforge/envs/temp/include/python3.11 -c src/snappy/snappymodule.cc -o build/temp.linux-x86_64-cpython-311/src/snappy/snappymodule.o
      src/snappy/snappymodule.cc:33:10: fatal error: snappy-c.h: No such file or directory
         33 | #include <snappy-c.h>
            |          ^~~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for python-snappy
  Running setup.py clean for python-snappy
Failed to build python-snappy
ERROR: Could not build wheels for python-snappy, which is required to install pyproject.toml-based projects

Describe the solution you'd like

Convert python-snappy from a required to an optional dependency in the setup.py file:

https://github.com/holoviz/spatialpandas/blob/bc3e52cfb2ef84c411ad0aff34a683ee0955ce66/setup.py#L31-L40
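A minimal sketch of the proposed split, assuming the direct dependencies are the ones pip collects for spatialpandas 0.4.7 in the output above. The extra name `snappy` is hypothetical, and this is not the actual setup.py diff:

```python
# Sketch of the setup.py dependency split (hypothetical, not the real diff).
# Direct dependencies as collected by pip for spatialpandas 0.4.7.
install_requires = [
    "dask",
    "fsspec",
    "numba",
    "pandas",
    "param",
    "pyarrow>=1.0",
    "retrying",
]

# python-snappy moves out of install_requires into an optional extra.
# The extra name "snappy" is an assumption for illustration.
extras_require = {
    "snappy": ["python-snappy"],
}

# In setup.py these would be passed through unchanged:
# setup(..., install_requires=install_requires, extras_require=extras_require)
```

With an extra like this, a plain `python -m pip install spatialpandas` would no longer try to build python-snappy, and users who want it could opt in with `python -m pip install "spatialpandas[snappy]"`.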

Looking at the codebase, I only see snappy referenced in two places, both related to Parquet I/O:

https://github.com/holoviz/spatialpandas/blob/bc3e52cfb2ef84c411ad0aff34a683ee0955ce66/spatialpandas/dask.py#L208
https://github.com/holoviz/spatialpandas/blob/bc3e52cfb2ef84c411ad0aff34a683ee0955ce66/spatialpandas/io/parquet.py#L90-L181

So for operations that don't involve Parquet, python-snappy should not be needed at all. Note that pandas also supports other compression methods such as gzip, as mentioned at https://pandas.pydata.org/pandas-docs/version/2.0/reference/api/pandas.DataFrame.to_parquet.html, although snappy is currently the default.
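One way to keep python-snappy optional at runtime is to check for it only on the code paths that need it. A minimal sketch using the standard library's `importlib.util.find_spec`; the helper names `module_available` and `require_module` are hypothetical and not part of spatialpandas:

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def require_module(name: str, feature: str, pip_name: str) -> None:
    """Raise an informative ImportError only when a feature needing `name` is used."""
    if not module_available(name):
        raise ImportError(
            f"{feature} requires the optional dependency {pip_name!r}; "
            f"install it with: python -m pip install {pip_name}"
        )
```

A Parquet function could call `require_module("snappy", "Parquet I/O", "python-snappy")` before using snappy (the import name of python-snappy), so importing spatialpandas itself never fails when the package is absent.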

Describe alternatives you've considered

Ideally, python-snappy would release Python 3.11 compatible wheels (see https://github.com/andrix/python-snappy/issues/124), but the last commit on that repo was on 17 Mar 2022, so this seems unlikely to happen anytime soon.

Additional context

I noticed that there was a PR checking for Python 3.11 compatibility at https://github.com/holoviz/spatialpandas/pull/113, but in that case python-snappy was installed from conda-forge (which does support Python 3.11, see https://anaconda.org/conda-forge/python-snappy/files?version=0.6.1) rather than from PyPI.

For historical context, snappy was added as a required dependency in commit 498e7fcd0f73132ffde472ada1fa22f3037205df (#60).

Happy to open a PR to make python-snappy optional if the above sounds good!

ianthomas23 commented 1 year ago

Happy to open a PR to make python-snappy optional if the above sounds good!

Hi @weiji14. Yes please!

It looks like from https://github.com/andrix/python-snappy/issues/124 that the recommendation is to replace use of python-snappy with cramjam instead. But that is long-term, at the moment I'd be happy if our CI passes without python-snappy.

weiji14 commented 1 year ago

It looks like from andrix/python-snappy#124 that the recommendation is to replace use of python-snappy with cramjam instead. But that is long-term, at the moment I'd be happy if our CI passes without python-snappy.

Cool, I've started a PR at #117. We could look into cramjam separately; it looks like a promising replacement built on Rust!