holoviz / spatialpandas

Pandas extension arrays for spatial/geometric operations
BSD 2-Clause "Simplified" License
308 stars 24 forks source link

Add compatibility for Dask 2021.06.0 #70

Closed jrbourbeau closed 3 years ago

jrbourbeau commented 3 years ago

This PR adds some compatibility updates for the dask=2021.06.0 release. xref https://github.com/dask/dask/pull/7743#issuecomment-853693448

cc @jbednar

Supersedes https://github.com/holoviz/spatialpandas/pull/69, xref https://github.com/holoviz/spatialpandas/issues/68#issuecomment-855285094

jbednar commented 3 years ago

Thanks! Since you're a core Dask team member, I'll take this suggestion as the authoritative response for what to do for dask <=2021.5.0 and >=2021.6.0.

jrbourbeau commented 3 years ago

Yeah, sorry about the churn between the 2021.5.0 and 2021.6.0 releases. The changes here should provide compatibility for all but 2021.5.1 (it sounds like you're okay saying that's just a bad release).

From what I can tell, it looks like the CI failures here are unrelated to the Dask changes. Is that the case?

jbednar commented 3 years ago

Thanks. Yes, I'm fine with ignoring 2021.05.1. I agree that the test failures seem unrelated, but I have no idea what is causing them. Something to do with pyarrow, numpy or pandas, presumably?

philippjfr commented 3 years ago

Test failures are likely due to an old version of pyarrow getting installed. I'll try to fix.

philippjfr commented 3 years ago

Seems like conda can't solve for geopandas and a recent pyarrow version when pointed only at defaults so I've had to add conda-forge.

jrbourbeau commented 3 years ago

Thanks for the updates @philippjfr! For the remaining

ImportError: libpoppler.so.71: cannot open shared object file: No such file or directory

issue, there are a couple of upstream geopandas issue (xref https://github.com/geopandas/geopandas/issues/1109, https://github.com/geopandas/geopandas/issues/924) which look similar. In those other issues, setting a strict conda channel priority helped to get a more stable environment. I'm not familiar with all the commands being used to set up the CI environment (e.g. doit env_create) -- is there a way we can set the conda channel priority to strict in CI to see if that helps?

jrbourbeau commented 3 years ago

Thanks for the updates @philippjfr! It looks like there are some Python version inconsistencies in CI builds now. I've made a first attempt at resolving those issues, though since I've not committed to this repo before I need approval for CI builds to run

jrbourbeau commented 3 years ago

At this point all builds are passing other than the Python 3.7 macOS build, which is failing for some unknown reason. At this point I'm mostly mucking around in CI configuration that's not immediately related to the original dask=2021.06.0 compatibility changes. If a spatialpandas maintainer has some bandwidth, feel free to push directly to this branch.

philippjfr commented 3 years ago

Yeah, it's really not for you to worry about at this point. It just seems very difficult to get a working environment which includes geopandas (and therefore libgdal) and recent pyarrow, which works across platforms. Will maybe just disable the OSX test for now and ping the conda team about these issues again.

jbednar commented 3 years ago

Would spatialpandas tests be able to use a "geopandas-core" package that didn't depend on GDAL or libgdal? At one point @jorisvandenbossche indicated that "we could easily look into making fiona/gdal optional for geopandas, as it is not a hard requirement". I'd love to be able to use geopandas in situations like this without having the dependency issues of GDAL. I know this wouldn't be an immediate solution, but for the long term...

jorisvandenbossche commented 3 years ago

The latest version of GeoPandas already doesn't hard-depend on fiona (as recently mentioned in https://github.com/holoviz/spatialpandas/issues/1#issuecomment-841198108), so you could install it with pip (just with --no-deps, since for user convenience we kept fiona as "install_requires" dependency, but geopandas will run without)

And for conda, I actually just looked into making a geopandas-base conda-forge package earlier this week: https://github.com/conda-forge/geopandas-feedstock/pull/90

jorisvandenbossche commented 3 years ago

A first user of geopandas-base! ;) (it's actually good to get feedback on this if this works)

jorisvandenbossche commented 3 years ago

I assume you were a bit too fast. I only just merged the change right before, so it takes a bit before it is present on the conda channels. If you restart CI now, it should probably be better.

jrbourbeau commented 3 years ago

The Python 3.7 on macOS are running 🎉

philippjfr commented 3 years ago

Well that came just at the right time, thanks @jorisvandenbossche! And @jrbourbeau, sorry you had to deal with the bottomless pit of tears and despair that comes with dealing with any environment with a dependency on libgdal. Will merge and then tag a dev release.

jrbourbeau commented 3 years ago

No worries! I've got a couple of other small CI cleanup things I want to test out, but let's save that for a follow-up PR : )

jorisvandenbossche commented 3 years ago

dealing with any environment with a dependency on libgdal

BTW, if you use strict channel priority, in my experience this doesn't give that many tears (+ avoiding the default channel if you need recent versions)

philippjfr commented 3 years ago

In my experience that's true for most versions but some combinations of OSX and certain Python versions have still given me headaches recently.

philippjfr commented 3 years ago

Thanks again everyone, spatialpandas 0.4.1 is now on the conda pyviz channel and PyPI and in due course should be up on conda-forge as well.

jbednar commented 3 years ago

Yes, thanks so much, everyone! Truly a team effort! Now I can get back to the Datashader release that this was blocking.