holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Compatibility with geopandas 1.0 and dask-geopandas 0.4.0 #1347

Closed by hoxbro 2 days ago

hoxbro commented 1 week ago

A minor compatibility change to support dask_geopandas.dask_expr.

Removed the world example, since that dataset is no longer available. I haven't reviewed the surrounding text.

I also removed the numpy2 feature, as the original test now pulls it in.

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 97.36842% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.05%. Comparing base (73c3819) to head (0387dbb).

| Files | Patch % | Lines |
|---|---|---|
| datashader/core.py | 91.66% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #1347      +/-   ##
==========================================
- Coverage   90.22%   90.05%   -0.17%
==========================================
  Files          92       92
  Lines       18605    18626      +21
==========================================
- Hits        16786    16774      -12
- Misses       1819     1852      +33
```


droumis commented 1 week ago

Started reviewing the notebook, but got stuck debugging the following:

ddf = dd.from_pandas(sgeodf, npartitions=2).pack_partitions(npartitions=100).persist()

tf.shade(cvs.polygons(ddf, geometry='geometry', agg=ds.mean('population')), cmap=cc.kg)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_core.py:467, in Expr.__getattr__(self, key)
    466 try:
--> 467     return object.__getattribute__(self, key)
    468 except AttributeError as err:

AttributeError: 'FromPandas' object has no attribute 'pack_partitions'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_collection.py:619, in FrameBase.__getattr__(self, key)
    616 try:
    617     # Fall back to `expr` API
    618     # (Making sure to convert to/from Expr)
--> 619     val = getattr(self.expr, key)
    620     if callable(val):

File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_core.py:488, in Expr.__getattr__(self, key)
    487 link = "https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage"
--> 488 raise AttributeError(
    489     f"{err}\n\n"
    490     "This often means that you are attempting to use an unsupported "
    491     f"API function. Current API coverage is documented here: {link}."
    492 )

AttributeError: 'FromPandas' object has no attribute 'pack_partitions'

This often means that you are attempting to use an unsupported API function. Current API coverage is documented here: https://github.com/dask-contrib/dask-expr/blob/main/README.md#api-coverage.

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[36], line 1
----> 1 ddf = dd.from_pandas(sgeodf, npartitions=2).pack_partitions(npartitions=100).persist()
      3 tf.shade(cvs.polygons(ddf, geometry='geometry', agg=ds.mean('population')), cmap=cc.kg)

File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_collection.py:3049, in DataFrame.__getattr__(self, key)
   3046     raise err
   3047 except AttributeError:
   3048     # Fall back to `BaseFrame.__getattr__`
-> 3049     return super().__getattr__(key)

File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_collection.py:625, in FrameBase.__getattr__(self, key)
    622     return val
    623 except AttributeError:
    624     # Raise original error
--> 625     raise err

File ~/opt/miniconda3/envs/datashader-dev/lib/python3.12/site-packages/dask_expr/_collection.py:614, in FrameBase.__getattr__(self, key)
    611 def __getattr__(self, key):
    612     try:
    613         # Prioritize `FrameBase` attributes
--> 614         return object.__getattribute__(self, key)
    615     except AttributeError as err:
    616         try:
    617             # Fall back to `expr` API
    618             # (Making sure to convert to/from Expr)

AttributeError: 'DataFrame' object has no attribute 'pack_partitions'
hoxbro commented 1 week ago

> Started reviewing the notebook, but got stuck debugging the following:

Forgot to tell you: spatialpandas does not work with dask-expr, so you have to disable it, either with `dask config set dataframe.query-planning False` or by setting a system environment variable: `export DASK_DATAFRAME__QUERY_PLANNING=False`
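For a notebook session, the environment-variable route can also be sketched in Python. This is a minimal example, not part of the PR: `disable_query_planning` is a hypothetical helper name, and it assumes the variable has to be in place before dask is first imported in the process, since dask collects environment variables into its config at import time.

```python
import os


def disable_query_planning(environ=os.environ):
    """Request dask's legacy DataFrame backend (hypothetical helper).

    Sets the same variable as `export DASK_DATAFRAME__QUERY_PLANNING=False`.
    """
    environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"
    return environ


# Call this first, before any `import dask`, `import dask.dataframe`,
# or `import spatialpandas.dask` in the same Python process.
disable_query_planning()
```

If dask has already been imported, restarting the kernel and running this cell first is the safer option. Calling `dask.config.set({"dataframe.query-planning": False})` before `dask.dataframe` is imported should have the same effect.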

droumis commented 6 days ago

@jbednar, does this match your understanding of the interrelationships in the Datashader polygon world? (Diagram updated; see later comments in this thread.)

graph TD;
    subgraph SpatialPandas
        A[Polygon]:::spatialpandas -->|used in| E[PolygonArray]:::spatialpandas
        C[MultiPolygon]:::spatialpandas -->|used in| G[MultiPolygonArray]:::spatialpandas
        E -->|used in| F[GeoDataFrame]:::spatialpandas
        G -->|used in| F
        E -->|usable in| N[Pandas DataFrame with Geometry]:::pandas
        G -->|usable in| N
        A -->|converts to| B[shapely.geometry.Polygon]:::shapely
        C -->|converts to| D[shapely.geometry.MultiPolygon]:::shapely
        B -->|converts to| A
        D -->|converts to| C
    end

    subgraph GeoPandas
        H[GeoSeries]:::geopandas -->|converts to| I[spatialpandas.GeoSeries]:::spatialpandas
        H -->|converts to| J[GeometryArray]:::geopandas
    end

    subgraph Dask
        F -->|used in| L[DataFrame with Geometry]:::dask
        I -->|used in| L
    end

    subgraph Datashader
        M[Datashader]:::datashader
        F -->|usable by| M
        L -->|usable by| M
        J -->|usable by| M
        E -->|usable by| M
        G -->|usable by| M
        N -->|usable by| M
    end

    classDef spatialpandas fill:#7aa9e6,stroke:#000,stroke-width:1px;
    classDef shapely fill:#ff7f0e,stroke:#000,stroke-width:1px;
    classDef geopandas fill:#2ca02c,stroke:#000,stroke-width:1px;
    classDef dask fill:#d62728,stroke:#000,stroke-width:1px;
    classDef datashader fill:#9467bd,stroke:#000,stroke-width:1px;
    classDef pandas fill:#bcbd22,stroke:#000,stroke-width:1px;
droumis commented 6 days ago

Changed some of the text to make it make more sense (to me at least). I also replaced the dataset with one that shows something more recognizable (USA! USA!)

image

image

droumis commented 3 days ago

Version 2 (not the latest):

graph TD;
    subgraph SpatialPandas
        A[Polygon]:::spatialpandas -->|used in| E[PolygonArray]:::spatialpandas
        B[MultiPolygon]:::spatialpandas -->|used in| F[MultiPolygonArray]:::spatialpandas
        E -->|used in| G[GeoDataFrame or GeoSeries]:::spatialpandas
        F -->|used in| G
    end

    subgraph Shapely
        H[Polygon]:::shapely <-->|converts to| A
        I[MultiPolygon]:::shapely <-->|converts to| B
    end

    subgraph GeoPandas
        J[GeoDataFrame or GeoSeries]:::geopandas
    end

    subgraph Pandas
        K[DataFrame or Series]:::pandas
    end

    subgraph spatialpandas.dask
        L[DaskGeoDataFrame or DaskGeoSeries]:::dask
    end

    subgraph Datashader
        M{Datashader}:::datashader
    end

    F -->|usable in| K
    E -->|usable in| K

    G -->|converts to| L

    G <-->|converts to| J

    G -->|usable by| M
    L -->|usable by| M
    J -->|usable by| M
    K -->|usable by| M

    classDef spatialpandas fill:#4e79a7,stroke:#000,stroke-width:1px;
    classDef shapely fill:#f28e2b,stroke:#000,stroke-width:1px;
    classDef geopandas fill:#59a14f,stroke:#000,stroke-width:1px;
    classDef dask fill:#76b7b2,stroke:#000,stroke-width:1px;
    classDef datashader fill:#b07aa1,stroke:#000,stroke-width:1px;
    classDef pandas fill:#edc948,stroke:#000,stroke-width:1px
droumis commented 3 days ago

Version 3: fixes the subgraph titles overlapping with arrows, using some trickery.

graph TD;
    subgraph SP[ ]
        A[Polygon]:::spatialpandas -->|used in| E[PolygonArray]:::spatialpandas
        B[MultiPolygon]:::spatialpandas -->|used in| F[MultiPolygonArray]:::spatialpandas
        E -->|used in| G[GeoDataFrame or<div></div> GeoSeries]:::spatialpandas
        spatialpandas(SpatialPandas):::spatialpandas_title
        F -->|used in| G
    end

    subgraph SH[ ]
        shapely(Shapely):::shapely_title
        H[Polygon]:::shapely -->|converts to| A
        I[MultiPolygon]:::shapely -->|converts to| B
    end

    subgraph GP[ ]
        geopandas(GeoPandas):::geopandas_title
        J[GeoDataFrame or<div></div> GeoSeries]:::geopandas
    end

    subgraph PD[ ]
        pandas(Pandas):::pandas_title
        K[DataFrame or<div></div> Series]:::pandas
    end

    subgraph SPD[ ]
        spatialpandas.dask(SpatialPandas.Dask):::dask_title
        L[DaskGeoDataFrame or<div></div> DaskGeoSeries]:::dask
    end

    M(Datashader):::datashader_title

    F -->|usable in| K
    E -->|usable in| K

    G -->|converts to| L

    G <-->|converts to| J

    G -->|usable by| M
    L -->|usable by| M
    J -->|usable by| M
    K -->|usable by| M

    classDef spatialpandas fill:#4e79a7,stroke:#000,stroke-width:0px,color:black;
    classDef shapely fill:#f28e2b,stroke:#000,stroke-width:0px,color:black;
    classDef geopandas fill:#59a14f,stroke:#000,stroke-width:0px,color:black;
    classDef dask fill:#76b7b2,stroke:#000,stroke-width:0px,color:black;
    classDef datashader fill:#b07aa1,stroke:#000,stroke-width:0px,color:black;
    classDef pandas fill:#edc948,stroke:#000,stroke-width:0px,color:black;
    classDef spatialpandas_title fill:#fff,stroke:#4e79a7,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef shapely_title fill:#fff,stroke:#f28e2b,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef geopandas_title fill:#fff,stroke:#59a14f,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef dask_title fill:#fff,stroke:#76b7b2,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef pandas_title fill:#fff,stroke:#edc948,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef datashader_title fill:#fff,stroke:#b07aa1,stroke-width:7px,color:black,fill-opacity:.9,font-weight: bold;
    classDef subgraph_style fill:#fff;

    style SP fill:grey,stroke:#fff,stroke-width:0px
    style SH fill:grey,stroke:#fff,stroke-width:0px
    style GP fill:grey,stroke:#fff,stroke-width:0px
    style PD fill:grey,stroke:#fff,stroke-width:0px
    style SPD fill:grey,stroke:#fff,stroke-width:0px
jbednar commented 3 days ago

@droumis, that looks good! Are the double-headed arrows that point into the Shapely box intentional? I would have thought that Datashader has no interest in converting things to Shapely.

droumis commented 3 days ago

They were intentional, because the notebook mentions the to_shapely method. But I agree that it's a bit of a distraction, so I've now removed the arrows going back toward Shapely.

droumis commented 3 days ago

@hoxbro, I think we can merge now