Using Geopandas with hvPlot doesn't work for a large dataset.

Azaya89 commented 6 months ago

ALL software version info

MacOS - Sonoma 14.0
python = 3.11.8
notebook = 7.0.8
dask = 2023.11.0
datashader = 0.16.0
geopandas = 0.14.2
hvplot = 0.9.2
holoviews = 1.18.3
spatialpandas = 0.4.10
pyarrow = 14.0.2
pandas = 2.2.1
bokeh = 3.4.0

Description of expected behavior and the observed behavior

As part of the NumFOCUS SDG, I am modernizing the nyc_buildings example on the examples website to use the latest APIs. The update also involves switching from spatialpandas to geopandas to read the data, which is stored as a Parquet (.parq) file.

Switching from spatialpandas to geopandas for reading the file was not straightforward. The geometry column in the data file was not recognized by geopandas. To address this, I read the file using spatialpandas, converted it to a pandas DataFrame using .compute(), and then transformed it into a geopandas DataFrame. The geometry column was converted to a shapely object using a custom function. Finally, I saved this new DataFrame as a .parq file, which was then read directly using geopandas.

Code:

import geopandas as gpd
import spatialpandas as spd
import spatialpandas.io

ddf = spd.io.read_parquet_dask('./data/nyc_buildings.parq').persist().compute()

gdf = gpd.GeoDataFrame(ddf)

def convert_to_shapely(spgeom):
    try:
        return spgeom.to_shapely()
    except Exception as e:
        print(f"Error converting geometry: {e}")
        return None

gdf['geometry'] = gdf['geometry'].apply(convert_to_shapely)
gdf = gpd.GeoDataFrame(gdf, geometry='geometry')

gdf.to_parquet('new_nyc_buildings.parq')

gdf = gpd.read_parquet('new_nyc_buildings.parq')

gdf.head()

Despite these adjustments, plotting the new data file proved challenging. Plotting the entire dataset with hvplot.polygons takes over 5 minutes, even with minimal code:

gdf.hvplot.polygons(tiles='CartoLight', rasterize=True)

Testing with a small sample of the data (gdf.head(1000)) produced results in a reasonable time, suggesting that hvPlot may have difficulties handling the large dataset (over 1 million rows).

Additionally, plotting with extra parameters, such as a color mapping for the type categories in the data, raises a ValueError (see the traceback below).

Complete, Minimal, Self-Contained Example Code that Reproduces the Issue

gdf.head(1000).hvplot.polygons(tiles='CartoLight', rasterize=True, c='type', cmap=color_key)

Stack traceback and/or browser JavaScript console output

WARNING:param.dynamic_operation: Callable raised "ValueError('input must be numeric')".
Invoked as dynamic_operation(height=300, scale=1.0, width=700, x_range=None, y_range=None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/IPython/core/formatters.py:974, in MimeBundleFormatter.__call__(self, obj, include, exclude)
    971     method = get_real_method(obj, self.print_method)
    973     if method is not None:
--> 974         return method(include=include, exclude=exclude)
    975     return None
    976 else:

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/dimension.py:1286, in Dimensioned._repr_mimebundle_(self, include, exclude)
   1279 def _repr_mimebundle_(self, include=None, exclude=None):
   1280     """
   1281     Resolves the class hierarchy for the class rendering the
   1282     object using any display hooks registered on Store.display
   1283     hooks.  The output of all registered display_hooks is then
   1284     combined and returned.
   1285     """
-> 1286     return Store.render(self)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/options.py:1428, in Store.render(cls, obj)
   1426 data, metadata = {}, {}
   1427 for hook in hooks:
-> 1428     ret = hook(obj)
   1429     if ret is None:
   1430         continue

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/ipython/display_hooks.py:287, in pprint_display(obj)
    285 if not ip.display_formatter.formatters['text/plain'].pprint:
    286     return None
--> 287 return display(obj, raw_output=True)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/ipython/display_hooks.py:261, in display(obj, raw_output, **kwargs)
    259 elif isinstance(obj, (HoloMap, DynamicMap)):
    260     with option_state(obj):
--> 261         output = map_display(obj)
    262 elif isinstance(obj, Plot):
    263     output = render(obj)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/ipython/display_hooks.py:149, in display_hook.<locals>.wrapped(element)
    147 try:
    148     max_frames = OutputSettings.options['max_frames']
--> 149     mimebundle = fn(element, max_frames=max_frames)
    150     if mimebundle is None:
    151         return {}, {}

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/ipython/display_hooks.py:209, in map_display(vmap, max_frames)
    206     max_frame_warning(max_frames)
    207     return None
--> 209 return render(vmap)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/ipython/display_hooks.py:76, in render(obj, **kwargs)
     73 if renderer.fig == 'pdf':
     74     renderer = renderer.instance(fig='png')
---> 76 return renderer.components(obj, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/plotting/renderer.py:396, in Renderer.components(self, obj, fmt, comm, **kwargs)
    393 embed = (not (dynamic or streams or self.widget_mode == 'live') or config.embed)
    395 if embed or config.comms == 'default':
--> 396     return self._render_panel(plot, embed, comm)
    397 return self._render_ipywidget(plot)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/plotting/renderer.py:403, in Renderer._render_panel(self, plot, embed, comm)
    401 doc = Document()
    402 with config.set(embed=embed):
--> 403     model = plot.layout._render_model(doc, comm)
    404 if embed:
    405     return render_model(model, comm)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/viewable.py:736, in Viewable._render_model(self, doc, comm)
    734 if comm is None:
    735     comm = state._comm_manager.get_server_comm()
--> 736 model = self.get_root(doc, comm)
    738 if self._design and self._design.theme.bokeh_theme:
    739     doc.theme = self._design.theme.bokeh_theme

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/layout/base.py:320, in Panel.get_root(self, doc, comm, preprocess)
    316 def get_root(
    317     self, doc: Optional[Document] = None, comm: Optional[Comm] = None,
    318     preprocess: bool = True
    319 ) -> Model:
--> 320     root = super().get_root(doc, comm, preprocess)
    321     # ALERT: Find a better way to handle this
    322     if hasattr(root, 'styles') and 'overflow-x' in root.styles:

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/viewable.py:667, in Renderable.get_root(self, doc, comm, preprocess)
    665 wrapper = self._design._wrapper(self)
    666 if wrapper is self:
--> 667     root = self._get_model(doc, comm=comm)
    668     if preprocess:
    669         self._preprocess(root)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/layout/base.py:186, in Panel._get_model(self, doc, root, parent, comm)
    184 root = root or model
    185 self._models[root.ref['id']] = (model, parent)
--> 186 objects, _ = self._get_objects(model, [], doc, root, comm)
    187 props = self._get_properties(doc)
    188 props[self._property_mapping['objects']] = objects

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/layout/base.py:168, in Panel._get_objects(self, model, old_objects, doc, root, comm)
    166 else:
    167     try:
--> 168         child = pane._get_model(doc, root, model, comm)
    169     except RerenderError as e:
    170         if e.layout is not None and e.layout is not self:

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/pane/holoviews.py:429, in HoloViews._get_model(self, doc, root, parent, comm)
    427     plot = self.object
    428 else:
--> 429     plot = self._render(doc, comm, root)
    431 plot.pane = self
    432 backend = plot.renderer.backend

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/panel/pane/holoviews.py:525, in HoloViews._render(self, doc, comm, root)
    522     if comm:
    523         kwargs['comm'] = comm
--> 525 return renderer.get_plot(self.object, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/plotting/bokeh/renderer.py:68, in BokehRenderer.get_plot(self_or_cls, obj, doc, renderer, **kwargs)
     61 @bothmethod
     62 def get_plot(self_or_cls, obj, doc=None, renderer=None, **kwargs):
     63     """
     64     Given a HoloViews Viewable return a corresponding plot instance.
     65     Allows supplying a document attach the plot to, useful when
     66     combining the bokeh model with another plot.
     67     """
---> 68     plot = super().get_plot(obj, doc, renderer, **kwargs)
     69     if plot.document is None:
     70         plot.document = Document() if self_or_cls.notebook_context else curdoc()

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/plotting/renderer.py:217, in Renderer.get_plot(self_or_cls, obj, doc, renderer, comm, **kwargs)
    214     raise SkipRendering(msg.format(dims=dims))
    216 # Initialize DynamicMaps with first data item
--> 217 initialize_dynamic(obj)
    219 if not renderer:
    220     renderer = self_or_cls

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/plotting/util.py:270, in initialize_dynamic(obj)
    268     continue
    269 if not len(dmap):
--> 270     dmap[dmap._initial_key()]

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:1217, in DynamicMap.__getitem__(self, key)
   1215 # Not a cross product and nothing cached so compute element.
   1216 if cache is not None: return cache
-> 1217 val = self._execute_callback(*tuple_key)
   1218 if data_slice:
   1219     val = self._dataslice(val, data_slice)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:984, in DynamicMap._execute_callback(self, *args)
    981     kwargs['_memoization_hash_'] = hash_items
    983 with dynamicmap_memoization(self.callback, self.streams):
--> 984     retval = self.callback(*args, **kwargs)
    985 return self._style(retval)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:552, in Callable.__call__(self, *args, **kwargs)
    550     return self.callable.rx.value
    551 elif not args and not kwargs and not any(kwarg_hash):
--> 552     return self.callable()
    553 inputs = [i for i in self.inputs if isinstance(i, DynamicMap)]
    554 streams = []

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/util/__init__.py:1037, in Dynamic._dynamic_operation.<locals>.dynamic_operation(*key, **kwargs)
   1036 def dynamic_operation(*key, **kwargs):
-> 1037     key, obj = resolve(key, kwargs)
   1038     return apply(obj, *key, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/util/__init__.py:1026, in Dynamic._dynamic_operation.<locals>.resolve(key, kwargs)
   1024 elif isinstance(map_obj, DynamicMap) and map_obj._posarg_keys and not key:
   1025     key = tuple(kwargs[k] for k in map_obj._posarg_keys)
-> 1026 return key, map_obj[key]

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:1217, in DynamicMap.__getitem__(self, key)
   1215 # Not a cross product and nothing cached so compute element.
   1216 if cache is not None: return cache
-> 1217 val = self._execute_callback(*tuple_key)
   1218 if data_slice:
   1219     val = self._dataslice(val, data_slice)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:984, in DynamicMap._execute_callback(self, *args)
    981     kwargs['_memoization_hash_'] = hash_items
    983 with dynamicmap_memoization(self.callback, self.streams):
--> 984     retval = self.callback(*args, **kwargs)
    985 return self._style(retval)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:552, in Callable.__call__(self, *args, **kwargs)
    550     return self.callable.rx.value
    551 elif not args and not kwargs and not any(kwarg_hash):
--> 552     return self.callable()
    553 inputs = [i for i in self.inputs if isinstance(i, DynamicMap)]
    554 streams = []

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/overlay.py:34, in Overlayable.__mul__.<locals>.dynamic_mul(*args, **kwargs)
     33 def dynamic_mul(*args, **kwargs):
---> 34     element = other[args]
     35     return self * element

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:1217, in DynamicMap.__getitem__(self, key)
   1215 # Not a cross product and nothing cached so compute element.
   1216 if cache is not None: return cache
-> 1217 val = self._execute_callback(*tuple_key)
   1218 if data_slice:
   1219     val = self._dataslice(val, data_slice)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:984, in DynamicMap._execute_callback(self, *args)
    981     kwargs['_memoization_hash_'] = hash_items
    983 with dynamicmap_memoization(self.callback, self.streams):
--> 984     retval = self.callback(*args, **kwargs)
    985 return self._style(retval)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/spaces.py:582, in Callable.__call__(self, *args, **kwargs)
    579     args, kwargs = (), dict(pos_kwargs, **kwargs)
    581 try:
--> 582     ret = self.callable(*args, **kwargs)
    583 except KeyError:
    584     # KeyError is caught separately because it is used to signal
    585     # invalid keys on DynamicMap and should not warn
    586     raise

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/util/__init__.py:1038, in Dynamic._dynamic_operation.<locals>.dynamic_operation(*key, **kwargs)
   1036 def dynamic_operation(*key, **kwargs):
   1037     key, obj = resolve(key, kwargs)
-> 1038     return apply(obj, *key, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/util/__init__.py:1030, in Dynamic._dynamic_operation.<locals>.apply(element, *key, **kwargs)
   1028 def apply(element, *key, **kwargs):
   1029     kwargs = dict(util.resolve_dependent_kwargs(self.p.kwargs), **kwargs)
-> 1030     processed = self._process(element, key, kwargs)
   1031     if (self.p.link_dataset and isinstance(element, Dataset) and
   1032         isinstance(processed, Dataset) and processed._dataset is None):
   1033         processed._dataset = element.dataset

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/util/__init__.py:1012, in Dynamic._process(self, element, key, kwargs)
   1010 elif isinstance(self.p.operation, Operation):
   1011     kwargs = {k: v for k, v in kwargs.items() if k in self.p.operation.param}
-> 1012     return self.p.operation.process_element(element, key, **kwargs)
   1013 else:
   1014     return self.p.operation(element, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/operation.py:194, in Operation.process_element(self, element, key, **params)
    191 else:
    192     self.p = param.ParamOverrides(self, params,
    193                                   allow_extra_keywords=self._allow_extra_keywords)
--> 194 return self._apply(element, key)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/operation.py:141, in Operation._apply(self, element, key)
    139     if not in_method:
    140         element._in_method = True
--> 141 ret = self._process(element, key)
    142 if hasattr(element, '_in_method') and not in_method:
    143     element._in_method = in_method

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/operation/datashader.py:1506, in rasterize._process(self, element, key)
   1503     op = transform.instance(**{k:v for k,v in extended_kws.items()
   1504                                if k in transform.param})
   1505     op._precomputed = self._precomputed
-> 1506     element = element.map(op, predicate)
   1507     self._precomputed = op._precomputed
   1509 unused_params = list(all_supplied_kws - all_allowed_kws)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/data/__init__.py:196, in PipelineMeta.pipelined.<locals>.pipelined_fn(*args, **kwargs)
    193     inst._in_method = True
    195 try:
--> 196     result = method_fn(*args, **kwargs)
    197     if PipelineMeta.disable:
    198         return result

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/data/__init__.py:1213, in Dataset.map(self, *args, **kwargs)
   1211 @wraps(LabelledData.map)
   1212 def map(self, *args, **kwargs):
-> 1213     return super().map(*args, **kwargs)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/dimension.py:697, in LabelledData.map(self, map_fn, specs, clone)
    695     return deep_mapped
    696 else:
--> 697     return map_fn(self) if applies else self

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/operation.py:214, in Operation.__call__(self, element, **kwargs)
    210         return element.clone([(k, self._apply(el, key=k))
    211                               for k, el in element.items()])
    212     elif ((self._per_element and isinstance(element, Element)) or
    213           (not self._per_element and isinstance(element, ViewableElement))):
--> 214         return self._apply(element)
    215 elif 'streams' not in kwargs:
    216     kwargs['streams'] = self.p.streams

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/core/operation.py:141, in Operation._apply(self, element, key)
    139     if not in_method:
    140         element._in_method = True
--> 141 ret = self._process(element, key)
    142 if hasattr(element, '_in_method') and not in_method:
    143     element._in_method = in_method

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/holoviews/operation/datashader.py:1391, in geometry_rasterize._process(self, element, key)
   1389 agg_kwargs = dict(geometry=col, agg=agg_fn)
   1390 if isinstance(element, Polygons):
-> 1391     agg = cvs.polygons(data, **agg_kwargs)
   1392 elif isinstance(element, Path):
   1393     if self.p.line_width and ds_version >= Version('0.14.0'):

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/datashader/core.py:782, in Canvas.polygons(self, source, geometry, agg)
    780 if agg is None:
    781     agg = any_rdn()
--> 782 return bypixel(source, self, glyph, agg)

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/datashader/core.py:1329, in bypixel(source, canvas, glyph, agg, antialias)
   1327 schema = dshape.measure
   1328 glyph.validate(schema)
-> 1329 agg.validate(schema)
   1330 canvas.validate()
   1332 # All-NaN objects (e.g. chunks of arrays with no data) are valid in Datashader

File ~/Documents/development/holoviz-topics-examples/nyc_buildings/envs/default/lib/python3.11/site-packages/datashader/reductions.py:355, in Reduction.validate(self, in_dshape)
    353     raise ValueError("specified column not found")
    354 if not isnumeric(in_dshape.measure[self.column]):
--> 355     raise ValueError("input must be numeric")

ValueError: input must be numeric
:DynamicMap   []

Screenshots or screencasts of the bug in action

maximlt commented 6 months ago

@Azaya89 thanks for this very nice bug report!

Could you also report the timings for the equivalent of this line gdf.hvplot.polygons(tiles='CartoLight', rasterize=True) but with SpatialPandas? I would like to know how much slower things got.

Azaya89 commented 6 months ago

Here it is.

maximlt commented 6 months ago

Oh I'm pretty sure it definitely takes more for the plot to render than 542 us. Can you get an estimate of the real time it takes for the plot to render? Also, I notice another difference is that the hvPlot call uses rasterize while in the last snippet it uses datashade.

Azaya89 commented 6 months ago

> Oh I'm pretty sure it definitely takes more for the plot to render than 542 us. Can you get an estimate of the real time it takes for the plot to render?

How do you mean? is it different from using timeit on the plot objects?

droumis commented 6 months ago

> How do you mean? is it different from using timeit on the plot objects?

Yes, the render time is different. Feel free to just give a rough estimate of when the plot is done rendering on screen
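
To illustrate why timing the plot object and timing the render can differ, here is a minimal sketch using only the standard library. `LazyPlot` is a hypothetical stand-in, not a HoloViews or hvPlot class; it only assumes that building the object is cheap and that the expensive work happens when the object is displayed, which is consistent with the traceback above where the DynamicMap callback runs during rendering.

```python
import time


class LazyPlot:
    """Hypothetical stand-in for a plot object whose work is deferred."""

    def __init__(self, n_rows):
        # Construction only records parameters; nothing is drawn yet.
        self.n_rows = n_rows

    def render(self):
        # Simulate the expensive step that runs when the plot is displayed.
        time.sleep(0.05)
        return f"rendered {self.n_rows} rows"


# Time the construction of the object (what timing the plot object measures).
start = time.perf_counter()
plot = LazyPlot(1_000_000)
build_seconds = time.perf_counter() - start

# Time the deferred rendering step separately.
start = time.perf_counter()
output = plot.render()
render_seconds = time.perf_counter() - start

print(f"build:  {build_seconds:.6f} s")
print(f"render: {render_seconds:.6f} s")
```

In a notebook, a rough equivalent is to note the wall-clock time from running the cell until the plot is visible on screen.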

Azaya89 commented 6 months ago

> Yes, the render time is different. Feel free to just give a rough estimate of when the plot is done rendering on screen

OK, I timed it myself and it took ~5 secs to run this cell and render the plots: tiles * shaded * legend * hover

hoxbro commented 6 months ago

Can you try to run this with HoloViews 1.19.0a2?

Azaya89 commented 6 months ago

> Can you try to run this with HoloViews 1.19.0a2?

I did and it ran significantly faster. However, adding the other parameters still caused the same error as before.

maximlt commented 5 months ago

Coming back to the issue reported with:

gdf.head(1000).hvplot.polygons(tiles='CartoLight', rasterize=True, c='type', cmap=color_key)

@Azaya89 you haven't shared how color_key is derived in your example. Focusing on this bit of code from the NYC Buildings example:

cats = list(ddf.type.value_counts().compute().iloc[:10].index.values) + ['unknown']
ddf['type'] = ddf.type.replace({None: 'unknown'})
ddf = ddf[ddf.type.isin(cats)]
ddf['type'] = ddf['type'].astype('category').cat.as_known()

There, ddf is a spatialpandas.geodataframe.GeoDataFrame object. In your updated code, gdf is a geopandas.GeoDataFrame object. The latter has a type property that returns the geometry type of each geometry in the GeoSeries.

https://github.com/geopandas/geopandas/blob/747d66ee6fcf00b819c08f11ecded53736c4652b/geopandas/base.py#L233-L236

Therefore, to access the column data you need item access (__getitem__, i.e. the [] syntax): gdf['type'].
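
The shadowing can be sketched with a toy class. `ToyFrame` below is hypothetical and is not how geopandas is implemented; it only assumes that a real property named `type` takes precedence over the column fallback for attribute access, while item access always reaches the column.

```python
class ToyFrame:
    """Hypothetical, minimal frame; not the geopandas implementation."""

    def __init__(self, columns):
        self._columns = columns

    @property
    def type(self):
        # Plays the role of a built-in property such as the geometry type.
        return ["Polygon"] * len(self._columns["type"])

    def __getitem__(self, name):
        # Item access always looks the name up among the columns.
        return self._columns[name]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so a real
        # property like `type` is never overridden by a column.
        columns = self.__dict__.get("_columns", {})
        try:
            return columns[name]
        except KeyError:
            raise AttributeError(name) from None


frame = ToyFrame({"type": ["house", "garage"], "name": ["a", "b"]})

print(frame.type)     # the property wins over the column
print(frame["type"])  # item access returns the column data
print(frame.name)     # no clash, so attribute access reaches the column
```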

Then, to get the plot displayed I had to add aggregator='count_cat' to c='type'. Somehow, I would have expected setting these two parameters to be identical to setting by='type' but it didn't work. Something to discuss I guess.

So here's the full code:

Code

```python
import colorcet as cc
import datashader as ds
import geopandas as gpd
import hvplot.pandas

gdf = gpd.read_parquet('new_nyc_buildings.parq')

cats = list(gdf['type'].value_counts().iloc[:10].index.values) + ['unknown']
gdf['type'] = gdf['type'].replace({None: 'unknown'})
gdf = gdf[gdf['type'].isin(cats)]

colors = cc.glasbey_bw_minc_20_maxl_70
color_key = {cat: tuple(int(e*255.) for e in colors[i]) for i, cat in enumerate(cats)}

gdf.hvplot.polygons(
    tiles='CartoLight', data_aspect=1,
    datashade=True, aggregator=ds.by('type'), cmap=color_key
)
```

Alternatively, this would also work:

gdf.hvplot.polygons(
    tiles='CartoLight', data_aspect=1,
    datashade=True, aggregator='count_cat', c='type', cmap=color_key
)

In the NYC Buildings example another aggregator is used with ds.by('type', ds.any()). If I try to use that instead of ds.by('type') I get ValueError: input must be categorical. Turning the type column into a categorical one fixes that, and the plot looks much closer to the plot currently displayed in the example:

Code

```python
gdf['type'] = gdf['type'].astype('category')

gdf.hvplot.polygons(
    tiles='CartoLight', data_aspect=1,
    datashade=True, aggregator=ds.by('type', ds.any()), cmap=color_key
)
```

Finally, setting rasterize=True instead of datashade=True generates a plot that is far from what we'd expect:

And indeed, taking another simpler example and using HoloViews only, I can see that the output of a rasterize operation is not the expected one.

Code

```python
import geopandas as gpd
import geodatasets
import datashader as ds
import holoviews as hv
import spatialpandas as spd
from holoviews.operation.datashader import datashade, rasterize

hv.extension('bokeh')

path = geodatasets.get_path("geoda.nyc_neighborhoods")
nyc = gpd.read_file(path)
nyc['boroname'] = nyc['boroname'].astype('category')  # required
spd_nyc = spd.GeoDataFrame(nyc)

polys = hv.Polygons(spd_nyc, vdims='boroname')
shaded = datashade(polys, aggregator=ds.by('boroname', ds.any()))
rasterized = rasterize(polys, aggregator=ds.by('boroname', ds.any()))
polys + shaded + rasterized
```

Azaya89 commented 5 months ago

> @Azaya89 you haven't shared how color_key is derived in your example.

Here's how it was constructed:

colors    = cc.glasbey_bw_minc_20_maxl_70
color_key = {cat: tuple(int(e*255.) for e in colors[i]) for i, cat in enumerate(cats)}

The same as the one you shared.
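
For readers without colorcet at hand, the construction can be sketched as follows. The palette here is a hypothetical three-color list of RGB floats in [0, 1]; the expression above suggests the colorcet palette has the same shape, but that is an assumption, as are the category names.

```python
# Hypothetical palette of RGB floats in [0, 1]; the thread takes these
# from colorcet's glasbey_bw_minc_20_maxl_70 instead.
colors = [[1.0, 0.0, 0.0], [0.0, 0.5, 1.0], [0.25, 0.5, 0.75]]

# Hypothetical category names standing in for the building types.
cats = ["house", "garage", "unknown"]

# Map each category to an (R, G, B) tuple of integers in 0..255.
# int() truncates toward zero, so 0.5 * 255 = 127.5 becomes 127.
color_key = {
    cat: tuple(int(channel * 255.0) for channel in colors[i])
    for i, cat in enumerate(cats)
}

print(color_key)
```

The categories are matched to colors by position, so the order of cats determines which color each building type gets.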

Thank you for this. I have incorporated it into my PR now, although I'm now curious why rasterize and datashade are giving very different outputs...

maximlt commented 5 months ago

> > @Azaya89 you haven't shared how color_key is derived in your example.
>
> Here's how it was constructed:
>
> colors    = cc.glasbey_bw_minc_20_maxl_70
> color_key = {cat: tuple(int(e*255.) for e in colors[i]) for i, cat in enumerate(cats)}
>
> The same as the one you shared.
>
> Thank you for this. I have incorporated it into my PR now, although I'm now curious why rasterize and datashade are giving very different outputs...

@Azaya89 what was important is how you computed cats.

maximlt commented 5 months ago

> although I'm now curious why rasterize and datashade are giving very different outputs...

This is a bug in HoloViews I think, I need to open a bug report.

Azaya89 commented 5 months ago

> @Azaya89 what was important is how you computed cats.

OK. Here:

cats = list(gdf['type'].value_counts().iloc[:10].index) + ['unknown']
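
The same selection logic, together with the replace and filter steps from the snippets earlier in the thread, can be sketched with the standard library alone. The sample values and `TOP_N = 2` are hypothetical (the thread keeps the top 10), and `Counter` stands in for pandas `value_counts()`, which by default skips missing values and sorts by descending count.

```python
from collections import Counter

# Hypothetical stand-in for the values of the 'type' column.
types = ["house", "garage", None, "house", "church", "house", None, "garage"]

TOP_N = 2  # the thread uses 10

# Count non-missing values only, as value_counts() does by default.
counts = Counter(t for t in types if t is not None)

# Keep the most frequent categories and add an explicit 'unknown' bucket.
cats = [name for name, _ in counts.most_common(TOP_N)] + ["unknown"]

# Replace missing values, then keep only rows whose type is in cats.
cleaned = ["unknown" if t is None else t for t in types]
kept = [t for t in cleaned if t in cats]

print(cats)
print(kept)
```

Because cats is computed before the missing values are replaced, 'unknown' is always appended rather than competing for one of the top slots.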

holoviz / hvplot