holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Visual artifacts when shading Polygons #1327

Closed philipc2 closed 1 month ago

philipc2 commented 2 months ago

I have a spatialpandas.GeoDataFrame with ~600k Polygons that I am attempting to render. There are no holes or gaps between any of the polygons.

image
import datashader as ds
import datashader.transfer_functions as tf

cvs = ds.Canvas(plot_width=300, plot_height=300)
agg = cvs.polygons(gdf, geometry='geometry', agg=ds.mean('temperature_500hPa'))
tf.shade(agg)
image

The same issue persists when using HoloViews (which is what I started with initially). However, using different pixel_ratio values seems to change the behavior of these artifacts.

import holoviews as hv
from holoviews.operation.datashader import rasterize

hv_polygons = hv.Polygons(gdf, vdims=['temperature_500hPa'])
(rasterize(hv_polygons, dynamic=False, pixel_ratio=1.0) +
 rasterize(hv_polygons, dynamic=False, pixel_ratio=2.0) +
 rasterize(hv_polygons, dynamic=False, pixel_ratio=4.0) +
 rasterize(hv_polygons, dynamic=False, pixel_ratio=8.0)).cols(2)
image

I have tried the following to attempt to isolate the issues, but it did not change anything

Version Info

philipc2 commented 2 months ago

I can share the data upon request. Is there a preferred format for sharing a GeoDataFrame with Polygons? I assume a CSV should be good?

jbednar commented 2 months ago

Thanks for the report. Sounds like you're sure it's not due to rounding issues in the original data? E.g. if you zoom in on one of the areas with gaps, does the gap shrink or disappear? If so it's presumably an issue with the rendering code rather than the data. And does the same data render without gaps in other tools, e.g. native Matplotlib?

A Parquet-based file is usually more practical than CSV, but anything that you can provide end-to-end reproducible code for reading and displaying it should be fine.

philipc2 commented 2 months ago

> Thanks for the report. Sounds like you're sure it's not due to rounding issues in the original data? E.g. if you zoom in on one of the areas with gaps, does the gap shrink or disappear? If so it's presumably an issue with the rendering code rather than the data. And does the same data render without gaps in other tools, e.g. native Matplotlib?

Yeah, there are no NaN values in my original dataset.

image

image

I took a subset of the region surrounding a cluster of the NaN values and plotted the vector polygons to show what the "true" mesh should look like.

image
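
To put a number on the gaps rather than judging them by eye, one option is to measure the fraction of NaN pixels in the rasterized aggregate. The following is only a minimal sketch: `nan_pixel_fraction` is a hypothetical helper, not part of Datashader, and the small array stands in for `agg.values` from the `cvs.polygons(...)` call above. Pixels outside the mesh extent are also NaN, so the number is only meaningful for windows that lie fully inside the data.

```python
import numpy as np

def nan_pixel_fraction(values):
    """Return the fraction of pixels in a 2D aggregate that are NaN.

    Hypothetical helper for illustration; pass `agg.values` from a
    Datashader aggregate, or any 2D array-like of floats.
    """
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        raise ValueError("aggregate is empty")
    return float(np.isnan(arr).sum()) / arr.size

# Synthetic stand-in for `agg.values`: one NaN pixel out of four.
demo = np.array([[1.0, np.nan],
                 [2.0, 3.0]])
print(nan_pixel_fraction(demo))  # 0.25

# Assumed usage with the real data (not run here):
#   gdf['temperature_500hPa'].isna().sum()   # NaNs in the source column
#   nan_pixel_fraction(agg.values)           # NaNs in the rendered aggregate
```

If the source column contains no NaN values but the aggregate over a window inside the mesh does, the gaps were introduced during rasterization rather than by the data.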

philipc2 commented 2 months ago

Here's the polygons as a parquet file.

https://drive.google.com/file/d/12YQmrJUAZBqpAwOrFLL-urIyWbR9U_SI/view?usp=sharing

jbednar commented 2 months ago

Can you include the complete code to go from that file to the rendered image? Thanks.

philipc2 commented 2 months ago

> Can you include the complete code to go from that file to the rendered image? Thanks.

Sure! Here's the code that I used (with the produced output):

import datashader as ds
import datashader.transfer_functions as tf
import geopandas as gp

gdf = gp.read_parquet("out.parquet")

cvs = ds.Canvas(plot_width=300, plot_height=300)
agg = cvs.polygons(gdf, geometry='geometry', agg=ds.mean('temperature_500hPa'))
tf.shade(agg)

image

jbednar commented 1 month ago

Ok, I can reproduce the issue, and I agree it doesn't seem to be caused by gaps in the original data due to things like truncating the coordinate precision in a CSV export. The gaps at least sometimes shrink or disappear on zooming in, while I think if the gaps were in the data they would get larger on zooming in:

import datashader as ds
import geopandas

gdf = geopandas.read_parquet("Downloads/out.parquet")

def plot(x_range=(-135, -65), y_range=(21, 59)):
    # Rasterize the polygons over the given window at a fixed 300x300 resolution.
    cvs = ds.Canvas(plot_width=300, plot_height=300, x_range=x_range, y_range=y_range)
    agg = cvs.polygons(gdf, geometry='geometry', agg=ds.mean('temperature_500hPa'))
    # Label each image with its window so the zoom levels can be compared.
    return ds.tf.shade(agg, name=str((x_range, y_range)))

ds.tf.Images(plot((-111,-100), (34,45)), 
             plot((-110,-100), (35,45)),
             plot((-109,-100), (36,45)),
             plot((-109,-101), (36,44)))
image

Plus if I go all the way down to the polygons, I can't see any gaps appear:

ds.tf.Images(plot((-111,-110.5),     (34,34.5)), 
             plot((-111,-110.75),    (34,34.25)),
             plot((-111,-110.9),     (34,34.1)),
             plot((-110.98,-110.96), (34.03,34.05)))
image

So I think Datashader is losing precision somewhere in the polygon rendering code. Unfortunately this could be quite tricky to debug! Anyone got ideas?

philipc2 commented 1 month ago

Thanks for getting this fixed so quickly!