opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0
505 stars 176 forks source link

Distorted data along fractional Dask chunks when loading data with `dc.load` and Dask #1448

Closed robbibt closed 1 year ago

robbibt commented 1 year ago

Expected behaviour

When loading data with Open Data Cube, we expect to be able to use Dask to efficiently parallelise workflows and reduce peak memory usage. Part of using Dask is an implicit assumption that data loaded by Dask will be identical to data loaded without Dask.

Actual behaviour

There appears to be a bug in datacube.load that is causing data loaded with Dask to be distorted along edges of Dask chunks. We first noticed data being blurred/stretched along the right side of our data loads when loaded with Dask - see animation below which shows the effect along the right side of the image:

dask_vs_nodask

Looking into this deeper, it appears that if a non-"nearest" resampling method is used (e.g. "cubic", "bilinear") and a Dask chunk extends off the right side of a data load, the data from that chunk will be strongly distorted. For example, if dask_chunks={"time": 1, "x": 2048, "y": 2048} and our data load is 2100 pixels wide, the right-most 52 pixels will be affected. If Dask is not used, dc.load works perfectly.

Steps to reproduce the behaviour

We've put together a Jupyter Notebook with a reproducible example here - it includes several examples of the issue, including some visualisations of how much this affects loaded data: https://gist.github.com/robbibt/4d6905d5d13137bb55362c4a6f1c18b8

For example, here is the difference between data loaded with and without Dask - note the band of strong differences along the right-hand side of the image: image

Environment information

SpacemanPaul commented 1 year ago

Robbi also points out this seems to only occur with higher-order resampling methods like "bilinear" and "cubic" - it does NOT occur with nearest-neighbour resampling.

robbibt commented 1 year ago

@Kirill888 A theory is that this is caused by datacube.load not supporting resampling of fractional Dask chunks. How does odc-geo handle this? Would a similar distortion be expected along fractional Dask chunks when resampling using odc-geo?

Kirill888 commented 1 year ago

A theory is that this is caused by datacube.load not supporting resampling of fractional Dask chunks. How does odc-geo handle this? Would a similar distortion be expected along fractional Dask chunks when resampling using odc-geo?

@robbibt

  1. datacube.load should not fail like that when edge chunks are smaller than non-edge chunks
  2. odc-geo does not do data loading, but you can try an equivalent load using odc-stac and see if that is any better

It almost looks like it's choosing to load from an overview rather than full-res raster. Wonder what happens when edge chunk is only slightly thinner than full width chunk.

Kirill888 commented 1 year ago

First place to check would be here:

https://github.com/opendatacube/datacube-core/blob/e58edce93f3e1352106f97c7d9df30ff7e58eea5/datacube/storage/_read.py#L125-L133

I would suspect that edge tiles get wrong idea about read scale on line 131.

robbibt commented 1 year ago

Here's a loop-through increasing chunk sizes, plotting the difference between a non-Dask load and a Dask load (both with "bilinear" resampling).

It looks like the distortion increases in severity as the fractional chunks get smaller. Interestingly, even the non-fractional chunk appears to be slightly distorted, with this distortion decreasing as it expands to fill the entire array. ezgif com-gif-maker (15)

No such effect for "nearest" resampling: ezgif com-gif-maker (16)

robbibt commented 1 year ago

And @Kirill888 can confirm that the same issue occurs with odc.stac.load as well as datacube.load:

import datacube
import pystac_client

import odc.stac
from odc.geo.geobox import GeoBox

# Connect to datacube and setup STAC client
dc = datacube.Datacube()
catalog = pystac_client.Client.open("https://explorer.sandbox.dea.ga.gov.au/stac")

# Set Geobox
gbox = GeoBox.from_bbox(
    (-1827140.0, -2819940.0, -1824000.0, -2817250.0), "epsg:3577", resolution=10
)

# Build a query with the set parameters
query = catalog.search(
    bbox=gbox.geographic_extent.boundingbox,
    collections=["ga_s2am_ard_3"],
    datetime="2020-01-05",
)
items = list(query.get_items())

# Load using ODC Stac, no Dask
ds_stac_nodask = odc.stac.load(
    items,
    bands=("nbart_red"),
    resampling="bilinear",
    groupby="solar_day",
    geobox=gbox,
)

# Load using ODC Stac, with Dask
ds_stac_dask = odc.stac.load(
    items,
    bands=("nbart_red"),
    resampling="bilinear",
    chunks={"x": 300, "y": 300},
    groupby="solar_day",
    geobox=gbox,
).compute()

(ds_stac_dask - ds_stac_nodask).nbart_red.plot(vmin=-50, vmax=50, cmap="RdBu")

image

SpacemanPaul commented 1 year ago

Running Robbi's test code with a debugger, I can't see anything unusual happening with the scale at line 131 as suggested by @Kirill888 - it is returning scale==1 for all chunks.

Furthermore, the chunks read from the source file for each destination dask chunk at least look sane on first glance.

MicrosoftTeams-image (15)

Kirill888 commented 1 year ago

What version of rasterio you have installed? Can you try with some other version and see if there are differences?

SpacemanPaul commented 1 year ago

Yeah, all signposts are starting to point to rasterio. I was just about to try exactly that.

SpacemanPaul commented 1 year ago

The bug exists with both rasterio 1.3.2 and 1.3.4.

core is currently pinned to >=1.3.2 because of the warping bug in 1.3.0 and 1.3.1. About to test with latest release (1.3.7).

Edit: Occurs with 1.3.7 as well.

Kirill888 commented 1 year ago

I feel like it's expected effect of bilinear sampling, try it with average and difference disappears, at least for me on rasterio==1.3.6.

To test this I recommend the following experiment:

  1. Use non-dask load
  2. Load full image
  3. Load right half of the full image

Then crop image from (2) to match (3) and look at the difference in pixel values there, compare across different sampling methods. I would expect differences only when half image is cropped on top/left when cropping right|bottom results should be more closely matched.

SpacemanPaul commented 1 year ago

Isn't bilinear just a weighted average of the 4 nearest neighbour source pixels (weighted by distance from the destination pixel)?

I'm struggling to see how that could produce such macro effects.

robbibt commented 1 year ago

Hopefully this is what you meant @Kirill888:

import datacube
from datacube.utils.cog import write_cog
import matplotlib.pyplot as plt

dc = datacube.Datacube()

fig, axes = plt.subplots(4, 3, figsize=(12, 10), sharex=True, sharey=True)
axes = axes.flatten()

for i, method in enumerate(
    [
        "nearest",
        "average",
        "bilinear",
        "cubic",
        "cubic_spline",
        "lanczos",
        "mode",
        "max",
        "min",
        "med",
        "q1",
        "q3",
    ]
):
    common_params = dict(
        product="ga_s2am_ard_3",
        measurements=["nbart_red", "nbart_green", "nbart_blue"],
        crs="EPSG:3577",
        time=("2020-01-05"),
        resolution=(-10, 10),
        output_crs="EPSG:3577",
        resampling={"*": method},
        group_by="solar_day",
    )

    # Full image
    xmin, xmax = -1827131, -1824005
    ymax, ymin = -2817258, -2819937

    # Load small area with no Dask
    ds_full = dc.load(
        x=(xmin, xmax),
        y=(ymax, ymin),
        **common_params,
    )

    # Right half of the full image
    xmin_mid = (xmin + xmax) / 2.0

    # Load small area with no Dask
    ds_righthalf = dc.load(
        x=(xmin_mid, xmax),
        y=(ymax, ymin),
        **common_params,
    )

    # Calculate difference; xarray will take care of clipping to shared extent
    (ds_full - ds_righthalf).nbart_red.squeeze("time").plot.imshow(
        ax=axes[i], robust=True, cmap="viridis", add_labels=False
    )
    axes[i].set_title(f"Resampling method: {method}")

    # Export
    write_cog(
        ds_full.to_array().squeeze("time"), f"ds_full_{method}.tif", overwrite=True
    )
    write_cog(
        ds_righthalf.to_array().squeeze("time"),
        f"ds_righthalf_{method}.tif",
        overwrite=True,
    )

fig.savefig(f"diff_method.jpg")

You're definitely right: there are differences between the different resampling methods when you compare subsets of data loaded like this:

diff_method

But I'm still not 100% sure if this explains the issue: this is an animation of the outputs for "bilinear" - the differences are so subtle it's almost impossible to see; mainly just some slight shuffling of pixels: ezgif com-gif-maker (18)

Compare that to the really obvious "bluriness" we see in the original example: ezgif com-gif-to-mp4

Kirill888 commented 1 year ago

I'm not sure what bilinear in gdal land means. There is certainly increase in std deviation of differences as you shrink the edge, anything less than 20px looks obviously blurred, while something like 100px are way harder to spot as different when looking at imagery alone (not taking diff).

It could something to do with projection change, they might be computing coordinate transform parameters based on "the current work region", and if that shrinks too much that computation can become somehow unstable or wrong. I'm seeing issues with bilinear,cubic,cubic_spline,lancsoz, I guess they all just sample pixel location followed by pixel-space sampling after that, while others average,med,q1,q3,nearest don't exhibit that behaviour.

SpacemanPaul commented 1 year ago

I've tried the following (with bilinear):

1) Robbi's original example, with dask spatial chunk size increased to 4096 so the whole thing is read in a single dask chunk 2) Robbi's original example, with the ROI reduced so we're only looking at the top right chunk 3) Robbi's original example, with the ROI reduced in the x direction only, so we're only looking at the two (slender) righthand chunks

I cannot see obvious blurring like in the full example with any of these.

robbibt commented 1 year ago

There is certainly increase in std deviation of differences as you shrink the edge, anything less than 20px looks obviously blurred, while something like 100px are way harder to spot as different when looking at imagery alone (not taking diff).

Ah... if I reduce the size of the right half image until it's tiny (e.g. to just right 5%; xmin_mid = xmin + (xmax - xmin) * 0.95), the blurriness appears (both images resampled with "bilinear"; only the extents are different): ezgif com-gif-maker (19)

robbibt commented 1 year ago

This is pretty fascinating - similar to @SpacemanPaul I'd always thought of bilinear resampling as being affected by a pixel's immediate neighbours only. The fact that it behaves differently on different sized data loads is really surprising/confusing...

SpacemanPaul commented 1 year ago

IMO this appears to be bug in either rasterio or GDAL.

Kirill888 commented 1 year ago

I'm certainly observing pixel difference as a function of strip with for problematic resampling methods

image

Kirill888 commented 1 year ago

Raised issue with rasterio: https://github.com/rasterio/rasterio/issues/2839

experiment notebook https://nbviewer.org/gist/Kirill888/c8ddf8ef373e408a9d17450585c86e4a

snowman2 commented 1 year ago

Thought this may be relevant: https://lists.osgeo.org/pipermail/gdal-dev/2023-May/057258.html

for the methods bilinear, cubic, cubicspline and lanczos who have nominal kernel radius of respective values 1, 2, 2 and 3 (and thus a diameter double that size), the radius is extended by a factor source_size / dest_size to take into account the downsampling ( Cf lines 1188-1195 in alg/gdalwarpkernel.cpp), and the weights of the resampling kernel are evaluated with a scale factor too (cf lines 3713-3715 / 3741-3743))

SpacemanPaul commented 1 year ago

Thought this may be relevant: https://lists.osgeo.org/pipermail/gdal-dev/2023-May/057258.html

The source and destination size are almost identical in this example though (both 10mx10m pixels).

SpacemanPaul commented 1 year ago

Thanks @Kirill888 - I really appreciate your assistance in figuring this one out.

Kirill888 commented 1 year ago

The source and destination size are almost identical in this example though (both 10mx10m pixels).

@SpacemanPaul it's about aspect ratio of the output image, by the looks of it.

Even (GDAL one) commented on rasterio issue, and looks like work around is to add XSCALE=1, YSCALE=1 options to rasterio.warp.reproject. I'm just not sure if it's safe to do so always or should those values be dynamically set instead.

Probably here:

https://github.com/opendatacube/datacube-core/blob/032265a4d18189678d4bc1c8846648a5bc1d9fcf/datacube/storage/_read.py#L190-L191

Or maybe here:

https://github.com/opendatacube/datacube-core/blob/032265a4d18189678d4bc1c8846648a5bc1d9fcf/datacube/utils/geometry/_warp.py#L154-L164

I confirmed with my notebook that this fixes blurriness issue.

SpacemanPaul commented 1 year ago

Thanks Kirill, I'll give that a go.