opendatacube / odc-geo

GeoBox and geometry utilities extracted from datacube-core
https://odc-geo.readthedocs.io/en/latest/
Apache License 2.0

refactor: nodata handling #162 #163

Closed Kirill888 closed 3 months ago

Kirill888 commented 3 months ago

nodata types explained

SomeNodata + dtype + [fallback] -> Nodata + dtype -> FillValue
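
As a rough illustration of the resolution chain above, here is a minimal pure-Python sketch. It is not the odc-geo implementation: the helper names, the use of dtype strings instead of real dtype objects, and the integer fill default of 0 are all assumptions made for this example.

```python
import math
from typing import Literal, Optional, Union

# Hypothetical aliases that mirror the names used in the description above.
Nodata = Optional[Union[int, float]]                    # concrete value, or None for "no nodata"
SomeNodata = Union[Literal["auto"], int, float, None]   # user input, may still be "auto"
FillValue = Union[int, float]                           # always a concrete number


def is_float_dtype(dtype: str) -> bool:
    """Treat dtype names such as 'float32' or 'float64' as floating point."""
    return dtype.startswith("float")


def resolve_nodata(nodata: SomeNodata, dtype: str, fallback: Nodata = None) -> Nodata:
    """SomeNodata + dtype + [fallback] -> Nodata."""
    if nodata is None:
        return None  # caller explicitly asked for no nodata value
    if isinstance(nodata, str):
        if nodata != "auto":
            raise ValueError(f"unsupported nodata: {nodata!r}")
        # "auto": NaN for float data, otherwise whatever fallback was given
        return math.nan if is_float_dtype(dtype) else fallback
    return nodata


def resolve_fill_value(nodata: Nodata, dtype: str) -> FillValue:
    """Nodata + dtype -> FillValue (a number usable to fill empty pixels)."""
    if nodata is not None:
        return nodata
    # Assumed defaults for this sketch: NaN for floats, 0 for integers
    return math.nan if is_float_dtype(dtype) else 0
```

Under these assumptions, `resolve_nodata("auto", "float32")` yields NaN, while an explicit `None` is passed through unchanged so that a caller can still request no nodata value at all.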

github-actions[bot] commented 3 months ago

🚀 Deployed on https://667240bc235f9b98b2a5c58c--odc-geo-docs.netlify.app

SpacemanPaul commented 3 months ago

Thanks for the swift action on this Kirill - I will review tomorrow.

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 98.88889% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.48%. Comparing base (6af5d0c) to head (e80f39a). Report is 33 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| odc/geo/_xr_interop.py | 96.55% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop     #163      +/-   ##
===========================================
+ Coverage    95.26%   95.48%   +0.21%
===========================================
  Files           31       31
  Lines         5323     5489     +166
===========================================
+ Hits          5071     5241     +170
+ Misses         252      248       -4
```


robbibt commented 3 months ago

Thanks heaps for this @Kirill888 - the test cases appear to match the functionality I was hoping for nicely, but I'll do a "user" test today and verify that TIFFs exported using the updated code work as intended (e.g. in ESRI etc).

Kirill888 commented 3 months ago

@robbibt thanks. BTW, please use pip install odc-geo==0.4.7rc1 for your test (that release is only on PyPI, not on conda).

robbibt commented 3 months ago

I think this is working perfectly. For example, we have a dataset with an Xarray nodata: nan attribute:

import datacube
import odc.geo.xr
from datacube.utils.cog import write_cog

dc = datacube.Datacube()

query_params = dict(
    x=(142.13223, 142.65461),
    y=(-32.17591, -32.54618),
    time=("2022", "2022"),
)

ds = dc.load(product="ga_ls8cls9c_gm_cyear_3", measurements=["edev"], **query_params)
da = ds.edev.squeeze()


This gets written out with a GeoTIFF nodata flag nodata=nan using both datacube and odc-geo tooling:

da.odc.write_cog("nodata_nan_odcgeo.tif")
write_cog(da, "nodata_nan_datacube.tif")


However, if we write out data without an Xarray nodata: nan attribute, datacube doesn't include a GeoTIFF nodata flag, but now odc-geo does!

del da.attrs["nodata"]

da.odc.write_cog("nodata_missing_odcgeo.tif")
write_cog(da, "nodata_missing_datacube.tif")


I can still write a file with no nodata value at all by passing nodata=None explicitly:

da.odc.write_cog("nodata_truenone_odcgeo.tif", nodata=None)


robbibt commented 3 months ago

I do want to do a quick test of the COG overviews stuff, so will approve this as soon as I've finished that.

robbibt commented 3 months ago

I think some of the naming is confusing, but the approach is sound.

I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.

Kirill888 commented 3 months ago

> I think some of the naming is confusing, but the approach is sound.
>
> I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.

What about SomeNodata instead of MaybeAutoNodata, @SpacemanPaul @robbibt

SpacemanPaul commented 3 months ago

> > I think some of the naming is confusing, but the approach is sound.
> >
> > I also find "MaybeAutoNodata" a bit confusing - took me a while to work out what it meant.
>
> What about SomeNodata instead of MaybeAutoNodata, @SpacemanPaul @robbibt

Yes SomeNodata or even AnyNodata works for me.

Kirill888 commented 3 months ago

Updated.

robbibt commented 3 months ago

OK, overview generation is working much better for our DEA Intertidal example: even if nodata is not set, we still get a result that follows GDAL's expected functionality:

datacube-core with nodata=nan Xarray attribute (overviews generated correctly ✔️)

datacube-core with no nodata Xarray attribute (overviews generated incorrectly ❌)

odc-geo with nodata=nan Xarray attribute (overviews generated correctly ✔️)

odc-geo with no nodata Xarray attribute (overviews generated correctly ✔️)
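
To illustrate why a declared nodata value matters when overviews are built by averaging, here is a toy model of a single 2x2 reduction step. It is a sketch only, not GDAL's actual resampling code, and `average_2x2` is a hypothetical helper invented for this example.

```python
import math


def average_2x2(block, nodata=None):
    """Average the four pixels of one 2x2 block, skipping declared nodata.

    Toy model only: when no nodata is declared, every pixel (NaN included)
    takes part in the average.
    """
    def is_missing(value):
        if nodata is None:
            return False  # nothing is declared missing
        if isinstance(nodata, float) and math.isnan(nodata):
            return math.isnan(value)  # NaN never equals itself, so test explicitly
        return value == nodata

    valid = [v for v in block if not is_missing(v)]
    if not valid:
        return nodata  # whole block is missing, so the overview pixel is too
    return sum(valid) / len(valid)


nan = math.nan
block = [1.0, 3.0, nan, nan]

with_flag = average_2x2(block, nodata=nan)  # NaN pixels skipped
without_flag = average_2x2(block)           # NaN pixels poison the average
```

In this model the block with two valid pixels averages to 2.0 when NaN is declared as nodata, and to NaN when it is not, which is one plausible reading of the difference between the datacube-core and odc-geo results listed above.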