robbibt opened 5 years ago
This is due to a change, starting from the 1.7 release, to use a decimating read followed by re-projection; the feature was added to support reads from overview images.
Data loading happens roughly like this within datacube (this is for one time slice, ignoring multi-source fusing for simplicity):

1. Compute the region of the source image that overlaps the requested output.
2. Read that region with decimation (a shrunk read) to roughly match the output resolution.
3. Re-project the decimated pixels onto the destination grid.
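A minimal sketch of this pattern using rasterio directly (the file name, shrink factor and destination CRS are illustrative; this is not datacube's actual code):

```python
# Sketch of decimated-read-then-reproject, assuming a file "source.tif";
# NOT datacube internals, just the same two-stage pattern in plain rasterio.
import numpy as np
import rasterio
from rasterio.enums import Resampling
from rasterio.transform import array_bounds
from rasterio.warp import calculate_default_transform, reproject

with rasterio.open("source.tif") as src:
    shrink = 8  # e.g. 25 m pixels loaded at 200 m
    out_h, out_w = src.height // shrink, src.width // shrink

    # Step 2: decimated read. With no overviews, the driver decimates with
    # the given resampling method (nearest by default -- the behaviour at issue).
    data = src.read(1, out_shape=(out_h, out_w), resampling=Resampling.nearest)

    # The shrunk array has a correspondingly coarser transform.
    src_transform = src.transform * src.transform.scale(
        src.width / out_w, src.height / out_h
    )

    # Step 3: re-project the decimated pixels onto the destination grid.
    dst_crs = "EPSG:3577"  # illustrative destination CRS
    left, bottom, right, top = array_bounds(out_h, out_w, src_transform)
    dst_transform, dst_w, dst_h = calculate_default_transform(
        src.crs, dst_crs, out_w, out_h, left, bottom, right, top
    )
    dst = np.zeros((dst_h, dst_w), dtype=data.dtype)
    reproject(
        data, dst,
        src_transform=src_transform, src_crs=src.crs,
        dst_transform=dst_transform, dst_crs=dst_crs,
        resampling=Resampling.nearest,  # the only stage dc.load() configures today
    )
```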
The problem is in step 2 when the image contains no overviews (this is the case for netcdf files). Down-sampling by the rasterio driver uses `nearest` by default, but the most reasonable approach when down-sampling by a large amount is to use `average` for numeric data and `mode` for categorical data.
Right now this is not possible, since the Datacube <> IO Driver interface does not include a resampling mode parameter as part of the `read` interface. This is a deficiency of the interface that requires a breaking change to address; a hypothetical sketch of the missing parameter follows.
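For illustration only, here is what a `read` signature carrying a resampling mode could look like. This is NOT datacube's actual driver API; the names are invented:

```python
# HYPOTHETICAL sketch only -- not datacube's real IO-driver interface.
# It illustrates the missing piece: a resampling mode on the read call.
from typing import Protocol, Tuple
import numpy as np

class RasterReader(Protocol):
    def read(
        self,
        window: Tuple[slice, slice],   # region of the source to read
        out_shape: Tuple[int, int],    # decimated shape to read into
        resampling: str = "nearest",   # the parameter the interface lacks today
    ) -> np.ndarray:
        ...
```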
Ideally the user should be able to configure both (2) and (3); currently only the final reprojection is configured with a resampling mode. `average` for numeric and `mode` for categorical/mask data make the most sense as defaults when the shrink factor is >> 2. Note that `average` and `mode` are the only two methods that operate on the entire input image; other modes like `cubic|bilinear|lanczos` only sample a few pixels around the re-projected point, so you usually don't want those when the shrink factor is large (>> 2), yet they make perfect sense for a scale change < 2. So it is important to decouple the shrinking resampling configuration from the final resampling configuration.
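To see why this matters, here is a small self-contained demonstration (illustrative, not from the issue): decimating a sparse array by 100x with `nearest` samples only 1 pixel in 10,000 and so drops almost all of the valid data, while `average` folds every input pixel into the output:

```python
# Demonstration: nearest vs average at a large shrink factor.
import numpy as np
from rasterio.enums import Resampling
from rasterio.io import MemoryFile
from rasterio.transform import from_origin

# Sparse image: mostly zeros with scattered "valid" pixels (~1% of the image).
data = np.zeros((1000, 1000), dtype="float32")
data[::7, ::11] = 1.0

profile = dict(
    driver="GTiff", height=1000, width=1000, count=1, dtype="float32",
    crs="EPSG:4326", transform=from_origin(0, 0, 0.001, 0.001),
)
with MemoryFile() as memfile:
    with memfile.open(**profile) as dst:
        dst.write(data, 1)
    with memfile.open() as src:
        for method in (Resampling.nearest, Resampling.average):
            # 100x decimated read: 1000x1000 -> 10x10
            out = src.read(1, out_shape=(10, 10), resampling=method)
            # nearest's mean swings wildly depending on which pixels it hits;
            # average's mean matches the true fraction of valid pixels.
            print(method.name, float(out.mean()))
```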
I just came across this problem again recently after getting some very unexpected results when resampling coarsely. Is this potentially an issue that could be within the scope of ODC 2.0? I still think the current behaviour is very unintuitive for anyone familiar with resampling data in GDAL/Rasterio/common GIS software.
@robbibt at the moment there are no "clean" mechanisms to make any of this happen; 2.0 keeps getting de-scoped and never starts, so... We don't want to start making "dirty workarounds" because 2.0 is "just about to start", but it never does, and it doesn't look like it ever will.
A hint of frustration there, methinks...
As ODC SC Chair I will be helping to herd the ODC community cats towards ODC 2.0, and have flagged this issue as belonging to the ODC 2.0 project triage process. The ODC 1.8 release needs to be finalised first, and we'll also keep ODC 2.0 nice and tight updates-wise, but since this issue is in the core IO, which is where ODC 2.0 is focussed, it's worth flagging.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is still a key blocker to using coarsely resampled data loaded from ODC; commenting to remove stale status
Expected behaviour
When using `dc.load()` to load high-resolution data (e.g. 25 m) at a low resolution (e.g. 5000 m), I expect to be able to use the `resampling='average'` parameter to return a low-resolution output where each pixel has the average value of all contributing pixels in the native high-res data. For example, I would like hi-res data that looks like this:

[image: high-resolution input data]
...to be resampled using `resampling='average'` to look like the image below. This is what is generated on `datacube` version 1.6.1:

[image: averaged low-resolution output, as produced by datacube 1.6.1]

Actual behaviour
However, on `datacube` version 1.7, the aggregated low-res map that is returned is patchy, with a large amount of lost data. This result is non-intuitive for anyone expecting a raster resampled in a similar way to GDAL/Rasterio/common GIS software, and limits the usability of the resampled data.

Steps to reproduce the behaviour
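The exact reproduction code from the issue is not preserved here; a representative call (product name, extents and CRS are placeholders) would be:

```python
import datacube

dc = datacube.Datacube(app="coarse-resampling-test")

# Load 25 m data at 5000 m resolution, requesting average resampling.
# Product name, extents and CRS below are placeholders, not from the issue.
ds = dc.load(
    product="ls8_nbart_albers",
    x=(148.0, 149.0),
    y=(-36.0, -35.0),
    output_crs="EPSG:3577",
    resolution=(-5000, 5000),
    resampling="average",
)
# On datacube 1.6.1 this returns a smoothly averaged raster; on 1.7 the
# result is patchy, because the decimated read uses nearest internally.
```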
Environment information
`datacube --version`: 1.7 on the NCI