Closed ithaca-oyler closed 1 month ago
@ithaca-oyler please provide replication code, what stac query you are running, how are you calling stac load etc. This certainly looks like broken nodata marker, do the data files have expected nodata field set? By default we trust nodata marker in the file over all other places. Do you get same problem when running without Dask?
Hi @Kirill888 -- attached is a zip file that contains a sample notebook, the config file, and an example of what one of the metadata files look like. According to the metadata, the data files expect to have nodata set to -9999 (listed as FILLVALUE in the metadata). The output images shown in my previous comment are the outputs from odc.algo xr_quantile at the end of the notebook. I have tried several different combinations of setting the nodata value in the config file, as a stac.load() call, and in xr_quantile() with no good results.
@ithaca-oyler the code does way too much and requires complex environment to run. Please provide a simpler example using single band only. My guess is that data bands work probably fine, but Fmask band is not, need to specify nodata=255
for it. Please debug more on your end.
@Kirill888 Thank you for your patience while I put together a simpler example for you. I'm taking out the cloud masking step, but still making the change you suggested with specifying nodata=255
in the HLS config I was using. I hope to have an updated notebook made in the next day or so. In the meantime, I have done some additional debugging and I think you're on the right track with part of the issue being in the Fmask band. Fixing some of the HLS config helped and brought back some of the pieces of data that are between the HLS tiles.
Here's what specifying nodata=255
and removing the cloud masking step resulted in:
I also decided to experiment with the chunks
setting in stac.load()
and I was surprised to find changing up this setting produced different results. For the image above, the setting was chunks={"x": "auto", "y": "auto","time":"auto}
. Here, I changed it to chunks={"x": 512, "y": 512,"time":"auto"}
And here I changed it to chunks={"x": 1024, "y": 1024,"time":"auto"}
Do you have any thoughts as to why the output might be changing with the chunk sizes?
Hi @Kirill888 -- a few updates. I've done a bit more debugging and I was able to determine that there were several issues contributing to my results, namely cloud masking, the no data flags I had set in my HLS config file, and needing to select the correct groupby
argument in .load()
. Below is the result of nodata=0 in my config file and data = stac.load(item_collection, geopolygon=roi, chunks={"x": "auto", "y": "auto","time":"auto"}, output_crs=crs, resolution=30, stac_cfg=cfg, nodata=np.nan, dtype='float64', groupby='solar_day')
for the same subset of 3 days shown in previous comments (no cloud masking):
The fragmentation shown in my original question occurs with a combination of the nodata value in my HLS config not being set, the xr_quantile()
nodata value being set to 0, and groupby='time'
or groupby
not being set at all. Lots of moving parts and places where things could go wrong.
The end results output from xr_quantile()
still aren't quite looking how I expect, but I think I've ruled out .load()
as the source of my problems given how the rgb time slices look. I'll hop over to the odc-algo repo to post my question about xr_quantile() there.
Thanks a bunch!
I'm currently using HLS data (both HLSL30.v2.0 and HLSS30.v2.0 collections) for a short time frame (5-16-2024 through 5-18-2024) and I noticed that only areas that have an overlap of both collections still have imagery in the final output. I'm at a bit of a loss as to why this may be occurring, but I suspect it may have to do with the way the lazy xarray dataset is built. I thought setting the nodata value might help (as mentioned in https://github.com/opendatacube/odc-stac/issues/162), but the output did not change. The HLS config contains information about band names, band values, cloud masking information, etc.
I am currently using odc-geo==0.4.6 and odc-stac==0.3.9.
This image shows what we expect to see:
And this image shows what is loaded in with
data = stac.stac_load(item_collection, geopolygon=roi, chunks={"x": "auto", "y": "auto","time":"auto"}, output_crs=crs, dtype='int16', resolution=30, stac_cfg=hls_cfg )
I know there are a lot of moving parts in this issue, but any insights as to why something like this might be happening will be very helpful :) Thank you!