stuckyb / gcdl

SMAP-HydroBlocks #105

Open HeatherSavoy-USDA opened 1 year ago

HeatherSavoy-USDA commented 1 year ago

Requested in Mar 2 GeoCDL meeting

https://zenodo.org/record/5206725#.ZADe-ezMJUd

HeatherSavoy-USDA commented 1 year ago

There appear to be two versions: one at 1 km resolution that we could store and access as a typical local dataset, and a 30 m version that would also be stored locally, but I think we would rely on this approach to subset it based on user requests?

HeatherSavoy-USDA commented 1 year ago

The 1 km version is implemented as of 17a2e4e545fb4e457ae88ca954069c5034b5a891. This commit also adds a new 'hours' input parameter to handle this first sub-daily dataset.

I have the finer resolution version downloaded to Ceres and need to modify a copy of this first version dataset implementation to use the post-processing described here.

HeatherSavoy-USDA commented 1 year ago

One thing I noticed during testing of this dataset is that not every time step and area of interest will have data. For example, here are results for a box roughly around NM for 6am across 5 consecutive days:

image

The result returned from GeoCDL is 5 GeoTIFFs, each 1.7 MB, though the last file contains only the 'no data' value. Is it worth doing anything to prevent returning empty GeoTIFFs?

(Also, I'm assuming the missing data are due to the SMAP satellite path - but please correct me if this actually seems like a bug!)
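For illustration only, here is a minimal sketch of how an all-'no data' layer could be detected before deciding whether to return it. It is not GeoCDL code: the function name `is_all_nodata`, the nodata value `-9999.0`, and the tiny per-day pixel lists are all hypothetical stand-ins for the real raster bands.

```python
import math
from typing import Iterable


def is_all_nodata(values: Iterable[float], nodata: float) -> bool:
    """Return True if every pixel value equals the nodata marker.

    NaN never compares equal to itself, so a NaN nodata marker
    is handled with math.isnan instead of ==.
    """
    if math.isnan(nodata):
        return all(math.isnan(v) for v in values)
    return all(v == nodata for v in values)


# Assumed nodata value; the real dataset's value may differ.
NODATA = -9999.0

# Hypothetical pixel values for five consecutive days (not real SMAP data).
layers = {
    "day1": [0.21, 0.18, NODATA, 0.25],
    "day2": [NODATA, 0.19, 0.22, NODATA],
    "day3": [0.20, NODATA, NODATA, 0.24],
    "day4": [NODATA, NODATA, 0.23, 0.26],
    "day5": [NODATA, NODATA, NODATA, NODATA],
}

# Collect the names of layers that contain no usable data.
empty = [name for name, pixels in layers.items() if is_all_nodata(pixels, NODATA)]
print(empty)
```

A check like this would also allow empty layers to be reported to the user (for example, in a log or metadata file) rather than silently dropped.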

stuckyb commented 1 year ago

That is a good question! If we choose not to return empty TIFFs, might users interpret that as the GeoCDL not doing what it was asked (i.e., returning an incomplete dataset or not answering the query correctly)? It is somewhat wasteful to return useless files, but on the other hand, I'd expect those to compress very efficiently since they are a constant value.
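As a rough check of the compression point, the sketch below compares how well a constant-valued band and a random band of the same size compress with DEFLATE (via Python's `zlib`), which is one of the compression schemes GeoTIFF supports. The pixel count and the nodata value `-9999.0` are assumptions chosen to approximate a 1.7 MB float32 band, and whether GeoCDL actually writes compressed GeoTIFFs is not established in this thread.

```python
import os
import struct
import zlib

# Assumed values: about 1.7 MB of float32 pixels, all set to a nodata marker.
N_PIXELS = 425_000
NODATA = -9999.0

# A band in which every pixel is the same little-endian float32 value.
constant_band = struct.pack("<f", NODATA) * N_PIXELS

# Random bytes of the same length stand in for a band with no redundancy.
noisy_band = os.urandom(len(constant_band))

constant_size = len(zlib.compress(constant_band))
noisy_size = len(zlib.compress(noisy_band))

print(f"raw size:            {len(constant_band)} bytes")
print(f"constant compressed: {constant_size} bytes")
print(f"random compressed:   {noisy_size} bytes")
```

The constant band should shrink to a tiny fraction of its raw size, while the random band stays close to its original size, so returning empty files would cost little if the output is compressed.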