jhamman opened this issue 4 years ago
This recipe requires us to resolve https://github.com/pangeo-forge/pangeo-forge/issues/50 and https://github.com/pangeo-forge/pangeo-forge/issues/39.
This recipe is now ready to be implemented.
I just began researching how to create a gridMET recipe targeting the MS Planetary Computer bakery. Reviewing the Terraclimate example raised a few questions for me about pre-processing. Forgive my lack of experience in this area (I am decidedly not a scientist), but is there documentation or literature that outlines the thresholds for invalid data used in the `mask_opts`? Is there an equivalent reference which should be used for gridMET pre-processing? cc @rabernat and @cisaacstern. Cheers.
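For anyone following along, here is a minimal sketch of what threshold-based masking of invalid data looks like in practice. The variable names and the numeric thresholds below are hypothetical placeholders, not the actual Terraclimate `mask_opts` values, which is exactly the domain knowledge being asked about.

```python
import math

# Hypothetical mask options: plausible physical ranges per variable.
# These numbers are illustrative placeholders, NOT the real thresholds.
MASK_OPTS = {
    "tmax": {"valid_min": -70.0, "valid_max": 70.0},   # deg C (assumed)
    "pr": {"valid_min": 0.0, "valid_max": 3000.0},     # mm (assumed)
}

def apply_mask(var, values, opts=MASK_OPTS):
    """Replace values outside the variable's valid range with NaN."""
    lo = opts[var]["valid_min"]
    hi = opts[var]["valid_max"]
    return [v if lo <= v <= hi else math.nan for v in values]
```

In a real recipe this logic would operate on xarray objects (e.g. via `Dataset.where`), but the question stands: only someone who knows the dataset can say what the valid ranges should be.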
That recipe comes from @jhamman. I'll let him answer the specific question.
But the general answer would be the following: this sort of bespoke quality control is the very definition of domain-specific expertise! Only folks who know the data intimately can set those sorts of parameters. And this is precisely the point of Pangeo Forge: to engage those scientists and get them to share their expertise with the community, producing a common pool of ARCO data. You, @sharkinsspatial, as an engineer, are not expected to know those details, just like the domain scientist is not expected to know the details of Kubernetes. The recipes will hopefully be full of these kinds of details, which will make the data we produce more useful at the end of the pipeline.
Make sense?
> Is there an equivalent reference which should be used for gridMET pre-processing?
I did not answer this question in my previous comment... But basically you would need to get someone who really knows the data to answer it. Ideally, data coming from providers would be totally clean and could be copied as-is, without extensive preprocessing / cleaning steps. But that is often not the case. I can't speak to gridMET specifically.
Thanks @rabernat 👍 . I'll try to follow up with @jhamman and see if he can add any insights, and I'll also try to reach out to the community of gridMET users to elicit some recommendations.
@jhamman It looks like you have done the majority of the early work on Terraclimate recipes https://github.com/pangeo-forge/terraclimate-feedstock-archive/blob/master/recipe/pipeline.py 🙇 . I haven't done a deep investigation into the development methodology of gridMET and Terraclimate; I just quickly noted the overlap in variable types.
If possible, can you provide a bit of background on how you developed the pre- and post-processing techniques for Terraclimate cleaning? If you also have experience working with the gridMET data, would you or @norlandrhagen have any interest in collaborating on / assisting with preprocessing recommendations as we develop the recipe? Cheers.
Hey there @sharkinsspatial, definitely interested in working on the gridMET recipe. Can probably take a stab at it in the coming week.
@norlandrhagen I have a rough example of a gridMET recipe I am testing which handles some of the file pattern irregularities. I'll try to make a PR to staged-recipes tomorrow so I can elicit some feedback. I still have some domain-expert questions for @jhamman about his masking functions from the Terraclimate example, and whether there are any related insights for the gridMET data.
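As context for the PR, a sketch of how the gridMET source files can be enumerated. gridMET ships one netCDF file per variable per year; the base URL and the `{var}_{year}.nc` naming below are my working assumptions from browsing the data server, so please verify them against the actual recipe before relying on them.

```python
# Assumed base URL and naming scheme for gridMET source files.
BASE = "http://www.northwestknowledge.net/metdata/data"

def make_url(variable: str, year: int) -> str:
    """Build the URL for one gridMET file (one variable, one year)."""
    return f"{BASE}/{variable}_{year}.nc"

# Cartesian product of variables x years, as a file pattern would generate.
urls = [make_url(v, y) for v in ("pr", "tmmx") for y in range(1979, 1981)]
```

In pangeo-forge-recipes terms, this variable/year grid is what a `FilePattern` with a time `ConcatDim` and a variable `MergeDim` would describe; the irregularities mentioned above would show up as exceptions to this template.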
gridMET is a dataset of daily, 4 km surface meteorological data covering the CONUS domain from 1979 to yesterday.
Transformation / Alignment / Merging
Files should be concatenated along the time dimension and merged along the variable dimension.
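Conceptually, the combine step looks like the following, with plain lists and dicts standing in for what xarray / pangeo-forge-recipes actually do: per-variable yearly pieces are concatenated along time, then the per-variable series are merged into one dataset.

```python
def concat_time(pieces):
    """Concatenate a variable's yearly pieces along the time axis."""
    out = []
    for p in pieces:
        out.extend(p)
    return out

def merge_vars(per_var):
    """Merge per-variable time series into a single dataset mapping."""
    return {var: concat_time(pieces) for var, pieces in per_var.items()}

# Toy stand-in: two yearly files for each of two variables.
dataset = merge_vars({
    "pr": [[0.0, 1.2], [0.4]],
    "tmmx": [[290.1, 291.3], [289.8]],
})
```

The real recipe would express the same structure declaratively (a time concat dimension plus a variable merge dimension) rather than looping by hand.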
Output Dataset
1 Zarr store, with chunks oriented for both time-series and spatial analysis.
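Serving both access patterns from one store means picking chunk shapes that keep both a time-slice read and a time-series read reasonably sized. The arithmetic below is a back-of-the-envelope sketch; the grid dimensions (~585 x 1386 at 4 km) and float32 dtype are assumptions, not confirmed values for gridMET.

```python
# Assumed gridMET grid shape and dtype size; verify against the real files.
NLAT, NLON = 585, 1386
BYTES_PER_VALUE = 4  # float32

def chunk_mb(nt: int, nlat: int, nlon: int) -> float:
    """Uncompressed size in MB of one chunk with the given shape."""
    return nt * nlat * nlon * BYTES_PER_VALUE / 1e6

# Two candidate chunkings: a month of full-domain maps (spatial analysis)
# vs. a year over a 128x128 tile (time-series analysis).
monthly_maps = chunk_mb(31, NLAT, NLON)
yearly_tile = chunk_mb(366, 128, 128)
```

Both land in the tens-to-low-hundreds of MB, which is a common target for Zarr chunks; the final choice would be tuned against the bakery's storage and the expected query mix.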