corviday closed this 4 years ago.
Thanks for taking a look at this!
I'm assuming that the values for domain and modeling_realm make sense to the scientists.
modeling_realm is one the scientists gave, though I'm not sure whether there's a controlled vocabulary I need to check it against.
domain is nwna (North West North America); I saw it on another hydrology dataset that covered the same area and decided it applied to this dataset too. If you have any better suggestions, I'd be happy to hear them!
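If a controlled vocabulary does turn out to apply to modeling_realm, the comparison is easy to automate. This is a minimal sketch that assumes the CMIP6 realm vocabulary (to our knowledge: aerosol, atmos, atmosChem, land, landIce, ocean, ocnBgchem, seaIce) and that several realms may be listed separated by spaces; whether PCIC's standards use that list is an assumption, and check_realm is a hypothetical helper, not part of any PCIC tool.

```python
# Hypothetical helper: compare a modeling_realm value with a controlled vocabulary.
# Assumption: the vocabulary is the CMIP6 realm list; PCIC's may differ.
CMIP6_REALMS = frozenset({
    "aerosol", "atmos", "atmosChem", "land",
    "landIce", "ocean", "ocnBgchem", "seaIce",
})


def check_realm(value, vocabulary=CMIP6_REALMS):
    """Return the realm tokens in `value` that are not in `vocabulary`.

    Assumption: multiple realms may be given in one attribute, separated
    by whitespace, so the value is split before checking.
    """
    return [token for token in value.split() if token not in vocabulary]


print(check_realm("land"))            # []
print(check_realm("land hydrology"))  # ['hydrology']
```

An empty list means every token is in the vocabulary; anything returned is a candidate for discussion with the scientists.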
I'm curious about model_id: what does base signify? Why isn't this value null or the empty string? Ditto for run. model_id and run suggest that this is model output, yet product == gridded observations suggests that these are observations. On the face of it, this is a bit confusing. Do model_id and run describe a gridding procedure that is applied to station observations (or something like that)?
model_id and run are supplied because modelmeta requires them, but this dataset is gridded observations and does not actually have a model or run. We've used these values before for some RVIC data that took gridded observations as input, as mentioned under the Existing Similar Cases header here.
I agree it's kind of a mess. Any suggestions?
Hmm. In that document you linked to (I'd forgotten I'd written it!), there is an analysis of how to move toward something more or less consistent with our current metadata schema for model outputs, but modified for gridded observations. From both the document and the metadata you give above, it looks like we went ahead with Alternative A: project_id = 'other', product = 'gridded observations', and the follow-on decisions and consequences documented under Analysis, Alternative A. (And if I remember rightly, there was a PR on nchelpers that accommodated this.) So far, so good.
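As a rough illustration of Alternative A, the global attributes for a gridded-observations file might look like the dictionary below, with model_id and run present only because modelmeta requires them. REQUIRED_BY_MODELMETA, is_gridded_obs, and missing_required are hypothetical names; modelmeta's actual list of required attributes isn't given in this thread, and the model_id and run values shown are simply the ones under discussion in this PR.

```python
# Hypothetical sketch of Alternative A for gridded observations.
# REQUIRED_BY_MODELMETA is an assumption, not modelmeta's real list.
REQUIRED_BY_MODELMETA = ("project_id", "product", "model_id", "run")

attrs = {
    "project_id": "other",
    "product": "gridded observations",
    "domain": "nwna",     # North West North America
    "model_id": "base",   # value under discussion in this PR
    "run": "run1",        # value under discussion in this PR
}


def is_gridded_obs(attrs):
    """True if the attributes follow Alternative A for gridded observations."""
    return (attrs.get("project_id") == "other"
            and attrs.get("product") == "gridded observations")


def missing_required(attrs, required=REQUIRED_BY_MODELMETA):
    """Names of required attributes that are absent or empty."""
    return [name for name in required if not attrs.get(name)]


print(is_gridded_obs(attrs))    # True
print(missing_required(attrs))  # []
```

The check only confirms that the attributes are present; it says nothing about whether a value like base is meaningful, which is the open question here.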
What I see in the present PR that is inconsistent with the documented suggestion/decision is that model_id doesn't have a very helpful value, or else that value needs to be documented somewhere. I'd be interested to know what base signifies to the scientists and why they chose it. The document suggests using the name of (presumably) a gridding program or procedure, e.g., TPS_NWNA_v1 (which I'd guess means "Thin Plate Spline, Northwest North America, ver. 1"). If they can use a program name like that, I think it would be more helpful in the long run.
As to the attribute run, I'm not sure what to think. It's not discussed in the document. I would guess that its purpose might be to point to a specific configuration of the gridding program (assuming I am right about this being the context). In that case, it might help to have an adjunct, human-readable attribute that indicates where to find out what its value means. If it's just a placeholder, then I'd omit it altogether unless that makes nchelpers or the indexer burp ... which it might. As a value, run1 is pretty innocuous, but it's also pretty empty without context.
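One way to act on this: write run only when it names a real configuration, pair it with a human-readable pointer, and fall back to a placeholder only when downstream tools insist on the attribute. This is a sketch under stated assumptions: set_run and the adjunct attribute name run_description are hypothetical, and the required flag stands in for whatever nchelpers or the indexer actually expect; none of these come from those tools.

```python
# Hypothetical sketch: decide how to write the `run` attribute.
# `run_description` is an invented adjunct attribute name, and
# `required` stands in for whether nchelpers/the indexer need `run`.
def set_run(attrs, run=None, description=None, required=False,
            placeholder="na"):
    """Return a copy of attrs with `run` handled as discussed."""
    updated = dict(attrs)
    updated.pop("run_description", None)  # drop any stale description
    if run is not None:
        # A real configuration identifier: record it.
        updated["run"] = run
        if description:
            # Human-readable pointer to what the run value means.
            updated["run_description"] = description
    elif required:
        # Downstream tools need the attribute, so use a placeholder.
        updated["run"] = placeholder
    else:
        # Pure placeholder and nothing requires it: omit it altogether.
        updated.pop("run", None)
    return updated


print(set_run({"run": "run1"}))                 # {}
print(set_run({"run": "run1"}, required=True))  # {'run': 'na'}
```

With required=True this yields run = 'na', the value settled on later in the thread.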
Does that help any?
Finally, potentially feeding into this, there is a standard (perhaps still being refined) called obs4MIPs, which likely addresses at least some of these questions. The "obs" in its name stands for "observations", as opposed to model outputs. Lots of people have the same issues to sort out. I was looking at obs4MIPs not long after drafting that document, but various interruptions came up and I didn't get very far with it.
I asked Markus, and he indicated that there actually is a model (whatever that means in this case) for the elevation data, so I will use that.
Should I make "run" something like "na"?
Yeah, there are models and models.
run = 'na' seems like a good idea to me.
Great, thanks! I think we've worked out everything I need to get these into the system.
This isn't ready to merge yet, but I'd like @rod-glover to check over the metadata at this point. This YAML file is run by update_metadata to adjust the metadata for a dataset that contains elevation data for the Pacific Northwest. Some of the attribute values were derived from the PCIC standards; others were derived from more speculative documents or from previous datasets.
It is treated as gridded observations with no time axis.
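The update step amounts to applying a set of attribute changes to the file's global metadata. This sketch models it on plain dictionaries rather than YAML and netCDF, using hypothetical names (apply_updates, has_time_axis) and an assumed convention that None means "delete"; update_metadata's actual YAML syntax may well differ. It also checks the "no time axis" property by looking for a dimension named time, which is an assumption about how such an axis would be named. The before/after values are illustrative only.

```python
# Hypothetical sketch of an attribute update and a "no time axis" check.
# Plain dicts stand in for the YAML file and the netCDF global attributes;
# treating None as "delete" is an assumed convention, not update_metadata's.
def apply_updates(attrs, updates):
    """Return a copy of attrs with updates applied (None deletes)."""
    result = dict(attrs)
    for name, value in updates.items():
        if value is None:
            result.pop(name, None)
        else:
            result[name] = value
    return result


def has_time_axis(dimensions, time_names=("time",)):
    """True if any dimension looks like a time axis (by name only)."""
    return any(dim in time_names for dim in dimensions)


before = {"product": "model output", "history": "created"}
updates = {"product": "gridded observations", "project_id": "other",
           "history": None}
after = apply_updates(before, updates)
print(after)  # {'product': 'gridded observations', 'project_id': 'other'}
print(has_time_axis(("lat", "lon")))  # False
```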
After running the update script, the metadata for the file is the following: