Closed dblodgett-usgs closed 5 years ago
This is a very good point. I've tended to shy away from attributes because they are complicated compared to the raw data. I definitely like the idea of returning tables that are in normal form, so if we ask for nc_att(varidentifier) the table should have nrow == n_atts of that variable. I'm definitely not consistent on that in ncmeta. I'd appreciate any PRs in this direction, and I'll be very positive about contributions!
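To make the normal-form idea concrete, here is a minimal base R sketch. It does not call ncmeta; the variable name "tas" and the attribute values are hypothetical, chosen only to show the one-row-per-attribute shape.

```r
# Hypothetical attributes for one variable (illustrative values, not read from a file)
atts <- list(
  units = "K",
  long_name = "air temperature",
  valid_range = c(180, 330)
)

# One row per attribute; value is a list column because attributes
# can differ in type and length
att_table <- data.frame(
  variable = rep("tas", length(atts)),
  attribute = names(atts),
  stringsAsFactors = FALSE
)
att_table$value <- unname(atts)

# The normal-form property discussed above: nrow == n_atts
stopifnot(nrow(att_table) == length(atts))
```

The list column keeps a multi-valued attribute such as valid_range in a single row rather than spreading it across rows or columns.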
To contextualize, ncdf4 and RNetCDF are extremely different: the former returns all metadata in one connection object and you are expected to traverse the tree. It's not normal form, and there are many redundancies. RNetCDF has more verbs to extract each part, but it's not as efficient because it doesn't manage the file connection as well. I found RNetCDF easier to build upon, but ncdf4 is generally faster.
I don't have a handle on attributes yet, but tidync has a very strong idea of grids and variables and dimensions, and they are automatable in powerful ways. The idea of ncmeta is to protect tidync (and friends) from these details, but as you clearly identify - the attributes in NetCDF are not clearly modelled here yet.
OK. Yeah, I agree re: normal form. I'll get a PR together for that and see what you think.
In general, I think I can help with the NetCDF attributes. Been working in the CF community for a long time and generally know my way around vagaries of the spec.
I'm seeing
[1] "attribute" "variable" "value"
where attribute is numeric instead of character, unless you request by character, in which case it comes back as character. It would be pretty useful to get the attribute name rather than the index to work with standard cf attributes. Would that be useful to you too?
Starting to look at the code: I am puttering around in nc_att adding the name and see two ways this could be done now. Since you've already got the value column as a list column, we could make it a named list, i.e. now I see something like:
and it would be better as?
Or we could modify the returned tibble so it has an "id" and a "name" column. The big issue now is that attributes can be requested by id or by name -- which is fine from a request point of view, but results in different kinds of output in the "attribute" column of the current output.
e.g. try this:
I started implementing a solution that does this:
At the end of the day, I guess it comes down to how one would want to use the output. If the intention is for that "attribute" column to be used as a key to the semantics of the requester, I'd say leave it the way it is. If it's meant to be an identifier for the attribute, it's ambiguous and it should be changed to be "id" and "name"?
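For comparison, the two shapes under discussion can be mocked up in base R. This is only a sketch: it does not use ncmeta, the attribute values are hypothetical, and the zero-based ids are an assumption borrowed from the netCDF C library's attribute numbering.

```r
# Hypothetical attributes of a single variable; values are illustrative only
atts <- list(units = "K", long_name = "air temperature")

# Option 1: keep the existing three columns and name the list column's elements
opt1 <- data.frame(
  attribute = seq_along(atts) - 1L,  # zero-based ids (assumption)
  variable = "tas",
  stringsAsFactors = FALSE
)
opt1$value <- atts  # names travel with the list column

# Option 2: separate "id" and "name" columns, unnamed list column
opt2 <- data.frame(
  id = seq_along(atts) - 1L,
  name = names(atts),
  variable = "tas",
  stringsAsFactors = FALSE
)
opt2$value <- unname(atts)

# Lookup by name under each option
units_opt1 <- opt1$value[["units"]]
units_opt2 <- opt2$value[[match("units", opt2$name)]]
```

With option 2 the name is an ordinary column, so it can be filtered or joined on like any other key, and the output has the same shape whether the attribute was requested by id or by name.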
I'd be happy to implement and PR a solution here depending what you think would be most useful and unintrusive. Thinking I can probably contribute a few things here that I'll use in some work I'm doing that also uses stars so getting on the same page re: your vision would be helpful.
Cheers!