Open Xunius opened 4 years ago
@sadielbartholomew I read about standard_name in the links you gave, and I now understand the difference between long_name and standard_name better. I used to use the same string for both.
Regarding Point 2, the standard names for the u- and v-flux components are indeed, as you said, eastward_atmosphere_water_transport_across_unit_distance and northward_atmosphere_water_transport_across_unit_distance.
Regarding Point 1, the uflux_s_6_1984_Jan.nc data in the notebooks folder were obtained directly from the ERA-I reanalysis data center: I selected the desired variable, time step, domain, etc., and downloaded them. So they came without a standard_name attribute; maybe the GRIB-to-netCDF conversion on their data server omits it. I never noticed this before.
If I understand correctly, there is a predefined list of permissible standard_name values, so I can't just coin a new standard_name, like numerical_label_for_atmospheric_river or northward_atmosphere_water_transport_across_unit_distance_THR_anomaly_component, can I? In that case, do I just leave the attribute empty?
@Xunius a couple of follow-on points about this:
I don't think that you need to do anything to files downloaded directly from the ERA-I reanalysis data center, as it's not your role to make those files CF compliant.
I would strongly recommend that you relax the requirements on dimension ordering and replace them with an expectation that specific dimensions are present (and named correctly). Both netCDF and xarray give access to the dimension names. This will allow you to support both datasets that comply with the COARDS standard order and datasets such as the ERA-I data center downloads, which provide netCDF files in (time, lat, lon) order.
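As a rough illustration of looking dimensions up by name rather than by position, here is a minimal sketch. The function name axis_order and the dimension names "time", "lat" and "lon" are assumptions made for this example, not part of IPART. The returned permutation could be passed to numpy.transpose, and xarray's DataArray.transpose accepts the dimension names directly.

```python
def axis_order(dims, wanted=("time", "lat", "lon")):
    """Return the axis permutation that puts `dims` into `wanted` order.

    `dims` is the tuple of dimension names attached to a variable.
    Raises ValueError if a required dimension is missing.
    """
    missing = [name for name in wanted if name not in dims]
    if missing:
        raise ValueError(f"missing required dimension(s): {missing}")
    # Required dimensions come first, in the wanted order; any extra
    # dimensions keep their relative order and are placed last.
    extras = [name for name in dims if name not in wanted]
    return tuple(dims.index(name) for name in list(wanted) + extras)


# A file that is already in (time, lat, lon) order needs no reordering.
print(axis_order(("time", "lat", "lon")))  # (0, 1, 2)
# A file in the opposite order is handled by the same code path.
print(axis_order(("lon", "lat", "time")))  # (2, 1, 0)
```

With this approach the order in the file stops mattering; only the presence and naming of the dimensions is checked.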
Regarding standard names: You are correct that you can't create a new standard name (i.e., something listed in the CF standard name table) on your own. In my experience, coming up with these names is often challenging and a real art. My recommendation is that you use a descriptive name that follows the style of the standard names for the "standard_name" attribute field. Also take advantage of the "long_name", "units", and "_FillValue" fields to complete the description. As your example above shows, being descriptive often means making very, very long names.
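A minimal sketch of that recommendation might look like the following. The helper describe_variable, the units value "1" and the fill value -9999 are hypothetical choices for illustration. The style check assumes the CF guideline that standard names use only lowercase letters, digits and underscores and begin with a letter.

```python
import re

# Assumed style rule: lowercase letters, digits and underscores,
# beginning with a letter.
_NAME_STYLE = re.compile(r"^[a-z][a-z0-9_]*$")


def describe_variable(name, long_name, units, fill_value):
    """Build an attribute dictionary for a variable (hypothetical helper)."""
    if not _NAME_STYLE.match(name):
        raise ValueError(f"{name!r} does not follow the standard-name style")
    return {
        "standard_name": name,
        "long_name": long_name,
        "units": units,
        "_FillValue": fill_value,
    }


# Illustrative values only; the units and fill value are assumptions.
attrs = describe_variable(
    "numerical_label_for_atmospheric_river",
    "Numerical label for atmospheric river",
    "1",
    -9999,
)
print(attrs["standard_name"])
```

Under this assumed style rule, a name containing uppercase letters, such as the one ending in _THR_anomaly_component, would be rejected until it is lowercased.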
Also, kudos to @sadielbartholomew for such a thorough comment on CF-compliance and standard names.
This is copying from comments made by @sadielbartholomew. See original post at https://github.com/openjournals/joss-reviews/issues/2407#issuecomment-667736221.
Continuing on the topic of improvements that are not compulsory towards acceptance in the paper given the open criteria, but would be good to think about going forwards, for good practice with metadata I suggest making more use of the CF Conventions (the recommended standard for netCDF), namely as described in the three points below.
Increasing the compliance of the datasets included in the repository to the CF Conventions, especially those under the notebooks directory, which users may interact with if they try out IPART with the provided notebooks. Notably, both uflux_s_6_1984_Jan.nc and vflux_s_6_1984_Jan.nc provided there are marked by a global attribute as being compliant with CF 1.6, which is okay (relative to the ideal, latest version, 1.8), but I immediately see improvements in compliance that could be made.
For example, the variable and dimensions are all described by a long_name attribute, where use of a standard_name attribute is preferable as each is unambiguous (see e.g. here). The time, lat and lon coordinates can take standard names of the same identifier as currently used for the long name, and from a quick search on the names table for "eastward" AND "vapor" I think the data itself, with long_name=Vertical integral of eastward water vapour flux and units kg m**-1 s**-1, could probably be assigned a standard name of eastward_atmosphere_water_vapor_transport_across_unit_distance, or similar.
You could explicitly state the standard names of data variables which would be applicable, e.g. something similar to northward_... and eastward_atmosphere_water_vapor_transport_across_unit_distance, and maybe link to the definition of all grids which may be considered rectangular, for clarity. This would make it crystal clear whether a user's dataset(s) may be appropriate for processing by IPART.
So, you are advocating that users define data dimensions in the inverse order to that recommended. To make IPART more immediately accessible, you could amend your code so that it accepts the outlined conventional order, rather than the inverse.
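One way to make the applicable standard names explicit is a small lookup that classifies an input variable by its standard_name attribute. This is only a sketch: the ACCEPTED table and the flux_component function are hypothetical, they do not describe IPART's actual behaviour, and the table contains only the names quoted in this thread.

```python
# Hypothetical table of accepted standard names, limited to those
# mentioned in this discussion.
ACCEPTED = {
    "u": {
        "eastward_atmosphere_water_vapor_transport_across_unit_distance",
        "eastward_atmosphere_water_transport_across_unit_distance",
    },
    "v": {
        "northward_atmosphere_water_transport_across_unit_distance",
    },
}


def flux_component(attrs):
    """Return "u" or "v" for a recognised standard_name, otherwise None.

    `attrs` is the attribute dictionary of a data variable.
    """
    name = attrs.get("standard_name")
    for component, names in ACCEPTED.items():
        if name in names:
            return component
    return None


print(flux_component(
    {"standard_name": "northward_atmosphere_water_transport_across_unit_distance"}
))
# A variable that only carries a long_name is not recognised.
print(flux_component(
    {"long_name": "Vertical integral of eastward water vapour flux"}
))
```

A check like this would tell a user up front whether a dataset is suitable, instead of failing later during processing.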