hypertidy / ncmeta

Tidy NetCDF metadata
https://hypertidy.github.io/ncmeta/

Bounds are not dimensions in CF Metadata Conventions files #48

Open pvanlaake opened 1 year ago

pvanlaake commented 1 year ago

The CF Metadata Conventions layer a sophisticated set of conventions on top of NetCDF files to ease interpretation and analysis of the data. One such convention is to attach a bounds attribute to the coordinate variable of a dimension when the data represent cells (rather than point observations on a regular grid); the attribute names a boundary variable that holds the boundaries of each cell along that dimension. In this file the bounds are stored as 3-D arrays for lon and lat (including the time dimension, for reasons unknown to me) and as a 2-D array for time, each with an additional first dimension called bnds. As per the CF documentation, "a boundary variable is considered to be part of a coordinate variable’s metadata", so bnds is not a dimension of the data. This also shows in the fact that coord_dim == FALSE for the "dimension" bnds. See the example below using tidync:

> huss <- tidync(lf[1])
> huss

Data Source (1): huss_day_EC-Earth3-CC_historical_r1i1p1f1_gr_19910101-19960321_v20210113.nc ...

Grids (7) <dimension family> : <associated variables> 

[1]   D1,D2,D0 : lat_bnds    **ACTIVE GRID** ( 976384  values per variable)
[2]   D1,D3,D0 : lon_bnds
[3]   D3,D2,D0 : huss
[4]   D1,D0    : time_bnds
[5]   D0       : time
[6]   D2       : lat
[7]   D3       : lon

Dimensions 4 (3 active): 

  dim   name  length     min     max start count    dmin    dmax unlim coord_dim 
  <chr> <chr>  <dbl>   <dbl>   <dbl> <int> <int>   <dbl>   <dbl> <lgl> <lgl>     
1 D0    time    1907 51500.  53406.      1  1907 51500.  53406.  FALSE TRUE      
2 D1    bnds       2     1       2       1     2     1       2   FALSE FALSE     ## <<<<<<<
3 D2    lat      256   -89.5    89.5     1   256   -89.5    89.5 FALSE TRUE      

Inactive dimensions:

  dim   name  length   min   max unlim coord_dim 
  <chr> <chr>  <dbl> <dbl> <dbl> <lgl> <lgl>     
1 D3    lon      512     0  359. FALSE TRUE  

> huss$attribute |> filter(name == "bounds") |> unnest(value)
# A tibble: 3 × 4
     id name   variable value    
  <int> <chr>  <chr>    <chr>    
1     0 bounds time     time_bnds
2     0 bounds lat      lat_bnds 
3     0 bounds lon      lon_bnds 

Would it be a good idea to drop the bnds "dimension" and thus the associated grids? There should be some other mechanism to keep them on, however, such that their contents can be accessed.
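
One way such a mechanism could work is sketched below in base R. It is only an illustration under assumptions: the two data frames are transcribed by hand from the print-outs above rather than read from the file, and the names `bounds_att`, `grids`, `dims_of` and `false_dims` are hypothetical, not part of tidync or ncmeta. The idea is to flag the grids whose variable is named by a bounds attribute, keep them in a separate table so they stay accessible, and treat a dimension as "false" when only those grids use it.

```r
# Bounds attributes as printed above: coordinate variable -> its boundary variable.
bounds_att <- data.frame(
  variable = c("time", "lat", "lon"),
  value    = c("time_bnds", "lat_bnds", "lon_bnds"),
  stringsAsFactors = FALSE
)

# Grids and their variables, transcribed from the tidync print-out above.
grids <- data.frame(
  grid     = c("D1,D2,D0", "D1,D3,D0", "D3,D2,D0", "D1,D0", "D0", "D2", "D3"),
  variable = c("lat_bnds", "lon_bnds", "huss", "time_bnds", "time", "lat", "lon"),
  stringsAsFactors = FALSE
)

# A variable is treated as a boundary variable if some bounds attribute names it.
grids$is_bounds <- grids$variable %in% bounds_att$value

# Hypothetical default view: data grids only, with bounds grids kept separately.
data_grids   <- grids[!grids$is_bounds, ]
bounds_grids <- grids[grids$is_bounds, ]

# Split "D1,D2,D0" style grid labels into the unique dimension ids they use.
dims_of <- function(g) unique(unlist(strsplit(g, ",", fixed = TRUE)))

# A dimension is "false" if it is used by bounds grids only.
false_dims <- setdiff(dims_of(bounds_grids$grid), dims_of(data_grids$grid))
false_dims
```

With the tables above this leaves D1 (bnds) as the only false dimension, while lat_bnds, lon_bnds and time_bnds remain available in `bounds_grids`.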

mdsumner commented 1 year ago

huh, well there you go, I never really understood that - I thought there were cases where the corner coordinates are stored explicitly

I'll have to look at a few cases and get reprexes, here's a couple:

src <- "https://dapds00.nci.org.au/thredds/dodsC/ua6_4/CMIP5/derived/CMIP5/GCM/native/CSIRO-BOM/ACCESS1-3/rcp45/day/atmos/Amon/r1i1p1/latest/sfcWind/aggregates/sfcWind_Amon_ACCESS1-3_rcp45_r1i1p1_2015-2034-monMax-seasmax-clim_native.nc"
ncdf4::nc_open(src)
src <- "https://dapds00.nci.org.au/thredds/dodsC/rr6/oceanmaps_datasets/roms/eac/his_2023_03_09_84685.nc"
tidync(src)

pvanlaake commented 1 year ago

Where do you get all this freakish data?!? Must be too much Vegemite down under!

The surface wind file conforms to what I observed before but the false dimension is called nb2. The false grids are still easily identified:

> sfc$attribute |> filter(name == "bounds") |> unnest(value)
# A tibble: 2 × 4
     id name   variable value   
  <int> <chr>  <chr>    <chr>   
1     4 bounds lon      lon_bnds
2     4 bounds lat      lat_bnds

Otherwise it's a funny file with all these monthly, seasonal and annual variables.

The ocean file is truly scary. It seems to be a file for intermediate use by people who are intimately familiar with this presentation of data. The false dimensions, like xi_u and eta_u, seem to have some meaning if you know the formulae that relate to the (u,v) components of wind fields, because they yield different results for s_rho and ocean_time on the hyper_array of variable u. They have no values, however, that are revealed by dimension variable data or by local or global attributes. Simply dropping the false dimensions is obviously a bad choice here: variables u and v would no longer show up.

So I guess my issue should for now be considered a nuisance that is a feature of various data sets rather than a loose end that needs a fix.

mdsumner commented 1 year ago

oh right yes sorry I just went for a hard core example because I didn't otherwise know how to find something quickly

it's ocean model output, and doesn't get much more complicated 😄

I'm confused about what a coord dim is, and how bounds can be expressed - I think I was just wrong about this

a coord dim is just one that has axis values in a var, right, I think we can fix this pretty easily but I need to warm up a bit 👌

pvanlaake commented 1 year ago

I always interpreted a coord_dim as a flag to indicate if a variable contains the values of some dimension (TRUE) or whether it is a true variable on a grid whose values represent some physical property (FALSE). Unless I am very mistaken, this is how package ncdf4 puts dimension values in the vals property of each dimension in its ncdf4 class.
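
That interpretation can be written down as a small rule, sketched here in base R. This is an assumption about what the flag means, not a copy of how ncmeta or ncdf4 compute it: a dimension counts as a coordinate dimension when a 1-D variable with the same name is defined on it. The `vars` list (variable name to dimension names) is transcribed by hand from the huss print-out above, and `is_coord_dim` is a hypothetical helper.

```r
# Dimensions of the huss file, in the order D0..D3 used above.
dims <- c("time", "bnds", "lat", "lon")

# Variable name -> names of the dimensions it is defined on.
vars <- list(
  time      = "time",
  lat       = "lat",
  lon       = "lon",
  time_bnds = c("bnds", "time"),
  lat_bnds  = c("bnds", "lat", "time"),
  lon_bnds  = c("bnds", "lon", "time"),
  huss      = c("lon", "lat", "time")
)

# TRUE when a variable of the same name exists and is defined on that
# dimension alone, i.e. it holds the axis values of the dimension.
is_coord_dim <- function(dim, vars) {
  v <- vars[[dim]]
  !is.null(v) && identical(v, dim)
}

coord_dim <- vapply(dims, is_coord_dim, logical(1), vars = vars)
coord_dim
```

Under this rule time, lat and lon come out TRUE and bnds comes out FALSE, which matches the coord_dim column in the tidync output above.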

mdsumner commented 1 year ago

what about this file? are these not bounds in that strict way? I see you've said as much about the wind file above now :)

if ncdump says it's a dimension then I'm unclear what one is supposed to do about it otherwise

dt_global_allsat_phy_l4_20200603_20201126.zip

ncdump -h dt_global_allsat_phy_l4_20200603_20201126.nc
netcdf dt_global_allsat_phy_l4_20200603_20201126 {
dimensions:
        time = 1 ;
        latitude = 720 ;
        longitude = 1440 ;
        nv = 2 ;
variables:

mdsumner commented 1 year ago

here's the rest of the output, for reference

netcdf dt_global_allsat_phy_l4_20200603_20201126 {
dimensions:
        time = 1 ;
        latitude = 720 ;
        longitude = 1440 ;
        nv = 2 ;
variables:
        int crs ;
                crs:comment = "This is a container variable that describes the grid_mapping used by the data in this file. This variable does not contain any data; only information about the geographic coordinate system." ;
                crs:grid_mapping_name = "latitude_longitude" ;
                crs:inverse_flattening = 298.257 ;
                crs:semi_major_axis = 6378136.3 ;
        float time(time) ;
                time:axis = "T" ;
                time:calendar = "gregorian" ;
                time:long_name = "Time" ;
                time:standard_name = "time" ;
                time:units = "days since 1950-01-01 00:00:00" ;
        float latitude(latitude) ;
                latitude:axis = "Y" ;
                latitude:bounds = "lat_bnds" ;
                latitude:long_name = "Latitude" ;
                latitude:standard_name = "latitude" ;
                latitude:units = "degrees_north" ;
                latitude:valid_max = 89.875 ;
                latitude:valid_min = -89.875 ;
        float lat_bnds(latitude, nv) ;
                lat_bnds:comment = "latitude values at the north and south bounds of each pixel." ;
                lat_bnds:units = "degrees_north" ;
        float longitude(longitude) ;
                longitude:axis = "X" ;
                longitude:bounds = "lon_bnds" ;
                longitude:long_name = "Longitude" ;
                longitude:standard_name = "longitude" ;
                longitude:units = "degrees_east" ;
                longitude:valid_max = 359.875 ;
                longitude:valid_min = 0.125 ;
        float lon_bnds(longitude, nv) ;
                lon_bnds:comment = "longitude values at the west and east bounds of each pixel." ;
                lon_bnds:units = "degrees_east" ;
        int nv(nv) ;
                nv:comment = "Vertex" ;
                nv:units = "1" ;
        int err(time, latitude, longitude) ;
                err:_FillValue = -2147483647 ;
                err:comment = "The formal mapping error represents a purely theoretical mapping error. It mainly traduces errors induced by the constellation sampling capability and consistency with the spatial/temporal scales considered, as described in Le Traon et al (1998) or Ducet et al (2000)" ;
                err:coordinates = "longitude latitude" ;
                err:grid_mapping = "crs" ;
                err:long_name = "Formal mapping error" ;
                err:scale_factor = 0.0001 ;
                err:units = "m" ;
        int adt(time, latitude, longitude) ;
                adt:_FillValue = -2147483647 ;
                adt:comment = "The absolute dynamic topography is the sea surface height above geoid; the adt is obtained as follows: adt=sla+mdt where mdt is the mean dynamic topography; see the product user manual for details" ;
                adt:coordinates = "longitude latitude" ;
                adt:grid_mapping = "crs" ;
                adt:long_name = "Absolute dynamic topography" ;
                adt:scale_factor = 0.0001 ;
                adt:standard_name = "sea_surface_height_above_geoid" ;
                adt:units = "m" ;
        int ugos(time, latitude, longitude) ;
                ugos:_FillValue = -2147483647 ;
                ugos:coordinates = "longitude latitude" ;
                ugos:grid_mapping = "crs" ;
                ugos:long_name = "Absolute geostrophic velocity: zonal component" ;
                ugos:scale_factor = 0.0001 ;
                ugos:standard_name = "surface_geostrophic_eastward_sea_water_velocity" ;
                ugos:units = "m/s" ;
        int vgos(time, latitude, longitude) ;
                vgos:_FillValue = -2147483647 ;
                vgos:coordinates = "longitude latitude" ;
                vgos:grid_mapping = "crs" ;
                vgos:long_name = "Absolute geostrophic velocity: meridian component" ;
                vgos:scale_factor = 0.0001 ;
                vgos:standard_name = "surface_geostrophic_northward_sea_water_velocity" ;
                vgos:units = "m/s" ;
        int sla(time, latitude, longitude) ;
                sla:_FillValue = -2147483647 ;
                sla:comment = "The sea level anomaly is the sea surface height above mean sea surface; it is referenced to the [1993, 2012] period; see the product user manual for details" ;
                sla:coordinates = "longitude latitude" ;
                sla:grid_mapping = "crs" ;
                sla:long_name = "Sea level anomaly" ;
                sla:scale_factor = 0.0001 ;
                sla:standard_name = "sea_surface_height_above_sea_level" ;
                sla:units = "m" ;
        int ugosa(time, latitude, longitude) ;
                ugosa:_FillValue = -2147483647 ;
                ugosa:comment = "The geostrophic velocity anomalies are referenced to the [1993, 2012] period" ;
                ugosa:coordinates = "longitude latitude" ;
                ugosa:grid_mapping = "crs" ;
                ugosa:long_name = "Geostrophic velocity anomalies: zonal component" ;
                ugosa:scale_factor = 0.0001 ;
                ugosa:standard_name = "surface_geostrophic_eastward_sea_water_velocity_assuming_sea_level_for_geoid" ;
                ugosa:units = "m/s" ;
        int vgosa(time, latitude, longitude) ;
                vgosa:_FillValue = -2147483647 ;
                vgosa:comment = "The geostrophic velocity anomalies are referenced to the [1993, 2012] period" ;
                vgosa:coordinates = "longitude latitude" ;
                vgosa:grid_mapping = "crs" ;
                vgosa:long_name = "Geostrophic velocity anomalies: meridian component" ;
                vgosa:scale_factor = 0.0001 ;
                vgosa:standard_name = "surface_geostrophic_northward_sea_water_velocity_assuming_sea_level_for_geoid" ;
                vgosa:units = "m/s" ;

// global attributes:
                :Conventions = "CF-1.6" ;
                :Metadata_Conventions = "Unidata Dataset Discovery v1.0" ;
                :cdm_data_type = "Grid" ;
                :comment = "Sea Surface Height measured by Altimetry and derived variables" ;
                :contact = "servicedesk.cmems@mercator-ocean.eu" ;
                :creator_email = "servicedesk.cmems@mercator-ocean.eu" ;
                :creator_name = "CMEMS - Sea Level Thematic Assembly Center" ;
                :creator_url = "http://marine.copernicus.eu" ;
                :date_created = "2020-12-07T20:44:09Z" ;
                :date_issued = "2020-12-07T20:44:09Z" ;
                :date_modified = "2020-12-07T20:44:09Z" ;
                :geospatial_lat_max = 89.875 ;
                :geospatial_lat_min = -89.875 ;
                :geospatial_lat_resolution = 0.25 ;
                :geospatial_lat_units = "degrees_north" ;
                :geospatial_lon_max = 359.875 ;
                :geospatial_lon_min = 0.125 ;
                :geospatial_lon_resolution = 0.25 ;
                :geospatial_lon_units = "degrees_east" ;
                :geospatial_vertical_max = 0. ;
                :geospatial_vertical_min = 0. ;
                :geospatial_vertical_positive = "down" ;
                :geospatial_vertical_resolution = "point" ;
                :geospatial_vertical_units = "m" ;
                :history = "2020-12-07 20:44:10Z: Creation" ;
                :institution = "CLS, CNES" ;
                :keywords = "Oceans > Ocean Topography > Sea Surface Height" ;
                :keywords_vocabulary = "NetCDF COARDS Climate and Forecast Standard Names" ;
                :license = "http://marine.copernicus.eu/web/27-service-commitments-and-licence.php" ;
                :platform = "Altika Drifting Phase, Cryosat-2, Haiyang-2A Geodetic Phase, Jason-3, Sentinel-3A, Sentinel-3B" ;
                :processing_level = "L4" ;
                :product_version = "vJul2020" ;
                :project = "COPERNICUS MARINE ENVIRONMENT MONITORING SERVICE (CMEMS)" ;
                :references = "http://marine.copernicus.eu" ;
                :software_version = "6.4_DUACS_DT2018_baseline" ;
                :source = "Altimetry measurements" ;
                :ssalto_duacs_comment = "The reference mission used for the altimeter inter-calibration processing is Topex/Poseidon between 1993-01-01 and 2002-04-23, Jason-1 between 2002-04-24 and 2008-10-18, OSTM/Jason-2 between 2008-10-19 and 2016-06-25, Jason-3 since 2016-06-25." ;
                :standard_name_vocabulary = "NetCDF Climate and Forecast (CF) Metadata Convention Standard Name Table v37" ;
                :summary = "SSALTO/DUACS Delayed-Time Level-4 sea surface height and derived variables measured by multi-satellite altimetry observations over Global Ocean." ;
                :time_coverage_duration = "P1D" ;
                :time_coverage_end = "2020-06-03T00:00:00Z" ;
                :time_coverage_resolution = "P1D" ;
                :time_coverage_start = "2020-06-03T00:00:00Z" ;
                :title = "DT merged all satellites Global Ocean Gridded SSALTO/DUACS Sea Surface Height L4 product and derived variables" ;
}
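
For this header the name-based test is of little help, because nv has its own variable, int nv(nv), and so looks like any other coordinate dimension. A rough base R sketch of an alternative follows, using only what the listing above declares: take each coordinate variable that carries a bounds attribute and subtract its dimensions from those of its boundary variable. The `vars` and `bounds` objects are hand-transcribed from the header (only adt is included among the data variables), and `vertex_dims` is a hypothetical name.

```r
# Variable name -> dimension names, transcribed from the ncdump header above.
vars <- list(
  time      = "time",
  latitude  = "latitude",
  longitude = "longitude",
  nv        = "nv",
  lat_bnds  = c("latitude", "nv"),
  lon_bnds  = c("longitude", "nv"),
  adt       = c("time", "latitude", "longitude")
)

# Coordinate variable -> value of its bounds attribute.
bounds <- c(latitude = "lat_bnds", longitude = "lon_bnds")

# Whatever dimension the boundary variable has beyond those of its
# coordinate variable is the vertex dimension.
vertex_dims <- unique(unlist(lapply(names(bounds), function(coord) {
  setdiff(vars[[bounds[[coord]]]], vars[[coord]])
})))
vertex_dims

# The name-based rule, by contrast, sees a variable called nv.
name_rule_says_coord <- "nv" %in% names(vars)
name_rule_says_coord
```

Here the set difference picks out nv as the vertex dimension even though a variable of that name exists in the file.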