opengeospatial / CRS-Gridded-Geodetic-data-eXchange-Format

Gridded Geodetic data eXchange Format

Variables more similar to NetCDF common usage #28

Closed desruisseaux closed 1 year ago

desruisseaux commented 2 years ago

Current state

The current prototype produces a file with the following structure (example derived from NTv2):

group: National\ Transformation\ v2_0 {
  dimensions:
    parameter = 4 ;

  // group attributes:
    :parameters.count = 4LL ;
    :parameters.0.parameterName = "latitudeOffset" ;
    :parameters.0.angleUnit = "arc-second" ;
    :parameters.0.unitSiRatio = 4.84813681109536e-06 ;
    :parameters.1.parameterName = "longitudeOffset" ;
    :parameters.1.angleUnit = "arc-second" ;
    :parameters.1.unitSiRatio = 4.84813681109536e-06 ;
    :parameters.2.parameterName = "latitudeOffsetUncertainty" ;
    :parameters.2.lengthUnit = "metre" ;
    :parameters.2.unitSiRatio = 1. ;
    :parameters.3.parameterName = "longitudeOffsetUncertainty" ;
    :parameters.3.lengthUnit = "metre" ;
    :parameters.3.unitSiRatio = 1. ;
    :uncertaintyMeasure = "2CEE" ;
    :interpolationMethod = "bilinear" ;

  group: CAeast {
    dimensions:
      gridi = 529 ;
      gridj = 241 ;
    variables:
      float data(gridj, gridi, parameter) ;

    // group attributes:
      :iNodeMaximum = 528LL ;
      :jNodeMaximum = 240LL ;
      :affineCoeffs = 60., 0., -0.0833333333333333, -88., 0.0833333333333333, 0. ;
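As a side note on reading these headers: the affineCoeffs attribute maps grid indices to interpolation-CRS coordinates. A minimal sketch, assuming (purely for illustration) that the six coefficients are ordered (a0, a1, a2, b0, b1, b2) with latitude = a0 + a1·i + a2·j and longitude = b0 + b1·i + b2·j:

```python
# Hypothetical helper: map grid indices (i, j) to interpolation-CRS
# coordinates, assuming affineCoeffs = (a0, a1, a2, b0, b1, b2).
def grid_to_geodetic(i, j, coeffs):
    a0, a1, a2, b0, b1, b2 = coeffs
    return (a0 + a1 * i + a2 * j,   # latitude
            b0 + b1 * i + b2 * j)   # longitude

# Coefficients from the CAeast group above.
affine = (60.0, 0.0, -0.0833333333333333, -88.0, 0.0833333333333333, 0.0)
print(grid_to_geodetic(0, 0, affine))  # grid origin -> (60.0, -88.0)
```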

Issue

Minor note: iNodeMaximum and jNodeMaximum are redundant with the gridi and gridj dimensions. I think they could be omitted. Less redundancy means less risk of inconsistency.

My main concern is that the above structure does not follow usual netCDF practice. Even if we do not fully adopt the CF-conventions, I propose to nevertheless use them where they make sense. It would reduce the effort for users and developers by providing files that look more familiar to them. It also increases the chances that readers unaware of the GGXF format can nevertheless understand those files at least partially. It would also enable the use of other CF-conventions such as the packing mentioned in #23. Similar arguments were also expressed in https://github.com/opengeospatial/CRS-Gridded-Geodetic-data-eXchange-Format/issues/17#issuecomment-938541362.

Proposal

I propose to remove all parameters* attributes and replace them with variables following the CF-Conventions. For this issue I will use the following definitions (Roger or Chris, please correct me if I mix up these definitions):

For each group, the netCDF dimensions (not to be confused with CRS dimensions) of geodetic variables are as below (note: this is the same as current prototype except for the last dimension - parameter - which is replaced by axis):

In the above NTv2 example the interpolation CRS is two-dimensional. So geodetic coordinates are (x,y), which are mapped by the inverse affine transform to grid coordinates (i,j). So we have two netCDF dimensions: gridi and gridj (I will use i and j below for brevity). If the interpolation CRS were three-dimensional, we would have three netCDF dimensions: i, j and k.

Then we add exactly one netCDF dimension which stands for target CRS dimensions. I will name that netCDF dimension "axis", but a longer description would be "index of target CRS axis". The length of this dimension is always the number of target CRS dimensions. So if the target CRS is two-dimensional, then we have exactly one axis dimension of length 2. If the target CRS is three-dimensional, then we have exactly one axis dimension of length 3.

Finally, we create one netCDF variable for each parameter with all target CRS axes grouped together in the same variable. In the NTv2 example we have:

Complete example rewritten with this proposal (an open question is whether the target CRS axis should be a global dimension or specified on a group-by-group basis; the following proposal takes the latter approach):

group: National\ Transformation\ v2_0 {

  // group attributes:
    :uncertaintyMeasure = "2CEE" ;
    :interpolationMethod = "bilinear" ;

  group: CAeast {
    dimensions:
      i = 529 ;
      j = 241 ;
      axis = 2 ;
    variables:
      float displacement(j, i, axis) ;
      float uncertainty(j, i, axis) ;

    // group attributes:
      :affineCoeffs = 60., 0., -0.0833333333333333, -88., 0.0833333333333333, 0. ;
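To illustrate the indexing this layout implies, here is a minimal pure-Python sketch (plain nested lists stand in for the netCDF variable; dimensions are taken from the CAeast group and the sample values are made up):

```python
# Pure-Python stand-in for the proposed netCDF layout: one variable per
# parameter, with a trailing "axis" dimension indexing target-CRS axes.
nj, ni, naxis = 241, 529, 2
displacement = [[[0.0] * naxis for _ in range(ni)] for _ in range(nj)]
displacement[0][0] = [1.25, -0.75]  # node (i=0, j=0), illustrative values

def offsets_at(i, j):
    # Index 0 is the first target-CRS axis, index 1 the second,
    # so the parameter-to-axis relationship is unambiguous.
    return displacement[j][i]

print(offsets_at(0, 0))  # [1.25, -0.75]
```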

If the target CRS uses (latitude, longitude) axis order, then:

If the displacement vectors were three-dimensional, then:

I think this approach would also be easier for implementers and less bug-prone, because it establishes a very clear and unambiguous relationship between indices and CRS axes. I'm not sure how this proposal would apply to use cases other than the NTv2 datum shift, but my hope is that the core idea (which is to associate a netCDF dimension with the axes of some CRS) would still apply.

Bonus

CF-conventions can be applied when useful (we do not need to adopt all of them, but at least the useful parts). Below is an example using packed data (see #23). Note that it would not be possible with all parameters in a single data variable, because units, scale factor, etc. are not the same for all parameters.

    variables:
      short displacement(j, i, axis) ;
        displacement:scale_factor = 0.0001 ;
        displacement:add_offset = -0.005 ;
        displacement:units = "arc-second" ;
      short uncertainty(j, i, axis) ;
        uncertainty:scale_factor = 0.001 ;
        uncertainty:add_offset = -0.05 ;
        uncertainty:units = "metre" ;
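Unpacking follows the usual CF rule: real value = packed × scale_factor + add_offset. A small sketch using the displacement attributes above:

```python
# CF-style unpacking sketch: real value = packed * scale_factor + add_offset.
# The attribute values reuse the displacement declaration above.
def unpack(packed, scale_factor, add_offset):
    return packed * scale_factor + add_offset

# A stored 16-bit value of 12345 decodes to an arc-second offset:
print(round(unpack(12345, 0.0001, -0.005), 6))  # 1.2295
```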
ccrook commented 2 years ago

Minor note: iNodeMaximum and jNodeMaximum are redundant with the gridi and gridj dimensions. I think they could be omitted. Less redundancy means less risk of inconsistency.

To answer the simple bit immediately ... I agree totally on this. I had suggested changing to use the dimensions (offset by 1 from i/jNodeMaximum), but Roger was not keen. OTOH it may be sufficient for NetCDF implementation to consider that they are logically defined by the dimensions and not include them explicitly as attributes.

ccrook commented 2 years ago

Below is an example using packed data (see #23). Note that it would not be possible with all parameters in a single data variable, because units, scale factor, etc. are not the same for all parameters.

I also think packing would be worth looking at. I think that is independent of the use of units and distribution of parameters amongst variables, though obviously it will be most effective where the range of values of each parameter in the variable is similar. Uncertainty (and certainly significant uncertainty) will be likely to lie in the same range as the parameter whose uncertainty it reflects, unless there is a significant offset involved.

I've added the following comment to the README for the moment:

Consider the use of NetCDF packing for efficient storage of data (eg automatic testing of parameter range and implementing packing where suitable)

Note that adding the units and/or parameters to the variable is not particularly helpful as these are defined at the GGXF group level, not at the grid level where these NetCDF variables are held.

ccrook commented 2 years ago

... which gets to the main question - should we split the data, or provide an option for splitting the data, into more than one variable in the GGXF group. In terms of NetCDF conventions I suspect we are not compliant unless we make it one variable (grid) per parameter.

We are not following the conventions in any case (eg we are using an affine transformation for spatial definition rather than latitude and longitude value arrays). I think our approach is good in this regard. So in my mind this is not a question of our alignment or otherwise with the conventions, but whether there is an advantage to us in providing an option for splitting the grid into two or more NetCDF variables.

I think the arguments are:

Pro: efficiency, particularly in separating uncertainty, as much software will handle the deterministic parameters but not use the uncertainty data. In this case placing the data in separate variables will improve efficiency, as the required data would be more closely located in the NetCDF file and so more efficiently accessed.

Cons: It adds to the complexity of implementation; eg for each content type (possibly each GGXF grid) the assignment of variables to parameters must be defined. To provide a benefit, GGXF libraries must support providing a subset of parameters.

Note: The benefits could be achieved (possibly even more effectively) by splitting the uncertainty into a separate GGXF group (ie supporting this explicitly at the GGXF level rather than implicitly at the NetCDF variable level). This would provide an additional potential benefit that the uncertainty could use much simpler grids, as it is likely to be much smoother and has much lower accuracy requirements. This could allow a further reduction in the total size of a GGXF file.

ccrook commented 2 years ago

Thinking about potential implementation I am wondering if the following would suffice:

Also both options for splitting the data could be supported (ie multiple variables in a GGXF grid header and splitting parameters across groups)

So the YAML group header could look something like:

    parameters:
      - parameterName: latitudeOffset
        parameterGroup: offset
        angleUnit: arc-second
        unitSiRatio: 4.84813681109536E-06
        precision: 0.00001
      - parameterName: longitudeOffset
        parameterGroup: offset
        angleUnit: arc-second
        unitSiRatio: 4.84813681109536E-06
        precision: 0.00001
      - parameterName: latitudeOffsetUncertainty
        parameterGroup: uncertainty
        lengthUnit: metre
        unitSiRatio: 1.0
        precision: 0.001
      - parameterName: longitudeOffsetUncertainty
        parameterGroup: uncertainty
        lengthUnit: metre
        unitSiRatio: 1.0
        precision: 0.001

The parameterGroup attribute would be used to define the NetCDF variable name holding the parameter and would have the default value "data", reflecting the current name and the YAML variable name. The YAML file would still have a single grid with the parameters in the specified order.

Here precision would optionally be used to support packing of data in NetCDF. It would take a default value from the NetCDF file creation options (ie command line parameter) or if none is specified then no NetCDF packing would be attempted.
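One possible way a writer could derive the CF packing attributes from such a precision value and a parameter's actual range (a sketch; the helper name and the choice to centre the range with add_offset are assumptions, not part of any specification):

```python
# Hypothetical sketch: derive CF packing attributes (scale_factor,
# add_offset) for a signed 16-bit short from a requested precision and
# the actual range of a parameter's values.
def packing_attrs(vmin, vmax, precision):
    if (vmax - vmin) / precision > 65534:
        return None  # range too wide for a short at this precision
    return {"scale_factor": precision,
            "add_offset": (vmin + vmax) / 2.0}  # centre range on zero

print(packing_attrs(-1.5, 1.5, 0.0001))
# {'scale_factor': 0.0001, 'add_offset': 0.0}
```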

desruisseaux commented 2 years ago

One major goal of this proposal is to establish an unambiguous relationship between CRS axes and parameters. If a parameter name is latitudeOffset, is it for the first or the second axis of the CRS? In the current state it requires either a fragile heuristic, such as parsing the "latitude" part of the name and trying to match it to the name of a CRS axis, or assuming that parameters are listed in the same order as axes. But it is still difficult: is latitudeOffsetUncertainty the beginning of a new set of parameters to associate with axes, or some standalone parameter?

I think that we need to establish a formal relationship between parameters such as "latitude offset" and CRS axes. The most straightforward and unambiguous mechanism I could imagine in a netCDF file would be to assign a netCDF dimension to the axis index. So every parameter for which there is one value per CRS axis should have an axis dimension. In a two-dimensional grid, we would have a 3-dimensional netCDF variable such as float offset(j, i, axis). In a three-dimensional grid, we would have a 4-dimensional netCDF variable, etc. The axis dimension would always be last in the list of netCDF dimensions.

Rewriting the above YAML with this idea in mind, we could have something like below. The axisName entries are for informative purposes since they duplicate the ISO 19111 axis names (maybe we should omit them). So the first parameter is offset, the first axis block inside offset describes the offset(j,i,0) values, the second axis block inside offset describes the offset(j,i,1) values, and so on if there are more CRS dimensions.

parameters:
    - parameterName: offset
        - axisName: latitude
          angleUnit: arc-second
          unitSiRatio: 4.84813681109536E-06
          precision: 0.00001
        - axisName: longitude
          angleUnit: arc-second
          unitSiRatio: 4.84813681109536E-06
          precision: 0.00001
    - parameterName: offsetUncertainty
        - axisName: latitude
          lengthUnit: metre
          unitSiRatio: 1.0
          precision: 0.001
        - axisName: longitude
          lengthUnit: metre
          unitSiRatio: 1.0
          precision: 0.001
ccrook commented 2 years ago

That certainly could work. This would conflict with #30 that I created yesterday, at least if we consider the "Note 3" which suggests that the parameter definitions could be moved up to file header level, with the group header just saying which of the parameters were used (to enforce consistent definition in files with multiple groups - so once again mainly deformation). However that hasn't received any comment as yet. So ignoring that extrapolation of the original idea in #30 this would be:

parameters:
    - parameterName: offset
      axes:
        - axisName: latitude
          angleUnit: arc-second
          unitSiRatio: 4.84813681109536E-06
          precision: 0.00001
        - axisName: longitude
          angleUnit: arc-second
          unitSiRatio: 4.84813681109536E-06
          precision: 0.00001
    - parameterName: offsetUncertainty
      axes:
        - axisName: latitude
          lengthUnit: metre
          unitSiRatio: 1.0
          precision: 0.001
          uncertaintyMeasure: 2EP
        - axisName: longitude
          lengthUnit: metre
          unitSiRatio: 1.0
          precision: 0.001
          uncertaintyMeasure: 2EP

I added "axes" to make the structure correct. In this case uncertaintyMeasure could be raised to the level of the parameter, rather than axes. But that wouldn't always work (eg latitude, longitude, height axes).

This does presume that the GGXF is only used for corrections to coordinate axes directly. At the moment all our examples are like that. But Kevin was considering that there may be data sets which are structured differently, in which the gridded data are coefficients for formulae to calculate corrections. This could perhaps be handled by using some more generic term than "axisName".

desruisseaux commented 2 years ago

Thanks for the correction. Maybe axisName could be omitted completely since it is redundant with ISO 19111 axis name?

On this issue:

This does presume that the GGXF is only used for corrections to coordinate axes directly.

Maybe we do not need this restriction. My proposal is that if a GGXF parameter is not used for corrections to coordinate axes directly, then the corresponding netCDF variable does not have an axis dimension. That would work at least for scalar (not sure for more complex data).

ccrook commented 2 years ago

@RogerLott - do you have any thoughts on the minor note at the top

Minor note: iNodeMaximum and jNodeMaximum are redundant with the gridi and gridj dimensions. I think they could be omitted. Less redundancy means less risk of inconsistency.

I was suggesting in my comment above (https://github.com/opengeospatial/CRS-Gridded-Geodetic-data-eXchange-Format/issues/28#issuecomment-1010556163) that the use of NetCDF dimensions offset by 1 from iNodeMaximum, jNodeMaximum would still hold the content defined in the specification; it is just that the implementation would look different to a user.

I suspect that most users are more familiar with grid dimensions as the number of rows/columns rather than the maximum index values so that may be more intuitive.

ccrook commented 2 years ago

@desruisseaux

Maybe we do not need this restriction. My proposal is that if a GGXF parameter is not used for corrections to coordinate axes directly, then the corresponding netCDF variable does not have an axis dimension. That would work at least for scalar (not sure for more complex data).

I imagine such coefficients would not be scalar. But hard to know without having an explicit example.

@kevinmkelly are you able to provide an example of the "grid of coefficients used in a formula" that you described in the meeting?

I can imagine a simplistic example would be an offset defined by distance and bearing but hopefully Kevin can provide a realistic example.

kevinmkelly commented 2 years ago

I was mistaken. NGS has gridded models for postseismic displacements of the 2002 M7.9 Denali Fault earthquake; they are described in this paper:

2013.SnayFreymueller.JAppGeod.pdf

I thought the grid contained coefficients for input to an equation, but it actually contains nominal N,E,U displacements in meters that must be scaled by a time function. See Sections 3 and 6 of the paper. So, as yet, I have not come across any grids that contain only coefficients.

ccrook commented 2 years ago

Good news - sounds like a perfect fit for the deformation model. Thanks Kevin

ccrook commented 2 years ago

See also https://github.com/opengeospatial/CRS-Gridded-Geodetic-data-eXchange-Format/wiki/DataStructure

RogerLott commented 2 years ago

Minor note: iNodeMaximum and jNodeMaximum are redundant with the gridi and gridj dimensions. I think they could be omitted. Less redundancy means less risk of inconsistency.

  1. I feel that it is desirable that the GGXF header, not only in NetCDF binary but also the YAML text, has the information to compute the grid extent. Currently this is done by applying the NodeMaxima to the affine coefficients. We could align the YAML header with NetCDF by retaining the NodeMaxima attributes but renaming them gridi, gridj, [gridk]. This would be my preferred approach.

  2. @ccrook said:

    I was suggesting in my comment above that #28 (comment) the use of NetCDF dimensions offset by 1 from iNodeMaximum, jNodeMaximum would still hold the content defined in the specification, it is just that the implementation would look different to a user. I suspect that most users are more familiar with grid dimensions as the number of rows/columns rather than the maximum index values so that may be more intuitive.

@desruisseaux originally suggested a zero-based counter. For a north-orientated grid (with no rotations) this has the great simplicity of being able to trivially compute the grid extent by multiplying the appropriate affine coefficient by the NodeMaximum. This simplicity is very appealing. If aligning the YAML header with NetCDF is now part of the agenda, it would make sense to change this to the less simple (but not difficult) NetCDF 'standard' of multiplying the appropriate affine coefficient by (gridi [j,k] − 1), which is the same as multiplying it by the NodeMaximum.
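The two extent computations are equivalent, as a quick sketch shows (values from the CAeast example earlier in this thread):

```python
# Extent of one axis of the CAeast grid, computed both ways.
node_maximum = 528          # zero-based maximum index (iNodeMaximum)
grid_count = 529            # NetCDF dimension length (gridi)
delta = 0.0833333333333333  # affine coefficient: node spacing on this axis

extent_from_maximum = delta * node_maximum
extent_from_dimension = delta * (grid_count - 1)
print(extent_from_maximum == extent_from_dimension)  # True
```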

I suggest we do either both or neither of 1 and 2, but not just one of them.

  1. This raises a somewhat different issue. Our strategy so far has been to draft the GGXF requirements based on and to meet geodetic needs, independent of any implementation. Aligning the requirements with NetCDF makes sense if we are agreed that that is the only or primary carrier. Proof of concept work to date suggests this is a possible way forward. Are we agreed on it?

(Might want to pull this into separate issues).

ccrook commented 2 years ago

@RogerLott on the iNodeMaximum ... issue.

Given that NetCDF is a likely carrier, and that it must include the +1 values (ie nGridi, nGridj, nGridk or whatever we call them), I suggest that for the GGXF spec we just use these values. If we use iNodeMaximum in NetCDF this means it must have both, which will cause potential confusion.

Whilst we don't want the GGXF specification to be driven by the implementation, in this case it makes very little practical difference, certainly none in terms of the capability of GGXF.

Using the number of rows/columns is very common practice in any case - more common than the node maxima. While the formulae are slightly more complex, there are probably no CPUs for which decrementing an integer by 1 is not a single instruction (though that is not the case once it is in Python or similar languages), so it really is not an efficiency consideration - it is just aesthetics/preference.

From all this you will gather that my vote is to use the +1 values.

I am completely ambivalent about how we label these values, so gridi, nGridi, etc I haven't a preference on. But (as in your example) definitely i,j rather than column,row so we can extend easily to a third axis.

RogerLott commented 2 years ago

@ccrook on the iNodeMaximum issue. I don't think it is necessarily a requirement that the GGXF yaml changes from using NodeMaxima and base 0 for describing the grid extent (this to eliminate a redundant attribute) just because NetCDF expects gridi/j/k and base 1. There could be a rule that says for implementation in NetCDF, map iNodeMaximum=n to gridi=n+1; the attribute iNodeMaximum is not redundant, it is just not used in the NetCDF implementation but replaced by gridi. But having consistency between the yaml and NetCDF arguably will reduce the risk of confusion and conversion errors. But this is trivial and does not address the more important @desruisseaux suggestion for grouping.

ccrook commented 2 years ago

@RogerLott @desruisseaux I have added to the wiki page a third option combining the discussion of uncertainty in issue #30 with Martin's ideas on distributing parameters across several GGXF variables (but not retaining any alignment with CF conventions). I have also added a section questioning the impact on the GGXF specification - basically questioning how much is specified versus left to the producer's discretion with the layout 2 and 3 options.

I suggest it is more useful to continue this discussion here rather than editing the wiki, as we can maintain a conversation thread in this issue and all of the team can contribute, which is not the case for the wiki page.

ccrook commented 2 years ago

@RogerLott

But this is trivial and not addressing the more important @desruisseaux suggestion for grouping.

Definitely much less weighty - hence the thought we could maybe find agreement and close the discussion!

And I agree we could use the +1 values in NetCDF and the maxima in YAML. Essentially we are saying the specification is a "logical" specification and that +1 in NetCDF is an implementation detail. The NetCDF and YAML reader/writer will hide this mapping. Just means someone looking at the CDL will see a slightly different picture to someone looking at the YAML.

But I am still persuaded by your final sentence:

But having consistency between the yaml and NetCDF arguably will reduce the risk of confusion and conversion errors.

ccrook commented 2 years ago

Note @desruisseaux that in your layout 2 in the wiki you are hoping for

Unambiguous relationship between parameter values and CRS axes

However this won't always work for deformation models. For example the NZGD2000 deformation model defines a 3d time-dependent displacement. However many of the elements/groups are only 2d (horizontal displacement) or 1d (vertical displacement). This happens where, for geophysical reasons or for datum management reasons, the horizontal and vertical movements are not handled in the same way. So assuming that there is a direct mapping from offset parameters to CRS axes would not work.

RogerLott commented 2 years ago

I have no problem with the general principle of aligning a GGXF NetCDF implementation with common existing NetCDF practices. But do we know how representative the CF-Convention use of NetCDF is?

But can there be any general 'rules' or guidelines for the formation of groups in a netCDF implementation that can be used regardless of content type?

I worry a little that in using a GGXF NTv2 example we are assuming that the parameters that may be in a GGXF file are similar to those that may be encountered in a CF implementation. The wiki https://github.com/opengeospatial/CRS-Gridded-Geodetic-data-eXchange-Format/wiki/DataStructure assumes four parameters, two of which are used directly in a transformation (offsets) and two of which are generally not (uncertainties). But a GGXF file where content = geographic2dOffsets could contain any of these parameter sets (plus others):

δφ, δλ
δφ, δλ, u(δφ), u(δλ)
δφ, u(δφ), δλ, u(δλ)
φ, λ, δφ, δλ, u(δφ), u(δλ)
φ, λ, δφ, u(δφ), δλ, u(δλ)

and the Interpolation CRS axis order may be φλ or λφ. Note that although the GGXF spec recommends that coordinates of the grid nodes (φλ or λφ) are not given as parameters, the reality today is that many geodetic grids do contain these parameters. For these permutations, what would be the recommended netCDF groups? Offsets and non-offsets, or coordinates, offsets and uncertainties?

If we look at a GGXF file where content = geoidModel, we frequently see the following parameter sets:

ζ
ζ, u(ζ)
φ, λ, ζ, u(ζ)

where the Interpolation CRS will be geographic 2D. So will the netCDF groups be offsets and non-offsets, or coordinates, offsets and uncertainties?

Note that currently we see that there are some geoid model grids where the Interpolation CRS is projected CRS, so it should not be assumed that there will always be latitude and longitude axes.

Then what happens when none of the parameters are used in a transformation, for example when the GGXF content type = deviationsOfTheVertical. Now some possible parameter sets include:

ξ, η
φ, λ, ξ, η
ξ, η, u(ξ), u(η)
ξ, u(ξ), η, u(η)
φ, λ, ξ, η, u(ξ), u(η)

What would be the netCDF groups? 'offsets' is irrelevant here. So deviations and uncertainty?

Are there to be any general 'rules' or guidelines for the formation of groups in a netCDF implementation that can apply regardless of content type and actual parameters? Do the GGXF conventions and in particular table C.5 (grid node parameter identifiers) need to be augmented with required group names?

ccrook commented 2 years ago

@RogerLott I think Martin's proposal doesn't affect the GGXF group structure of the files (if that is what you mean by groups above). It is about how the parameters are organised in the NetCDF variables within the group. A NetCDF variable in this context is a matrix - the data of a grid.

Martin is suggesting that rather than having just one matrix holding all the parameters, as I have implemented it, there could be multiple matrices. The CF convention would have one matrix per parameter. Martin is suggesting that there could be a matrix for the value parameters and a matrix for the uncertainty parameters. I have one matrix for all parameters (following the original GGXF description and YAML examples).

This suggestion has a number of potential implications for the GGXF specification. I say potential because it depends on to what extent GGXF defines the NetCDF implementation.

Even with my implementation, which is close to the current specification, there are a number of implementation choices which are not defined in the specification. So we could regard allocating parameters to different NetCDF variables as another implementation detail.

However for GGXF to be useful it probably needs to either define explicitly the NetCDF implementation, or be accompanied by an implementation specification. The NetCDF implementation specification would include details such as how GGXF groups, grids, and attributes are held in the NetCDF structure, how parameters are organised in the matrices, etc.

One of the issues for GGXF is that the 99% use case is just one group with either a single grid or a nested grid structure. The deformation model adds the requirement for multiple groups, potentially different parameters in each group (eg vertical displacement, horizontal displacement), and so on.

However acknowledging that, I do very much like the option of defining the full set of parameters in the GGXF header, including units, stochastic characteristic, etc.

The GGXF group (to use current terminology!) would only need to specify which parameters were defined in the group's grids, and, if we adopt Martin's proposal, how the parameters are allocated to NetCDF variables in the grids. The default could be the same as the current specification - a single variable called "data" with all the parameters in the order specified in the GGXF header. For the default case the only difference from the current specification would be that the parameters attribute is moved from the group header to the GGXF header.
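A sketch of how that allocation might work in a writer, defaulting to a single "data" variable (the function name is an assumption for illustration):

```python
# Hypothetical sketch: allocate parameters to NetCDF variable names using
# a parameterGroup attribute, defaulting to a single "data" variable.
def variables_for(parameters):
    groups = {}
    for p in parameters:
        name = p.get("parameterGroup", "data")  # default variable name
        groups.setdefault(name, []).append(p["parameterName"])
    return groups

params = [
    {"parameterName": "latitudeOffset", "parameterGroup": "offset"},
    {"parameterName": "longitudeOffset", "parameterGroup": "offset"},
    {"parameterName": "latitudeOffsetUncertainty", "parameterGroup": "uncertainty"},
    {"parameterName": "longitudeOffsetUncertainty", "parameterGroup": "uncertainty"},
]
print(variables_for(params))
```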

This has benefits of:

For an implementation that handles the deformation model this doesn't add much complexity, as it already has to deal with different parameters in different GGXF groups. This does add an extra level of indirection and complexity to map parameters to NetCDF variables.

Even if we don't go for Martin's proposal for multiple variables, from a deformation point of view I still like moving the parameter definitions to the GGXF header to ensure a single common definition. The only thing it doesn't work for is default values for uncertainty parameters, as these may not be the same for each GGXF group.

ccrook commented 2 years ago

@desruisseaux Just in terms of aligning parameters with axes I see there was this comment from Even in #6 (on a slightly different topic - but relevant to this):

In EPSG most geoid models are exposed as a transformation between a Geographic 3D CRS and a Vertical CRS, so violating this constraint.

ccrook commented 2 years ago

@desruisseaux @RogerLott Great to have the updated examples

I note that the deformation model example only includes the first group from example E5, which has two displacement axes (unlike the CRS, which has 3 axes). The second group in E5 has three displacement axes. In a full example such as the NZGD2000 model there may also be groups with only vertical deformation. These need to be handled in the amended layout.

From a producer's point of view (which is my main role) I definitely prefer explicitly stating which parameter is which. The CRS is almost secondary to the production of the grids. I absolutely know which of my grid parameters are east offsets and which are north offsets. I am permanently confused about which axis GIS software (my common view on CRSs) choose to put first - call it geographic dyslexia, or maybe having been around too long. So regardless of NetCDF implementation, my preference is to keep the current explicit parameter names for each parameter axis in the YAML format. I want to be confident that the grid I have built with east, north, up displacements is understood that way without any ambiguity or indirection in where that is defined. (Of course I am just one producer, so might be worth asking others).

As a developer, having the parameter axes aligned to the coordinate axes is appealing at first sight. However it doesn't remove the need to know which axis is latitude, which is longitude, and which is height. Applying an offset in metres is a different operation for each axis - if I get these confused I'll get the wrong answer. Also to write a comprehensive GGXF implementation I still have to handle the cases where the parameter axes are not aligned with the coordinate axes (eg deformation groups above). So this proposal makes the implementation more complex. I now have to deal with vector parameters (such as offset) as well as scalar parameters, and I have to deal with mapping of parameters to NetCDF variables.

Also the specification itself becomes more complex as it needs to specify more alternatives. If my producer's preference for explicit axis parameters in the YAML format is accepted then that needs to be described as well as the vector parameters for the same grid in NetCDF. The specification needs to describe the case where the grid parameter axes are assumed from the CRS, and the case where they are not.

All this seems to me to be a high price to pay to lessen the risk of developers confusing which parameter applies to which coordinate axis. And it adds an equal risk that they assume the first axis is latitude when in fact it is longitude and use the wrong formulae to apply an offset in metres.

I would have expected that most implementations would have a GGXF API/library interface for use by software, which would extract metadata and data from the GGXF, layered on top of which would be code for actually using that information to update coordinates. That is, the library would have API functions like (at a high level):

Also for deformation/velocity etc maybe something like

Note that the software is not getting the source and target CRS from the GGXF. The software already knows the source and target CRS. It has selected the GGXF in the context of doing a transformation between them. At most the source/target CRS in the GGXF would be used to validate its selection, or as part of the discovery metadata.

All these functions operate without knowing anything about the source or target coordinate axes. In particular the getParameters implementation is much more straightforward if it returns a simple list of scalar parameters. The API could also have a function such as:

which is used to define the parameters returned by the calculation functions. This way the downstream software would not need to be concerned with the actual order of parameters in the GGXF file - it would specify what it wanted and that is what it would get from the GGXF API.

A higher level API might layer on top of this explicit coordinate transformation functions such as:

But note that not every use case would be doing coordinate transformations. For example survey network adjustment software may just want to know the east, north, and up offsets from a deformation model, rather than transformed coordinates. The coordinate API would also be different if the model is used as a point motion model rather than as a coordinate transformation model.
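At a high level, the kind of layered API described above might look like the following toy Python sketch. All names here (`GgxfFile`, `get_parameters`, `set_output_parameters`, `calc_parameters`) are hypothetical illustrations, not part of any GGXF specification:

```python
# Toy sketch of a layered GGXF API; all names are hypothetical illustrations.

class GgxfFile:
    """Minimal in-memory stand-in for a loaded GGXF group."""

    def __init__(self, parameters, values):
        self._parameters = list(parameters)   # scalar parameter names, file order
        self._values = dict(values)           # (lon, lat) -> tuple of grid values
        self._output = list(range(len(self._parameters)))

    def get_parameters(self):
        # Returns a simple list of scalar parameters, as suggested above.
        return list(self._parameters)

    def set_output_parameters(self, wanted):
        # The caller specifies which parameters it wants, in its own order,
        # so it need not care about the order used in the GGXF file.
        self._output = [self._parameters.index(p) for p in wanted]

    def calc_parameters(self, lon, lat):
        # Real code would interpolate within the grid; here we just look up.
        row = self._values[(lon, lat)]
        return [row[i] for i in self._output]


ggxf = GgxfFile(
    ["displacementEast", "displacementNorth", "displacementUp"],
    {(173.0, -41.0): (0.5, -0.2, 0.1)},
)
ggxf.set_output_parameters(["displacementUp", "displacementEast"])
print(ggxf.calc_parameters(173.0, -41.0))   # [0.1, 0.5]
```

The point of `set_output_parameters` is that downstream software states what it wants and gets exactly that, regardless of the parameter order stored in the file.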

RogerLott commented 2 years ago

@ccrook

Note that the software is not getting the source and target CRS from the GGXF. The software already knows the source and target CRS. It has selected the GGXF in the context of doing a transformation between them. At most the source/target CRS in the GGXF would be used to validate its selection, or as part of the discovery metadata.

Yes and no. Software has indeed selected the particular grid to transform from one CRS to another, or within one CRS. But the direction that the software user needs may or may not accord with source and target CRS direction that the grid represents. The software needs the GGXF source and target information to understand whether it applies the values in the grid as documented or in the reverse direction.

ccrook commented 2 years ago

@RogerLott

The software needs the GGXF source and target information to understand whether it applies the values in the grid as documented or in the reverse direction.

Not necessarily I think. Surely the software will know that it is transforming from A->B and it will find a transformation between A and B defined in a registry or whatever. That definition will include the direction. ie it will be a definition of the transformation from A to B using the grid in the "forward" sense. Alternatively the software may find a transformation in the registry from B to A, in which case it will know that it needs to apply the transformation from the GGXF in the reverse sense.

That is not to say that the source and target CRS definitions shouldn't be in the GGXF - they are critical metadata to validate the use of the GGXF. I am just saying that those definitions are redundant from the point of view of software doing a transformation using the GGXF.

desruisseaux commented 2 years ago

In my interactions with people from the Open Subsurface Data Universe (OSDU), those people were not always requesting a coordinate operation between a pair of CRSs; they were often fetching a coordinate operation directly from its EPSG code. I do not know how they choose the EPSG code, but I would not assume that a pair of CRSs is always the starting point.

In the examples that I provided on the wiki, one open issue is that the axis dimension maps to CRS axes, but does not say whether this is the source CRS, the target CRS or the interpolation CRS.

We could add an attribute saying which CRS (source, target or interpolation) the axis dimension applies to. Could a mapping always exist to at least one of those three CRSs?

As a developer, having the parameter axes aligned to the coordinate axes is appealing at first sight. However it doesn't remove the need to know which axis is latitude, which is longitude, and which is height.

That information is with the CRS. Assuming we specify which CRS we are talking about, there is no ambiguity. There is a level of indirection, but the parameter approach is also a level of indirection, just in a different place.

I want to be confident that the grid I have built with east, north, up displacements is understood that way without any ambiguity or indirection in where that is defined.

With the axis dimension approach, I think we can be very confident: just apply the values to the CRS axes in the same order. The developer does not even need to know if axes are (latitude, longitude) or (longitude, latitude): (s)he can just apply the geographic offsets (for example) to the axes in the same order, blindly. By contrast the approach using latitudeOffset or longitudeOffset parameter names forces the developer to parse the names, recognize the "latitude" and "longitude" keywords in the name, and identify to which CRS axes they correspond. I think this is both tedious for the developer and bug-prone.

There are very possibly cases that I did not understand for which my current proposal does not fit, but I would like to see if we can adjust the proposal to fix those shortcomings. The most important one that comes to my mind for now is to specify which CRS the axis dimension applies to.
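To make the two application styles concrete, here is a toy Python contrast. All names and functions are illustrative only, not from either proposal's actual text:

```python
# Toy contrast of the two approaches under discussion; names are illustrative.

# Axis-dimension approach: offsets come in the same order as the CRS axes,
# so they are added element by element with no name parsing.
def apply_axis_order(coords, offsets):
    return [c + o for c, o in zip(coords, offsets)]

# Named-parameter approach: the developer must match names like
# "latitudeOffset" to the corresponding CRS axis before applying them.
def apply_named(coords, named_offsets, crs_axis_names):
    return [c + named_offsets[axis + "Offset"]
            for c, axis in zip(coords, crs_axis_names)]

coords = [-41.0, 173.0]   # a (latitude, longitude) CRS
print(apply_axis_order(coords, [0.5, 0.25]))                  # [-40.5, 173.25]
print(apply_named(coords,
                  {"latitudeOffset": 0.5, "longitudeOffset": 0.25},
                  ["latitude", "longitude"]))                 # [-40.5, 173.25]
```

Either way the result is the same; the disagreement is about where the axis mapping is resolved - in the file layout or in the consuming software.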

ccrook commented 2 years ago

@desruisseaux

I want to be confident that the grid I have built with east, north, up displacements is understood that way without any ambiguity or indirection in where that is defined.

With the axis dimension approach, I think we can be very confident: just apply the values to the CRS axes in the same order.

This comment is from the viewpoint of producers, not developers. When I am building deformation grids I am not dealing with CRSs other than the interpolation CRS; I am dealing with data and models of east/north/up displacements. From these I build the grids and time functions of the deformation model. So when I am building the grids I know exactly which are east, north, up offsets. To reduce the risk of errors as a producer I want to be able to explicitly specify the grid parameters, to be confident that they are being used as I intended. Otherwise I will be worried that I haven't matched the axis order with the target CRS.

Note that this assertion is in the context of YAML, not NetCDF.

With the axis dimension approach, I think we can be very confident: just apply the values to the CRS axes in the same order. The developer does not even need to know if axes are (latitude, longitude) or (longitude, latitude): (s)he can just apply the geographic offsets (for example) to the axes in the same order, blindly.

Exactly my point - the values cannot be applied blindly to the axes. Applying a linear offset in metres is a different operation for each coordinate axis. So the developer must take account of which axes are which. You could even argue that forcing them to explicitly map parameters to coordinate axes reduces the risk of error.

But even if the developer does get it the wrong way round (with either approach) they will very quickly see this in testing, which hopefully they will do. We can, and probably should, create a test suite (maybe in GIGS?) that can be used to validate any software using GGXF. That can be used to eliminate this risk for any software.

I didn't realize that the Deformation Model had other groups with different axes. We can try to map them as well.

Certainly we could create a mapping between the parameter axes and the source and target CRS axes for most of the parameters, maybe as part of the YAML to NetCDF mapping. This would mean creating a lookup table from parameter axes to coordinate axes. That is, instead of doing this in the software when the GGXF is used, we would do it in the translator when the NetCDF version is created. This does introduce a risk for anyone directly creating the NetCDF. And it does make the specification more complex.

This may not work so well for uncertainties in the future if we choose to represent more sophisticated covariance information. For example we could represent horizontal uncertainty as an error ellipse with minimum, maximum errors and orientation of maximum error axis.

desruisseaux commented 2 years ago

What about the following proposal? In YAML, parameter names must comply with the following syntax:

<parameter>_<axis>

Where <parameter> is for example offset, velocity, etc. and <axis> must be the name of an axis as declared in the WKT of the CRS. So for example if the source CRS is declared with:

(…snip…)
AXIS["latitude", NORTH],
AXIS["longitude", EAST]

Then the parameter names can be for example:

If the <axis> part does not match the name of an AXIS element in the WKT, then the "YAML to netCDF" translator shall raise an error.

However the issue of specifying which CRS we are talking about (source, target or interpolation) is still present.
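A minimal sketch of the proposed check, assuming a naive regex extraction of AXIS names from the WKT (real WKT parsing is considerably more involved; all function names here are hypothetical):

```python
import re

def wkt_axis_names(wkt):
    # Naive extraction of AXIS["name", ...] entries; a real translator
    # would need a proper WKT parser.
    return [m.group(1) for m in re.finditer(r'AXIS\["([^"]+)"', wkt)]

def validate_parameter_names(names, wkt):
    # The proposed rule: <parameter>_<axis>, where <axis> must match an
    # AXIS declared in the CRS WKT, otherwise the translator raises an error.
    axes = set(wkt_axis_names(wkt))
    for name in names:
        parameter, _, axis = name.rpartition("_")
        if not parameter or axis not in axes:
            raise ValueError(f"parameter {name!r} does not match a CRS axis")

wkt = 'GEOGCRS[...,AXIS["latitude", NORTH],AXIS["longitude", EAST]]'
validate_parameter_names(["offset_latitude", "offset_longitude"], wkt)  # OK
```

With this WKT, a name like `offset_height` would be rejected because `height` is not a declared axis.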

RogerLott commented 2 years ago

Before we go too far with misunderstandings here, one thing needs clarifying: @desruisseaux said For this issue I will take the following definitions: Interpolation CRS: the CRS in which are expressed the grid coordinates (after affine transform). Target CRS: the CRS of coordinates on which to apply displacement vectors.

This is not correct.

The grid is constructed in the Interpolation CRS. The grid may support a transformation, but it does not necessarily do so. There is no transformation to be applied with say a file containing deviationsOfTheVertical - it is just gridded geodetic data. In that case there is no source or target CRS.

However in many use cases the data in the grid(s) will support a transformation. When this is the case, the transformation will be changing coordinates from the source CRS to the target CRS. The Interpolation CRS may be totally different from either the source or target CRS, it may be related to the source (e.g. the 2D horizontal component of a 3D source CRS), or it may actually be the source CRS. All of these cases need to be provided for.

ccrook commented 2 years ago

Following 28 March 2022 meeting - my understanding.

Agreement on:

In NetCDF the grid data are held in one or more NetCDF variables with dimensions (nrow,ncol,nparam), where nrow and ncol are the dimensions identifying the grid node, and nparam is the number of parameters at the node. For example a horizontal offset might have nparam=2 where the two parameters are offsetEast and offsetNorth.

Note: the current implementation has one variable which is always called "data" and holds all the parameters for the group

The main point under discussion is what metadata is used to identify which parameters are in which variable. The two options are:

  1. the parameters in each NetCDF variable (matrix) are defined explicitly in the group metadata. If it uses a variable "offset" for coordinate offsets then the ggxfGroup header will explicitly specify that the ggxfGroup contains a variable named "offset" and the order of parameters it contains, ie either (eastOffset,northOffset) or (northOffset,eastOffset). In this case the specification could either explicitly state the NetCDF variable names that must be used for each parameter, or leave it as an implementation choice whether and how parameters are assigned to variables.
  2. MD proposal. For parameters that can be mapped directly to source and target CRS definitions, those parameters are placed in a single NetCDF variable and the order of the parameters is dictated by the CRS definitions. For example if the source and target CRS are 2d geographic coordinate systems then a ggxfGroup header would simply specify that it has a variable called offset. This would contain parameters offsetEast and offsetNorth in the order matching the CRS definition. For example if the CRS coordinate axes are (latitude,longitude) then the offset parameters would be (offsetNorth,offsetEast). The specification explicitly defines what NetCDF variable names are used, and the CRS definitions define the order of parameters in them.

The second option has not yet described how parameters not matching the coordinate axes would be encoded.
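As a toy illustration of the difference between the two options (all attribute names and the lookup table here are hypothetical):

```python
# Toy illustration of the two metadata options; all names are hypothetical.

# Option 1: the ggxfGroup header states the variable and its parameter order.
group_header = {"offset": ["eastOffset", "northOffset"]}

# Option 2: the header only names the variable; the parameter order is
# implied by the CRS axis order.
crs_axes = ["latitude", "longitude"]   # e.g. from the source CRS definition
axis_to_parameter = {"latitude": "northOffset", "longitude": "eastOffset"}
implied_order = [axis_to_parameter[a] for a in crs_axes]

print(group_header["offset"])   # ['eastOffset', 'northOffset']
print(implied_order)            # ['northOffset', 'eastOffset']
```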

Action: CC to provide examples to MD to work with for that example.

Each option offers potential advantages.

Option 1 advantages:

Option 2 advantages:

@desruisseaux I hope this describes your arguments, but I have probably missed or misrepresented some aspects. I'll edit this comment with any points you want to make - just describe what changes you want in a comment below. That way we'll have the arguments in one place (if that works for you). Thanks

ccrook commented 2 years ago

@RogerLott While I am thinking about this it occurs to me that there may be a flaw in putting the source and target CRS into the YAML file. At best they are a recommendation. For example, the same GGXF of coordinate offsets (either 2d or 3d) could be used to map between two different 2d CRSs, or between two different 3d CRSs. The GGXF file could be a valid file to use in each case. So a single GGXF may support multiple different translation options - the source and target CRS are not necessarily unique. This is even more true for a point motion deformation model (for which the source and target CRS are the same, just at different epochs): it could apply to a large number of CRSs, as it is the same definition regardless of the CRS (though I guess different CRSs may have a different view of north).

ccrook commented 2 years ago

@desruisseaux Here is an example deformation grid with three groups. I have included the YAML (which references external data, not included), the full NetCDF format GGXF file, and the headers only .cdl file.

In this GGXF there are three ggxfGroups. The first has parameters displacementEast, displacementNorth, the second has parameters displacementEast, displacementNorth, displacementUp, and the third has just parameter displacementUp.

As Roger noted in the meeting there may also be GGXF files which are not used for coordinate operations at all, for example holding a gravity model, or deflections of the vertical (the latter arguably could be used to transform to astronomical longitude, latitude).

nzgd2000_subset.zip

ccrook commented 2 years ago

@desruisseaux On reflection I am not confident about your assertion that if the offset parameters are ordered as in the source/target CRS then software can simply add them without worrying about the order.

My reason is that I believe most software that works with coordinates will convert the coordinates to some internal representation. For example, it may be an array, or it may be a structure with elements longitude,latitude,height. If it is an array, the software may well enforce a consistent order for the axes regardless of the order in the external representation they are loaded from. This would especially be the case if it is doing coordinate operations, such as converting geographic coordinates to geocentric coordinates. So assuming the coordinates are in an array in the same order as defined in the source/target CRS is not reliable.

I think that in many, if not most, cases, by the time the software is using the GGXF to retrieve offset parameters for an operation it will no longer have a simple array of coordinates ordered as in the external representation defined by the source/target CRS.

It may be more useful to most software to have a guaranteed order of parameters regardless of source and target CRS. For example we could specify parameters such as "offset_en", "offset_enu", and "offset_u" as vector variables rather than just "offset" with an order and number of parameters dependent on the source or target CRS.

(Note also that defining vector parameters such as "offset" or "offset_enu" may be more complex in terms of defining units for parameters, as the E,N and U axes may not all use the same units).

@RogerLott A corollary to this is that the same applies to the interpolation CRS. It may be easier for software if the interpolation coordinate axes are always ordered east axis, north axis (plus any other axes in future higher dimension grids). This does not mean that the software has to use that order for its internal operation. But it does mean that software can be written to use a consistent way of providing interpolation coordinates to the GGXF API regardless of the interpolation CRS axis order.

RogerLott commented 2 years ago

@ccrook said ...the same GGXF of coordinate offsets (either 2d or 3d) could be used to map between two different 2d CRSs ...

In the ISO 19111 data model, a transformation has exactly one source CRS and exactly one target CRS. That leads geodetic registries to document separate entries (duplicates in terms of the geodetic parameter values), each describing a transformation between one source CRS and one target CRS. In GGXF this situation is also most cleanly handled by duplication: separate GGXF files for each transformation, each containing only one source CRS and one target CRS. Following this in GGXF avoids the complications introduced by a many-to-many relationship and retains consistency (and a 1:1 mapping) with registry documentation of transformations. GGXF should retain the current provision of a maximum of one source CRS and one target CRS. Their presence is conditional, depending upon the content. An Interpolation CRS is always required.

RogerLott commented 2 years ago

@ccrook said ...It may be easier for software if the interpolation coordinate axes are always ordered east axis, north axis (plus any other axes in future higher dimension grids)...

At the human interface, i.e. when coordinates are presented to human beings, the source and target CRS definitions are more than recommendations - they are statements of correct practice. What happens within a computer system, away from human sight, does not have to follow this, and within the bowels of application software standardization is the norm, on an application-proprietary basis. An exchange file such as GGXF can be argued to be for computer implementation and therefore have a standard order. (But the YAML version of GGXF may be read by humans!) When dealing with 3D Cartesian systems it is easiest for software to be using a right-handed system. We need GGXF to cope with either up or (e.g. for hybrid models) down, i.e. with both ENU and NED. Whilst handedness does not really make sense in a 2D context, do you standardise on EN or NE, or whichever would be the case if the 2D system were extended into 3D?

desruisseaux commented 2 years ago

I have updated the wiki page with an explanatory section. I have tried to take a more mathematical point of view, discussing scalar versus vector parameters and talking about vector spaces.

The summary in the comment above seems fine to me, thanks.

In the last three comments above about axis order, it is true that implementations often use a fixed axis order internally. PROJ for example performs all its map projections in (longitude, latitude) axis order if I remember correctly (Apache SIS too). But the steps are:

  1. Analyze axis order in source CRS and target CRS.
  2. Permute coordinates to the axis order expected internally by the operation method implementation.
  3. Do the operation method using the internally fixed axis order.
  4. Permute coordinates to the axis order expected by the target CRS.

To my knowledge, both Apache SIS (for sure) and PROJ (I believe) work that way. An operation method using GGXF data is simply inserted at step 3 above. It automatically inherits the axis order analysis done by the rest of the referencing library. It has two consequences:

For the interpolation CRS, I'm 100% sure that in the Apache SIS case, forcing a fixed axis order instead of relying on the interpolation CRS definition would only complicate things. The process for locating a grid cell is:

  1. Do the GGXF operation:
    • Convert from source CRS to interpolation CRS using the same coordinate operation engine as the one used by the whole library, including its way of checking axis order in CRS definitions.
    • Apply the inverse affine transform for getting grid indices.

Interfering with that process with a fixed axis order, or an axis order that we must infer from labels, would only bring unnecessary complexity with a higher risk of bugs.
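The permute / operate / permute steps described above can be sketched as follows. All helper names are hypothetical, and the example assumes a method implemented internally in (longitude, latitude) order:

```python
# Sketch of steps 1-4 above; all helper names are hypothetical.

def permute(coords, from_axes, to_axes):
    # Steps 2 and 4: reorder a coordinate tuple between axis conventions.
    index = {axis: i for i, axis in enumerate(from_axes)}
    return [coords[index[a]] for a in to_axes]

def apply_offsets(lon_lat, offsets_lon_lat):
    # Step 3: the operation method itself, in its fixed internal order.
    return [c + o for c, o in zip(lon_lat, offsets_lon_lat)]

source_axes = ["latitude", "longitude"]    # from the source CRS definition
internal_axes = ["longitude", "latitude"]  # fixed internal order (as in PROJ)

coords = [-41.0, 173.0]                                  # (lat, lon) input
internal = permute(coords, source_axes, internal_axes)   # step 2
shifted = apply_offsets(internal, [0.25, 0.5])           # step 3
result = permute(shifted, internal_axes, source_axes)    # step 4
print(result)   # [-40.5, 173.25]
```

A GGXF-based operation method slots in at the `apply_offsets` stage, inheriting the axis-order analysis done by the surrounding library.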

desruisseaux commented 2 years ago

On the other comment about EPSG registry versus GGXF files, we may decide to force a 1:1 relationship but I think that this is not mandatory. For an ISO 19111 CoordinateOperation fetched from an EPSG database entry the process in Apache SIS will be:

  1. Permute axis order / apply unit conversion / add dimension with z = 0 (if the operation method allows that) for converting coordinates from CoordinateOperation source CRS to GGXF source CRS.
  2. If the operation method is implemented with a fixed internal axis order, permute again the coordinates to that order.
  3. Do the operation.
  4. If the operation method was implemented with a fixed internal axis order, permute the coordinates to the order specified by the GGXF target CRS.
  5. Permute axis order / apply unit conversion / drop z dimension (if the operation method allowed that) for converting coordinates from GGXF target CRS to CoordinateOperation target CRS.

Notes:

desruisseaux commented 2 years ago

The above arguments were technical, but there is maybe also a political one. We had an axis order issue for years, and I feel like it is being settled only recently, with the message "use the axis order defined by the CRS" starting to be understood. Fixing a potentially different axis order in the GGXF format (like what the GeoTIFF format does with axis order that does not match EPSG definitions), or saying "please ignore CRS axis order and look at some naming convention instead", would be a step backward. Especially coming from the OGC CRS working group, it may send a very confusing message to users: "are we supposed to follow the CRS definition, yes or no?".

RogerLott commented 2 years ago

Good point @desruisseaux and I fully support GGXF honouring axis order defined in CRS.

ccrook commented 2 years ago

@RogerLott

Good point @desruisseaux and I fully support GGXF honouring axis order defined in CRS.

To clarify, is this with regard to the interpolation CRS, or is it also with regard to the order of parameters in vector variables?

For the interpolation CRS I think Martin's comment above is unassailable.

desruisseaux commented 2 years ago

To clarify, is this with regard to the interpolation CRS, or is it also with regard to the order of parameters in vector variables?

I would say in regard to all tuples in the space of a CRS: both coordinate tuples (from ISO 19111) in the interpolation CRS, and what I called offset tuples in the source or target CRS. All tuples should have their values in the axis order of the CRS that describes their space.

We do not need to store everything as vectors in a netCDF file. We can choose scalar, vector (or even matrix, tensor…) on a case-by-case basis. I believe that for some operation methods, some parameters are vectors (e.g. geographic offsets). I propose to use vectors only for parameters that fit the mathematical concept of vector in some space (not necessarily Euclidean). When that space is a CRS, I think that the order of elements in the vector should be as defined by the CRS.

ccrook commented 2 years ago

@desruisseaux

I would say in regard to all tuples in the space of a CRS: both coordinate tuples (from ISO 19111) in the interpolation CRS, and what I called offset tuples in the source or target CRS. All tuples should have their values in the axis order of the CRS that describes their space.

I was just trying to clarify what Roger was meaning! I have a clear understanding of your viewpoint.

We can choose scalar, vector (or even matrix, tensor…) on a case-by-case basis.

I think your mathematical description of the organisation of vector parameters in the wiki is excellent. Thanks for that. I hadn't considered the possibility of matrix parameters in the future; this could be a way of representing parameter covariance information. Also, from a consistency point of view, I was (and still am) undecided whether it would be simpler from an API point of view to store scalar parameters as vector parameters with axis dimension 1, to avoid having to use different indexing for scalar parameters. In the current Python implementation this is easily managed, but that is loading the entire grid and using the numpy library's indexing, which allows reshaping matrices easily. A production implementation would require a bit more work to manage these alternatives, I imagine.

I agree with your political point on interpolation CRS.

I believe that for some operation methods, some parameters are vectors (e.g. geographic offsets). I propose to use vectors only for parameters that fit the mathematical concept of vector in some space (not necessarily Euclidean). When that space is a CRS, I think that the order of elements in the vector should be as defined by the CRS

... and I guess you won't be surprised that I still am not comfortable with this requirement.

One interpretation of the parameters, in some cases, is that they are vectors in a CRS space. However another point of view, which applies to all GGXF, is that they are vectors in an arbitrary parameter space. In some cases they may be offsets aligned with a CRS; in some cases the parameter space may be aligned with CRS axes but the offsets have a non-linear relationship to the coordinate values.

Also, GGXF files are not just used for coordinate operations. For example, almost all the applications and visualisation tools I have for working with deformation models use vectors of east, north, up. This software would not be looking at or using the source/target CRS, and doesn't have any capability for parsing WKT.

For pretty much all my applications it will make more work and be less useful to have the offset vectors ordered according to source and/or target CRS order.

In the political decision about CRS axis order I suspect that no-one actually wanted different orders for different CRSs, just that it was impossible to agree on longitude/latitude vs latitude/longitude. Both were in use and it was too hard to change. However in the case of vector offsets, which are not coordinates, I don't think that is the case. If we are going to specify an order for the vector components then it would make more sense to me to specify a consistent order. Or, failing that, to explicitly specify which parameter is which, as in the current implementation.

desruisseaux commented 2 years ago

Hello (sorry for the late reply)

About whether to allow scalar values or to require the use of vectors of length 1, I think that both approaches are equivalent from a usability point of view. When reading the data as a flat array, parameter(j, i) and parameter(j, i, k) with dim(k) = 1 have the same layout, so I think it should be easy for developers to view scalars as vectors of length 1, or conversely, at their convenience. Given this equivalence, I propose to let producers choose "scalar" or "vector" depending on what describes their data best.
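The layout equivalence can be checked directly with numpy, which the prototype implementation already uses:

```python
import numpy as np

# parameter(j, i) versus parameter(j, i, k) with dim(k) = 1: the flat
# layout of the data is identical, so either view is easy to provide.
scalar = np.arange(12.0).reshape(3, 4)   # parameter(j, i)
vector = scalar.reshape(3, 4, 1)         # parameter(j, i, k), k = 1

print(scalar.ravel().tolist() == vector.ravel().tolist())   # True
```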

On this issue:

in some cases the parameter space may be aligned with CRS axes but the offsets have a non-linear relationship to the coordinate values.

Could we say that the CRS space is non-Euclidean? In that case the vectors would still be vectors in the CRS space, but with the understanding that the formulas are different from the usual ones. I have not yet read the Geographic information — Features and Geometry — Part 2: Measure specification in detail, but my impression is that a non-linear relationship with coordinate values is well accepted at OGC.

On the topic of axis order, we have two audiences:

But our main audience is the first one, and for the second audience I think that some human intervention would be needed anyway to help the library interpret the data, wouldn't it?

Summary of current state of discussion

Below is my understanding of the current situation, for people who prefer not to read the full discussion. My proposal is:

Another alternative proposal is to avoid the vector elements order issue by avoiding the use of vectors, and instead decompose all vectors into their elements stored as separated scalar parameters. The pros and cons have been discussed on the wiki.

ccrook commented 2 years ago

Scalars, vectors or matrices should be used for their mathematical meaning rather than as storage tricks.

I disagree that these are tricks. This is about storage, not mathematics. There are lots of precedents for storing heterogeneous elements in an array. You can choose to think of an array as representing a vector in some space - sometimes that is appropriate, sometimes it is not.

ccrook commented 2 years ago

@desruisseaux

Order of elements in vector is an unresolved issue in this discussion:

Agreed that this is the point of contention. Or perhaps better worded: "the definition of the order of elements in a vector is unresolved".

  • My proposal is CRS axis order, both for political reasons and (I believe) for convenience of geospatial libraries.
  • Alternative proposal is externally fixed order, for convenience of at least non-geospatial libraries (e.g. visualization tools).

On the second option, the proposal is not an externally fixed order. The proposal is an order explicitly and consistently defined in the GGXF file (YAML or NetCDF). It needn't be a fixed order, though that could be specified. The example in Layout 3 is one way this could be defined. This has the advantage of being simple to describe in the GGXF specification and consistent for every type of data set.

(There is one point in the discussion at which an externally fixed order for parameters was mooted, which was suggesting parameter names for offsets could be something like offset_enu etc. But for consistency across data sets and consistency with YAML my preference would be that each parameter value is named independently. The GGXF specification just needs to state for each GGXF data type which parameters are required or optional. The GGXF file itself defines where those parameters are stored.)

What I don't like is having the order for some GGXF data sets defined by the parameter definitions, and for other data sets defined by the CRS axis order (or in your latest proposal I think you are suggesting that it may be defined indirectly by looking up from an index array to the CRS axis definitions). This is particularly inconvenient for non-geospatial software, as WKT is really not that well known outside geospatial definitions, and WKT CRS definitions even less so. Even the structure of WKT is not particularly accessible to any standard libraries - if it were JSON then at least it would be readily parsed without having to invoke non-standard libraries.

kevinmkelly commented 2 years ago

At the last GGXF meeting I was tasked with compiling some information on producers of geodetic data using NetCDF. The only geodetic datasets using NetCDF format that I was able to find were satellite altimetry (SA) data - and there are many producers of these. All use NetCDF format. The following lists what I was able to uncover:

The NOAA/STAR Laboratory for Satellite Altimetry (LSA) NOAA / NESDIS / STAR - Laboratory for Satellite Altimetry (LSA) Format Technical Notes Contacts

Open Altimeter Database (OpenADB) | Deutsches Geodätisches Forschungsinstitut Technische Universität München (TUM) OpenADB NetCDF Format Contact: Christian Schwatke christian.schwatke@tum.de

80333 München Arcisstr.21 Tel. +49 89 23031-1109 Fax +49 89 23031-1240

Data Unification and Altimeter Combination System (DUACS) DUACS SA Products Product User Manual (NetCDF Format)

Copernicus Marine Service SA datasets

Sentinel-3 Geodetic product types of Sentinel-3 Sentinel-3 datasets Sentinel-3 Data products user guide Sentinel-3 NetCDF format

ccrook commented 2 years ago

Thanks @kevinmkelly. Looks like these are all very strongly aligned with CF conventions. Each with one scalar parameter per netcdf variable (grid/array), as well as other aspects of CF conventions (eg arrays of longitude, latitude values, though quickly scanning the docs doesn't confirm this in all cases).

kevinmkelly commented 2 years ago

@ccrook. Agree. I did not dig deep into their NetCDF data structure, but I don't doubt that these SA data files align closely with current CF conventions, with no fancy departures or specialized data structuring.

ccrook commented 2 years ago

Note: My summary of NetCDF discussion prior to 23 May meeting is in a Google document

ccrook commented 1 year ago

The NetCDF structure is now resolved. The most significant change from the original structure (a single data variable) is the use of multiple variables named for the data they contain. This is controlled through a parameterSet attribute in each parameter definition in the header, which determines the corresponding netCDF variable name. A NetCDF variable may hold multiple parameters, eg the three components of displacement.
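For example, the header parameter definitions might look something like the following YAML fragment. Only the parameterSet attribute name comes from this discussion; the surrounding structure and values are illustrative:

```yaml
parameters:
  - parameterName: displacementEast
    parameterSet: displacement     # -> stored in netCDF variable "displacement"
    lengthUnit: metre
  - parameterName: displacementNorth
    parameterSet: displacement
    lengthUnit: metre
  - parameterName: displacementUp
    parameterSet: displacement
    lengthUnit: metre
```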