xarray-contrib / xdggs

Xarray extension for DGGS
https://xdggs.readthedocs.io
Apache License 2.0
79 stars 14 forks source link

hierarchical operations on cell ids: parents #62

Open keewis opened 4 months ago

keewis commented 4 months ago

Towards https://github.com/xarray-contrib/xdggs/issues/38#issuecomment-2007410538: this computes the parents at the given resolution for every cell id. This is similar to h3ronpy.pandas.change_resolution_paired (as mentioned in #18), with the restriction that it is only possible to compute the parents.

Note that this does not aim to perform any sort of aggregation / interpolation / reindexing. In other words, this is purely a cell id operation.

In theory, we could expand this to also return the children, but then size of the cells dim would change, and we'd get duplicated entries in cell_ids coordinate.

Tests are still missing, I'll add them once we settled on a design.

keewis commented 3 months ago

something else we could do is to add a new "child_cells" dimension when computing children, which could be collapsed if needed using stack (and passing the same resolution would be a no-op).

With that, though, we'd have to rename the method. H3 uses change_resolution, but I wonder if scale would work, as well?

strobpr commented 3 months ago

when it's about grids I would call it 'grid level' not 'resolution'. Cell size sets a limit to resolution but is no guarantee for it. If you resample 1km gridded data to a 10m grid you change the cell size (level) but not the resolution, which might be 1,5 or even 2km. Also 'scale' is something different, as it is usually a ratio and not an absolute cell size. Relation between cell width in a 'raster map' and the 'scale' of that map is somewhat loose and disputed.

keewis commented 3 months ago

If I understand correctly, you'd like to use distinct terms for the number of subdivisions of the base pixels and the true information content? Right now, with resolution we're referring to the former (which was inspired by H3), but changing to level may still be possible (I'd like to avoid composite names, though, so I would vote not to use grid_level). This should be discussed in a separate issue / the meetings, though, as changing this would require breaking changes to most parts of the library.

Also 'scale' is something different, as it is usually a ratio and not an absolute cell size.

Then what about scale_to, to imply that we work with absolute values?

strobpr commented 3 months ago

If I understand correctly, you'd like to use distinct terms for the number of subdivisions of the base pixels and the true information content?

'the number of subdivisions of the base pixel' is not really defined in DGGS terminology. We talk about the 'refinement ratio' which is the ratio of the number of child cells to parent cells. In a congruent (nested) DGGS converting values to a higher level (smaller cell size) does not at all affect the information content. You can first dis-aggregate and then re-aggregate without any loss (or gain). In other words you can change the level without changing the resolution. So yes, these two parameters are distinct.

Right now, with resolution we're referring to the former (which was inspired by H3), but changing to level may still be possible (I'd like to avoid composite names, though, so I would vote not to use grid_level). This should be discussed in a separate issue / the meetings, though, as changing this would require breaking changes to most parts of the library.

I'm sorry if this messes with your legacy, but I would really ask you to stick whenever possible with agreed (standard) terminology. Often enough this will be difficult as you might find inconsistent standards because different groups did not align their definitions. But at least for DGGS we have a pretty solid set of terms that could largely help to converge on a common 'gridding' language.

As for the variable name, maybe rlevel or reflevel or, if not too long, gridlevel could work? Keep in mind that there are lots of other 'levels' (processing levels, maturity levels, etc) floating around which could confuse users.

Also 'scale' is something different, as it is usually a ratio and not an absolute cell size.

Then what about scale_to, to imply that we work with absolute values?

For a method, change_resolution is certainly the worst choice, as at least for refinement this is not what is does. change_...level or regrid_to would be much more to the point!

keewis commented 2 months ago

sorry the late reply. I will raise this in the meeting on Monday (Sep 2, see #45 if you'd like to join), hopefully we can come up with a good way forward. In any case, this is an experimental package, so fortunately we don't really have to care about backwards compatibility and can just go ahead and make breaking changes if necessary.

As far as confusion with other types of levels go, do those appear in the context of grid parameters? If not, I think the context will be sufficient to disambiguate, so I'm not worried that this would be confusing.

benbovy commented 2 months ago

Regarding resolution vs. level, I agree that "level" would be better assuming that for all DGG systems supported here it represents one of the finite (discrete) hierarchical levels of the system via an integer value. "Resolution" is more generic and often consists of a measure in a continuous space.

In theory, we could expand this to also return the children, but then size of the cells dim would change, and we'd get duplicated entries in cell_ids coordinate.

I wouldn't mind using the same method for upgrading or downgrading the cell level, even if the dimension size would change in the case of getting ids at a finer level. It might also make sense to keep the dimension size unchanged in the latter case too, e.g., using the cell centroid. I agree that a change_level method would be very much to the point.

keewis commented 2 months ago

It might also make sense to keep the dimension size unchanged in the latter case too, e.g., using the cell centroid.

as discussed in the meeting, using the cell centroid / the child cell that is at the center of the parent cell does not work for all DGGS. We could, however, have a way to opt into this, raising an error for e.g. quadrilateral cells. Something like <method_name>(level=10, center_only=True)?

Regarding resolution vs. level, I agree that "level" would be better

We've decided to go with level instead of resolution or refinement_level / reflevel etc. in the meeting, as the context is enough to disambiguate between other kinds of levels. In particular, "processing level" to me is a property of the data / dataset, not the grid (not sure what you're referring to with "maturity level", but if it has to do with the data maturity I would also call that a property of the data or dataset, not the grid). Additionally, if I understand correctly, the draft currently has two names for the kind of level we want to use this for: refinement-level and zone-level, so going with just level allows us to avoid dealing with the subtle differences or the converging of the document on a single term.

As an aside, the OGC draft has a relative-depth property that is defined as an offset between two levels (officially described as "the number of refinement levels separating a finer discrete global grid from a coarser discrete global grid in a discrete global grid hierarchy"). Does that mean depth is also a term we would have had to consider?

I agree that a change_level method would be very much to the point.

As far as I remember, we didn't really discuss the naming for the method I'm proposing here.

While change_level could work, as well, I don't really like the fact that we would repeat the word "level" multiple times in a row: change_level(level=10) or change_level(level) or, even worse, change_level(level=level). regrid_to to me implies interaction with the data (by aggregating / interpolating), so I'd say does not really fit in this context.

For an additional idea: since we've used the word "zoom" in this context, what about zoom_to(level)?