Open joeloskarsson opened 1 month ago
This is now ready for review :partying_face:
In Changelog.md also mention that mesh_node_distance is the new name of grid_refinement_factor.
Yes, that's a good idea. I haven't added any changelog entry yet since I think that we should probably create a new release before merging this, and then the changelog will be updated anyhow and this entry go as the first under unreleased
.
Is Euclidean coordinates or Cartesian coordinates easier to understand? Maybe Euclidean is more accurate regardless...?
Thinking a bit more about this (and reminding myself about the exact definitions), I actually think that we should use Cartesian rather than Euclidean everywhere. My reasoning is that cartesian refers to a coordinate system, whereas euclidean refers to a type of geometric space (with arbitrary coordinate system, e.g. could be spherical coordinates). Now, the surface of the earth is not a euclidean space and lat-long not cartesian coordinates, so there's no real risk of substantial misunderstandings, but I guess "Cartesian coordinates" makes sense whereas "Euclidean coordinates" is technically ambiguous.
@sadamov If you agree with this reasoning I would go ahead and change all comments and names to use Cartesian :smile:
These lat-lon based graphs would benefit a lot from heterogeneous graph design it seems (i.e. not living on a equidistant grid of nodes). This is a topic for a future PR though, right?
Yes, they would definitely benefit a lot from that. But I agree that that's for a future PR. One simple thing I was trying out was to just remove any mesh nodes that does not have any grid node connections. One could do that for the flat Keisler-like graphs, but I realized that it becomes tricky for the multiscale(?) and hierarchical graphs. In order to now have different behavior for the different archetypes I left that out now. Better to revisit and think a bit more about when it is clearly needed.
Note: This PR should have a review by @leifdenby , as by the roadmap. Then it is ready to merge.
I'm in the process of giving this a thorough read through and executing the notebooks etc as I go through. In general this is really excellent (as usual), and I fully agree with all of the following three changes:
I broke out the flat graph creation into a function
create_flat_singlescale_mesh_graph
, as it made more sense that also this type of mesh graph had its own function. This made sense to do here as the amount of related code was growing.I also moved some utility functions around the testing (for creating dummy coordinate arrays) into their own
tests/utils.py
file. This made sense as these functions were duplicated in multiple files before and I had to change them with this new coordinate setup.The argument
grid_refinement_factor
was earlier based on the number of grid nodes in x- and y-directions. As we do no longer assume that these grid nodes lay in regular rows and columns this definition will no longer work. Instead we will have to define the grid refinement in terms of "mesh nodes / distance", using the actual coordinate values. The argument is also being renamed tomesh_node_distance
to match this.
The following I also really like, but I have a question/suggestion wrt it:
- In the high-level functions that are used to create graphs the
xy
argument is renamed tocoords
, as this can be lat-lons or euclidean x- and y-coordinates. If a projection is given this is assumed to be lat-lons, otherwise euclidean. For functions further down (e.g.create_single_level_2d_mesh_graph
), that still take only euclidean coordinates, this argument is still calledxy
.
As I am reading it the intended use in your new implementation has two main use cases (in terms of how the coordinates are defined):
# 1. Using the coordinate values as-is, measuring distance in units of
# coordinate values with the euclidean distance metric. Relevant when working
# with cartesian coordinates.
graph = wmg.create.archetype.create_keisler_graph(
coords,
mesh_node_distance=10, # distance in units of coordinate values, measured as euclidean distance
projection=None,
)
# 2. First projecting the coordinates to a different coordinate system,
# measuring distance in units of the projection with the euclidean distance
# metric. Relevant when working with lat/lon coordinates.
graph = wmg.create.archetype.create_keisler_graph(
coords,
mesh_node_distance=1.0e6, # distance in units of projected coordinates, measured as euclidean distance
projection=ae_projection
)
with the current function signature being:
def create_keisler_graph(
coords: np.ndarray,
mesh_node_distance: float,
projection: Optional[cartopy.crs.CRSj] = None,
) -> nx.Graph:
Where it is assumed that distance is measured with the euclidean distance metric, with either the coordinate values as-is or first projecting.
I think it would be cool to extend this in future by allowing for a different distance metric to be used. That would allow us to create graphs from lat/lon coordinates directly (and thereby have graphs that fully wrap the Earth).
For example we could then do:
# 1. Using the coordinate values as-is, measuring distance in units of
# coordinate values with the euclidean distance metric. This would be useful
# for creating graphs from cartesian coordinates.
graph = wmg.create.archetype.create_keisler_graph(
coords,
mesh_node_distance=10,
projection=None,
distance_metric="euclidean"
)
# 2. First projecting the coordinates to a different coordinate system,
# measuring distance in units of the projection with the euclidean distance
# metric. This would be useful for creating graphs from lat/lons for limited
# areas.
graph = wmg.create.archetype.create_keisler_graph(
coords,
mesh_node_distance=mesh_distance,
projection=ae_projection,
distance_metric="euclidean"
)
# 3. Using the coordinate values as-is, measuring distance in great circle
# distance with the haversine distance metric. This would be useful for
# creating graphs from lat/lon coordinates directly. We should default to the
# distance being in SI units (as in meters) and so use a fixed Earth radius in
# meters.
graph = wmg.create.archetype.create_keisler_graph(
coords,
mesh_node_distance=1.0e4, # distance of 10 km between coordinates given in lat/lon
projection=None,
distance_metric="haversine"
)
With the function signature changed to:
def create_keisler_graph(
coords: np.ndarray,
mesh_node_distance: float,
projection: Optional[cartopy.crs.CRS] = None,
distance_metric: str = "euclidean"
) -> nx.Graph:
So that by default the function will use euclidean distance with the coordinate values, but the metric can be changed to haversine (which gives approximate great circle distances in meters between lat/lon coordinates, assuming a fixed Earth radius).
All that is then required is to pass the distance_metric
argument down the function calls and where the actual distance is calculated and used implement the logic for the different distance metrics.
What do you think of this idea? If it sounds good to you then maybe we could go ahead and add distance_metric
as an argument to the function (with default value euclidean
) to make the metric used explicit, and then that would make it easy to add the haversine distance metric in the future.
Of course one could argue that it should be possible to provide the Earth radius, but maybe that could become an optional metric_argument
down the line. Or we allow people to optionally provide a projection that makes it clear that the coordinates are lat/lon and in that projection object they can then set the globe radius (e.g. with cartopy.crs.PlateCarree(globe=cartopy.crs.Globe(semimajor_axis=..., semiminor_axis=...))
or something like that)
Describe your changes
This PR allows for more flexibility in the coordinates used to create graphs. They do no longer need to be on a regular $N_x \times N_y$ grid, but can be at arbitrary positions. This means that the main coordinate input no longer has the shape
[2, Ny, Nx]
, but instead[N_grid_points, 2]
with each column representing a coordinate for all grid points. This makes it easy to also allow for lat-lon input coordinates, and use a specified projection to map these to a euclidean space.To get an overview of the lat-lon functionality I recommend checking out the new page I added to the documentation.
The motivation for this change is to allow more flexibility in the regions graphs are built for. This is in particular useful when we consider grid points as the union of some LAM-region and a boundary of global grid-points, possibly on different gridding. With this change such cases can be handled without issues.
There are a few noteworthy changes that comes with this feature:
create_flat_singlescale_mesh_graph
, as it made more sense that also this type of mesh graph had its own function. This made sense to do here as the amount of related code was growing.tests/utils.py
file. This made sense as these functions were duplicated in multiple files before and I had to change them with this new coordinate setup.grid_refinement_factor
was earlier based on the number of grid nodes in x- and y-directions. As we do no longer assume that these grid nodes lay in regular rows and columns this definition will no longer work. Instead we will have to define the grid refinement in terms of "mesh nodes / distance", using the actual coordinate values. The argument is also being renamed tomesh_node_distance
to match this.xy
argument is renamed tocoords
, as this can be lat-lons or euclidean x- and y-coordinates. If a projection is given this is assumed to be lat-lons, otherwise euclidean. For functions further down (e.g.create_single_level_2d_mesh_graph
), that still take only euclidean coordinates, this argument is still calledxy
.New dependencies
This PR introduces a dependency to cartopy, as that is needed for the projection specification when working with lat-lons.
Merge and release planning
It would be good if this change went in after #26 as both deal with the refinement factor calculations.(Now merged)Issue Link
This PR closes #29
Type of change
Checklist before requesting a review
pull
with--rebase
option if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
Checklist for assignee