pacificclimate / climate-explorer-backend


add an API endpoint that provides streamflow connectivity upstream of a given point #198

Closed corviday closed 2 years ago

corviday commented 2 years ago

Currently the "watershed" API accepts a point in the watershed, calculates all the grid squares upstream of that point, and returns a GeoJSON Polygon representing the outer boundary of those grid squares. This polygon can then be used by the front end to query a different API about climate data within a watershed.

With the salmon project, we are not just interested in average climate data across a watershed, but in how the streams in a watershed interconnect, which may help scientists understand "choke points" the salmon have to pass through on the way to upstream spawning grounds. We need a new stream network API that, similarly to the watershed API, takes a point and returns data about the squares "upstream" of that point, but instead of the outline of the entire collection of squares, it needs to return a graph description of the squares showing how they are interconnected by the movement of water.

The new API should accept a Point in WKT format, and the name of the ensemble containing the flow map.

There are many possibilities for describing the flow network graph in JSON format. One possibility is:

{
  "graph": {
    "nodes": [
      {
        "latitude": 50,
        "longitude": 123
      },
      {
        "latitude": 51,
        "longitude": 123
      }
    ],
    "edges": [
      [1, 0]
    ]
  }
}

Which would mean there were two grid cells in this watershed, one at (50, 123) and one at (51, 123) and that the more northern one (node 1) flowed into the more southern one (node 0). The format does not have to be identical to this - there are lots of options - but it should be straightforward enough for the front end to easily draw the network from whatever JSON it receives from this API.

EDIT: return GeoJSON instead, as suggested by @jameshiebert below.

This output will be used by the web frontend to produce a map that looks something like this:

[Image: flowmap(1)]

This API will do a lot of the same calculations as the watershed API - both of them trace through the flow network netCDF to find all upstream points, before returning information on those points. It will probably make sense to move some functions from the watershed API into a stream flow library file that both APIs can draw from.
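The shared upstream trace could be sketched roughly as below. This is a minimal illustration, not the existing watershed code: the `DOWNSTREAM` dictionary and the `upstream_cells` function are hypothetical stand-ins, and the real flow directions would be read from the flow network netCDF file, whose layout is not shown in this thread.

```python
from collections import deque

# Hypothetical toy flow map: each cell (row, col) maps to the cell it drains
# into, or None at an outlet. The real data lives in a netCDF file.
DOWNSTREAM = {
    (0, 0): (1, 0),
    (0, 1): (1, 0),
    (1, 0): (2, 0),
    (1, 1): (2, 1),
    (2, 0): None,
    (2, 1): None,
}


def upstream_cells(downstream, target):
    """Return the target cell plus every cell that eventually drains into it."""
    # Invert the map so each cell lists the cells flowing directly into it.
    inflows = {}
    for cell, dest in downstream.items():
        if dest is not None:
            inflows.setdefault(dest, []).append(cell)

    seen = {target}
    queue = deque([target])
    while queue:
        cell = queue.popleft()
        for neighbour in inflows.get(cell, []):
            if neighbour not in seen:  # guards against cycles in bad data
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


print(sorted(upstream_cells(DOWNSTREAM, (1, 0))))
```

Both APIs could call a function like this and differ only in how they turn the resulting set of cells into a response.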

jameshiebert commented 2 years ago

@corviday I'd propose that we try to use a standardized format for the watershed that this API call returns. For example, a WKT MultiLineString or a GeoJSON MultiLineString. Not sure if either of these has styling options built in (i.e. for whatever flow or stream temperature attributes we'll eventually use). But for now, using one of these will help us at least get the network lines on the map.

corviday commented 2 years ago

The response format suggested was designed around the idea that it would be easy to expand with data values when we got to that point: you'd just add variable and value attributes to each node entry. But it does make sense to just return a GeoJSON MultiLineString as a start.

jameshiebert commented 2 years ago

Some proposed steps for approaching this problem:

  1. Find the example flow network data (in the tests).
    1. Use NetCDF binary tools on the command line to visualize and dump its output to the terminal
      1. Use the Python NetCDF library to access the flow network file. Review the first half of this tutorial, focusing on ways to read the data.
  2. Find the watershed API code, particularly the code that computes all cells in the upstream area.
  3. Work on a new upstream search function which collects the cells in a different manner. Instead of collecting a set of cells all grouped together, you'll want to collect a set of individual streams. You'll have to work out how to do this... maybe start a new substream every time you detect a branch point?
  4. The upstream area function converts a set of cells into a polygon (the result of a union of boxes). You'll want to convert each of your individual streams into a linestring and then put them all together into a MultiLineString.
  5. Add your new API function to the list of methods available
  6. Write some automated tests for your new API function.
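
One possible reading of steps 3 and 4 is sketched below: walk upstream from the target and start a new substream at every branch point, then wrap the substreams in a GeoJSON MultiLineString. Everything here is an assumption made for illustration. The `DOWNSTREAM` map, the function names, and the use of (longitude, latitude) tuples as cell keys are hypothetical, and the sketch assumes the flow map is a tree with no cycles.

```python
# Hypothetical flow map keyed by (longitude, latitude) cell centres; each cell
# maps to the cell it drains into, or None at the outlet.
DOWNSTREAM = {
    (0, 0): None,
    (0, 1): (0, 0),
    (0, 2): (0, 1),
    (1, 3): (0, 2),
    (-1, 3): (0, 2),
    (-1, 4): (-1, 3),
}


def split_into_reaches(downstream, target):
    """Split the network upstream of target into paths between branch points."""
    inflows = {}
    for cell, dest in downstream.items():
        if dest is not None:
            inflows.setdefault(dest, []).append(cell)

    reaches = []
    starts = [target]
    while starts:
        start = starts.pop()
        for first in sorted(inflows.get(start, [])):
            path = [start, first]
            # Follow the channel upstream while exactly one cell flows in.
            while len(inflows.get(path[-1], [])) == 1:
                path.append(inflows[path[-1]][0])
            if len(inflows.get(path[-1], [])) > 1:
                starts.append(path[-1])  # branch point: begins new reaches
            reaches.append(path[::-1])  # store points upstream -> downstream
    return reaches


def reaches_to_multilinestring(reaches):
    """Wrap the reaches in a GeoJSON MultiLineString ([lon, lat] order)."""
    return {
        "type": "MultiLineString",
        "coordinates": [[list(point) for point in reach] for reach in reaches],
    }


reaches = split_into_reaches(DOWNSTREAM, (0, 0))
print(reaches_to_multilinestring(reaches))
```

Each branch point appears as the downstream end of the reaches above it and the upstream end of the reach below it, so the lines join up when drawn.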
jameshiebert commented 2 years ago

@corviday It's possible that we can bolt styling on afterward (even if it's not part of the standard). Here's an old question asking about this and it appears that LeafletJS supports style properties.

corviday commented 2 years ago

Nice, that's a much better solution.

rod-glover commented 2 years ago

Hi folks. Sorry I'm a bit late to the party, but I have questions about the goal here. This is in the service of reviewing Johnathan's PR for this issue.

I discussed with Lee what the end goal is, which is basically to render a diagram like the one above. To make my understanding explicit:

  1. Given a location, every cell in the upstream and downstream watersheds should be included in the diagram.
  2. The diagram shows a line segment connecting each cell to its (downstream) neighbour according to the connectivity (flow direction) matrix.
  3. The line segments are nominally -- as shown in the example diagram -- smaller than the cells, so they look spindly like a tree rather than just a raster. It's an alternative (and easier to read) visualization than an arrow in a cell pointing at its downstream neighbour.

@corviday do I have this right?

From this I conclude:

  1. What we need to render is simply a line segment (length 2 linestring) for every pair of connected cells. They don't need to be arranged in any particular order, and there is no notion of creating linestrings greater than length 2.
  2. If so, then we are:
    1. Relieving the frontend app of having to convert a more compact representation to a set of line segments to render.
    2. Therefore encoding the location of cell (center) as many times as it is the downstream neighbour of another cell. Plus one for its upstream encoding. So looking at a redundancy of about 2.
  3. Aside: A viewer can infer the flow direction from the line segments if the whole context is in view. Zoomed in on a small area, you can't unambiguously infer flow direction unless an arrow or other orientation device is part of the line segment rendering. Therefore the ordering of points in each line segment (linestring) needs to be consistent throughout.
  4. Aside: When zoomed sufficiently far out, this diagram would look like a raster when the width of a line segment stroke is greater than or equal to the rendered dimensions of a cell. Seems unlikely to happen, but it's possible.

@jameshiebert , is conclusion 1 what you were thinking of when you mentioned rendering the result as linestrings? And hence 2?

jameshiebert commented 2 years ago

@rod-glover I think that you understand the problem correctly with the exception that the downstream portion of this problem is currently being solved in another issue. This issue only deals with the upstream.

I hadn't thought to represent the upstream as a series of unordered, length-2 linestrings. But that would certainly work! (Possibly better than what I had in mind, which was a series of linestrings that spanned between each point of branching).

rod-glover commented 2 years ago

@jameshiebert , I think finding the longer linestrings between branching points is a harder (though more interesting) problem to solve, and therefore we probably go with the simpler (length-2) solution.

rod-glover commented 2 years ago

@helfy18 , as far as what should be output, I think it should be a MultiLineString, with each LineString inside that of length 2. That's the only way in GeoJSON to capture a "set" of LineStrings.
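
A rough sketch of that output, assuming the upstream cells have already been collected: each cell contributes one length-2 LineString running from itself to its downstream neighbour, so point order is consistent throughout. The `segments_multilinestring` function and its inputs are hypothetical, and cells are keyed by (longitude, latitude) because GeoJSON positions are written in that order. The values reuse the two-cell example from the issue description.

```python
import json


def segments_multilinestring(downstream, cells):
    """Build a MultiLineString of length-2 lines, each upstream -> downstream."""
    coordinates = []
    for cell in sorted(cells):
        dest = downstream.get(cell)
        # Skip the target cell, whose downstream neighbour is outside the set.
        if dest is not None and dest in cells:
            coordinates.append([list(cell), list(dest)])
    return {"type": "MultiLineString", "coordinates": coordinates}


# Two cells, keyed (longitude, latitude): the northern one flows into the
# southern one, as in the example at the top of this issue.
downstream = {(123, 51): (123, 50), (123, 50): None}
cells = {(123, 51), (123, 50)}

print(json.dumps(segments_multilinestring(downstream, cells)))
```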

helfy18 commented 2 years ago

@rod-glover the MultiLineString is ordered though, is it not? What is it that makes the length-2 linestrings better than the ones spanning the entire stream? Wouldn't it be harder to colour code the length-2 linestrings than the longer ones? Is it that each stream is not being colour coded? I might have misunderstood the picture, now that I look at it again.

rod-glover commented 2 years ago

MultiLineString is ordered. It seems to me that finding the longer strings is harder; I couldn't immediately see how/why the code was doing that, but if you think it's simple, then it's OK -- and I'd like to see an explanation of how the algorithm works.

I'm not sure about colour coding. That's a good question.

corviday commented 2 years ago

> @rod-glover the MultiLineString is ordered though, is it not? What is it that makes the length-2 linestrings better than the ones spanning the entire stream? Wouldn't it be harder to colour code the length-2 linestrings than the longer ones? Is it that each stream is not being colour coded? I might have misunderstood the picture, now that I look at it again.

The colour coding on the strings will - someday - be derived from a gridded data file that has water temperature, or flow volume, at each location on the grid, but that file doesn't know about the stream network. So the colours, when we have them, will be assigned by individual grid values matching points along a linestring, but I don't think shorter linestrings will make it easier or harder, since the colour is just a series of points with no network information.

rod-glover commented 2 years ago

Thanks for that, @corviday . I was just about to ask about it. Given that information, I have the following comments:

TL;DR: I think the earlier "pure graph" representation is a better fit for our goals here. Here's why:

  1. The overall task can be described as follows:
    1. Given a target node, collect all the upstream watershed nodes. They form a subgraph (subtree) of the full flow graph.
    2. Create a description of the upstream graph, namely those nodes and the connections between them.
    3. Eventually, add information to that graph description that helps render it more informatively -- distance, flow, temperature, salmon count, etc.
  2. 1.1 and 1.2 could be served by using GeoJSON MultiLineStrings, of whatever type.
  3. I don't think GeoJSON linestrings (of whatever length) are a particularly useful representation for 1.3.
    1. We're going to have to pick the linestrings apart and assign colours later, based on other information -- defined how and where?
    2. We're also going to end up specifying nodes/edges redundantly, and have a "join" problem to solve to adjoin the linestrings with the additional rendering information. But ...
  4. The original "pure graph", plain JSON representation is more useful for 1.3 and no less useful for 1.1 and 1.2.
    1. It allows annotating nodes and/or edges with arbitrary information -- namely that which would be used to colour or otherwise annotate the rendered lines. It's very flexible in that way.
    2. The task of rendering the pure graph representation on the client side is easy. Leaflet (and React Leaflet) contain native polylines (L, RL) (trivial case: length-2 linestring) that would be trivial to assemble from the pure graph representation. And very flexible, since they don't depend on GeoJSON, which introduces a lot of limitations.
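
To make point 4 concrete, here is a small sketch of turning the pure graph representation into length-2 polylines that carry an annotation. It is written in Python for illustration only (the client would do the equivalent in JavaScript). The `graph_to_polylines` function, the `latlngs` key, and the `temperature` values are hypothetical additions to the example graph from the issue description. The points are given as [latitude, longitude] pairs, the order Leaflet polylines expect.

```python
# Example graph from the issue description, with a hypothetical per-node
# "temperature" annotation added.
graph = {
    "nodes": [
        {"latitude": 50, "longitude": 123, "temperature": 12},
        {"latitude": 51, "longitude": 123, "temperature": 11},
    ],
    "edges": [[1, 0]],
}


def graph_to_polylines(graph):
    """Return one length-2 polyline per edge, annotated from its upstream node."""
    nodes = graph["nodes"]
    polylines = []
    for upstream, downstream in graph["edges"]:
        a, b = nodes[upstream], nodes[downstream]
        polylines.append({
            # [lat, lng] pairs, ordered upstream -> downstream.
            "latlngs": [
                [a["latitude"], a["longitude"]],
                [b["latitude"], b["longitude"]],
            ],
            # No join step needed: the annotation travels with the node.
            "temperature": a.get("temperature"),
        })
    return polylines


print(graph_to_polylines(graph))
```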
rod-glover commented 2 years ago

@jameshiebert , what do you think?

@corviday , if we choose the pure graph representation, then your work switches from extending the GeoJSON component to handle MultiLineStrings to building a component that will render a bunch of Leaflet polylines (of length 2) based on the pure graph input. That's pretty much the same level of effort as the GeoJSON mod.

rod-glover commented 2 years ago

One other thing: Whatever we do here, it will be important to tell both the users of the API and of the client (map) that the lines we draw aren't necessarily actual streams and rivers, just where the water goes. Some of that will be just broad general flow over (maybe under) the surface, some will be actual streams.

jameshiebert commented 2 years ago

@rod-glover and @helfy18 I think that to keep the scope constrained enough to be achievable, the goal of this issue is to represent the geometry only (not styling) of an upstream flow in whatever method is possible. "Pure graph" representation, as I originally imagined, is an option. A set of 2-node linestrings is another option. The former is more representative of the structure. However, the latter will be easier to adapt when we (later!) want to style according to attributes like flow/temp/etc. (i.e. the styling will definitely be per 2-node segment and not per multi-node reach).

TL;DR do something that we can implement now and we can always revise it as we continue to work on these problems.

rod-glover commented 2 years ago

> the latter will be easier to adapt when we (later!) want to style according to attributes like flow/temp/etc.

I disagree. I think it equally hard to style GeoJSON. However, I accept that this is the decision.

GeoJSON only supports additional information ("properties") at the level of feature, not of geometry. (So it is not surprising that Leaflet can style GeoJSON only at the level of feature.)

Therefore, to accommodate styling per 2-node segment, the GeoJSON returned from this endpoint will need to be a FeatureCollection, containing one Feature per length-2 segment. To be more specific, the GeoJSON will need to be structured as follows. The "properties" property is arbitrary and can be omitted until we have something to put in it.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [[102.0, 0.5], [102.5, 0.7]]
      },
      "properties": {
        "flow": 99,
        "temperature": 12
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [[103.0, 0.5], [103.5, 0.7]]
      },
      "properties": {
        "flow": 88,
        "temperature": 12
      }
    },
    // ...
  ]
}
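
A structure like this could be assembled along the lines below. This is only a sketch under stated assumptions: the `segment_features` function, the `downstream` map keyed by (longitude, latitude), and the `values` lookup of per-cell attributes are all hypothetical, and the numbers reuse the first feature of the sample above.

```python
import json


def segment_features(downstream, cells, values=None):
    """Build a FeatureCollection with one length-2 LineString Feature per cell."""
    values = values or {}
    features = []
    for cell in sorted(cells):
        dest = downstream.get(cell)
        if dest is None or dest not in cells:
            continue  # the target cell has no downstream neighbour in the set
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "LineString",
                "coordinates": [list(cell), list(dest)],  # upstream -> downstream
            },
            # Empty until gridded data such as flow or temperature is available.
            "properties": dict(values.get(cell, {})),
        })
    return {"type": "FeatureCollection", "features": features}


downstream = {(102.0, 0.5): (102.5, 0.7), (102.5, 0.7): None}
cells = set(downstream)
values = {(102.0, 0.5): {"flow": 99, "temperature": 12}}

print(json.dumps(segment_features(downstream, cells, values), indent=2))
```

Looking up properties by the upstream cell of each segment is one arbitrary choice; the thread does not settle how gridded values would be matched to segments.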
rod-glover commented 2 years ago

In a conversation with James yesterday, I realized I was introducing too many diverse topics into this issue. I've pulled out all the relevant content into a GitHub Discussion. Discussions are a new thing in GitHub, and they are intended, it appears, for just this kind of situation.

For now, as James says, we can return GeoJSON (of whatever form) and modify as needed.