pacificclimate / data-prep-actions

Data Preparation actions - record of ephemera used to prepare data for PCIC data portals and other tools
0 stars 0 forks source link

I58 mouths of watersheds #59

Closed helfy18 closed 2 years ago

helfy18 commented 2 years ago

Add a script that takes outputs the mouths of watersheds used by the web app by using the flow direction from a netCDF file. Resolves #58

corviday commented 2 years ago

This script is returning results different than I expected - every watershed has the same mouth. I'm not sure if this is because

  1. The Freshwater Atlas watersheds are defined in some way that does not line up with our model of how water moves
  2. I did not explain the algorithm to use to find the outlet very well, or
  3. Our flow model is too low-resolution

I'm hoping it's the second option, otherwise we won't be able to build a connection between the Freshwater Atlas polygons and our flow model this way. I need to look into this farther.

I do have a small change to request now: could you please change the third column of the CSV to have the heading WTRSHDGRPC and the value for just that attribute, rather than the entire set of properties values?

helfy18 commented 2 years ago

@corviday I just changed it to only return the code.

corviday commented 2 years ago

Here's my thinking behind using centroid(). With graph paper!

Say we have two watersheds that, in the actual world, look like this. If we had perfect and complete knowledge and no computation limitations, we'd have this extra-squiggly data:

actualwatershed

We don't have perfect information; we have two sets of imperfect information, and we're trying to translate from one to the other. The first imperfect description of the watershed is a set of polygons from the BC Freshwater Atlas. These were pretty high resolution once, but I've had to lower their resolution because they were computationally implausible. Let's say the Frreshwater Atlas says the watersheds look like this, described as polygons:

polygonwatershed

Separately, we also have a imperfect modeled description of the watershed used by PCIC to calculate how water moves across the ground. In this description, land is divided up into grid cells, and water from every grid cell flows downstream to one other grid cell. We can think of the "watershed" as a particular grid cell (the outlet or mouth), plus all the other grid squares who, if we follow the water that rains onto them, eventually that water would pass through the outlet. I've drawn a circle where the mouth of each watershed is, plus outlined the grid cells PCIC understands to be part of this watershed. (And also drawn in the arrows)

rvicwatershed

So what we're trying to solve with this script is to be able to translate from the Freshwater Atlas polygons in the first diagram to the PCIC gridded watersheds in the second diagram. The way we're doing this is:

  1. Pick a point (somehow) inside the Freshwater Atlas Watershed
  2. Follow water from that point downstream using the PCIC model
  3. When the water leaves the Freshwater Atlas Polygon, that's the "mouth"

Here are the two different imperfect representations of the watershed superimposed. They generally agree about where the watersheds are, but neither of them is perfect, and they have simplified the map in different ways, so they do disagree a little on which points are in which watersheds. I've coloured the points they don't agree about green.

both

What happens if we pick one of the disputed green points? Suppose we're trying to determine the translation for Watershed A in the Freshwater Atlas into the PCIC model. We pick a point0 that is inside Watershed A according to the Freshwater Atlas, but it happens to be inside Watershed B according to the PCIC model. We begin trying to trace the water flowing through our point downstream - it flows deeper into Watershed B, and reaches point1.

Now we check and see that point1 is outside Watershed A in the Freshwater Atlas. So we're done tracing the water flow to the mouth. The mouth of Watershed A is set to point0. This is an incorrect answer.

I think all the disputed points will be around the edges of the watershed; I wanted to use the centroid to reduce the chances of this happening.

That was my logic for using the centroid.

Though I didn't think of the fact that for some watersheds the centroid will be outside the watershed entirely, and we'll have the exact same problem. Maybe check if the centroid is outside the watershed, and if it is, use representative_point()?

corviday commented 2 years ago

That said, this may not matter in the real world; @helfy18 reports that all the same mouths were calculated using centroid() and representative_point().

We could perhaps use representative_point() for all watersheds but output an error message if point0 is chosen as the mouth - we shouldn't have any single grid cell watersheds in this dataset, so that would be a good indicator something has gone wrong.

rod-glover commented 2 years ago

That is a truly awesome exposition!

I think I have a more robust solution:

corviday commented 2 years ago

Oh hey, yeah! That sounds perfect! Avoids the dodgy edges of the watershed and the issues of the U-shaped watersheds like the Lower Halfway River!

rod-glover commented 2 years ago

Details on using the suggested solution. Instead of using

center = polygon.centroid

use

center = polygon.buffer(distance).representative_point()

where distance is a negative value whose magnitude somewhat exceeds the (physical distance) size of a flow grid cell.

helfy18 commented 2 years ago

I ran the new version, and it does work better. We now have all 20 watersheds that were being displayed with centroid, and we also have it so that the center point is always contained within the watershed (previously WTRSHDGRPD 171 was having a center point that was not contained in the watershed)