[Suggestion] Add tips for default parameters in `resolve_flats` to documentation?

JamesSample commented 2 years ago

A suggestion rather than an issue.

I've recently been using PySheds to delineate watersheds on some fairly large grids (a few hundred million to around a billion cells). The generated catchment boundaries initially seemed poor, because despite filling pits & depressions and resolving flats, there were still some pits and flats in my flow direction raster (i.e. values of -2 and -1), which "captured" streamflow. This was caused because applying resolve_flats with the default parameters created new pits, and then applying fill_pits again created more flats, etc.

resolve_flats has parameters max_iter and eps, which default to 1000 and 1e-5, respectively. In the end, I solved my problem by setting max_iter=1e9 and eps=1e-12. I haven't noticed a significant reduction in performance after doing this and the results are much better.

Just thought it worth mentioning in case it helps someone else :-)

mdbartos commented 2 years ago

Great, thanks for identifying this issue. Feel free to submit a PR if you have any recommendations for how this issue should be addressed.

dankovacek commented 2 years ago

First of all, thanks for your work on this library, it's inspiring.

I tested this out on a large region in the Columbia basin (BC/Washington) with several very long, narrow lakes formed by dams. I found the default max_iter and eps parameters introduced a sort of edge mid-lake, causing an interruption in the delineation:

Using the modified parameters (1E9/1E-9) avoided this issue, and I guess the eps value could be detected based on the longest dimension of flat areas detected? It's easier to just pick a small number, but it might be more useful to determine if the flat areas detected are too big for the precision / dtype? Anyway, I'm still thinking about how / where this could be best addressed.

I'm trying to compare the methodology described in Barnes et. al. (2015) to the Pysheds implementation and from what I can tell, the paper suggests the algorithm may not require modifying the DEM. It sounds like it may be possible to modify just the flow direction, and in so doing so avoid setting an eps parameter, avoid floating point issues, etc. (Section 2.1). This looks like it could be a starting point to make a contribution, are there any obvious issues I should be aware of? I'm wondering if it is equivalent to modify the dem vs. modify the flow direction? Modifying the dem at least allows you to easily visualize what was modified and by how much.

mdbartos commented 2 years ago

Hi Dan,

Thanks for pointing this out. You are correct. It should be possible to construct the drainage gradient for each flat using the method of Barnes and then directly apply the flow direction function to this constructed drainage gradient. Then you should be able to add the flow direction grid for the flats together with the flow direction grid for the rest of the raster.

I tried this approach but I wasn't able to get it to route flow correctly. But I believe it should work. The code I tried was as follows:

import pyproj
from pysheds.grid import Grid
from pysheds.sview import Raster
import numpy as np
import matplotlib.pyplot as plt

# Import data
grid = Grid.from_raster('../data/n30w100_con')
dem = grid.read_raster('../data/n30w100_con')
fdir = grid.read_raster('../data/n30w100_dir')
dirmap = (64,  128,  1,   2,    4,   8,    16,  32)

# Import variables for resolve_flats function
self = grid
import pysheds._sgrid as _self
import skimage.measure
nodata_out = None
eps = 1e-5
kwargs = {}
data = dem
max_iter = 1000

# Resolve flats function
input_overrides = {'dtype' : np.float64}
kwargs.update(input_overrides)
dem = self._input_handler(data, **kwargs)
# Find no data cells
# TODO: Should these be used?
nodata_cells = self._get_nodata_cells(dem)
# Get inside indices
inside = np.arange(dem.size, dtype=np.int64).reshape(dem.shape)[1:-1, 1:-1].ravel()
# Find (i) cells in flats, (ii) cells with flow directions defined
# and (iii) cells with at least one higher neighbor
flats, fdirs_defined, higher_cells = _self._par_get_candidates_numba(dem, inside)
# Label all flats
labels, numlabels = skimage.measure.label(flats, return_num=True)
# Get high-edge cells
hec = _self._par_get_high_edge_cells_numba(inside, fdirs_defined, higher_cells, labels)
# Get low-edge cells
lec = _self._par_get_low_edge_cells_numba(inside, dem, fdirs_defined, labels, numlabels)
# Construct gradient from higher terrain
grad_from_higher = _self._grad_from_higher_numba(hec, flats, labels, numlabels, max_iter)
# Construct gradient towards lower terrain
grad_towards_lower = _self._grad_towards_lower_numba(lec, flats, dem, max_iter)
# Construct a gradient that is guaranteed to drain
new_drainage_grad = (2 * grad_towards_lower + grad_from_higher)

And this part is new:

# Create raster from gradient array
grad = new_drainage_grad.copy()
grad_mask = (grad == 0)
grad[grad_mask] = grad.max() + 1
grad = Raster(grad.astype(np.float64), viewfinder=grid.viewfinder)

# Create flow direction grid for everything other than flats
fdir_main = grid.flowdir(dem)
# Create flow direction grid for gradient imposed on flats
fdir_flats = grid.flowdir(grad)

# Construct full flow direction grid
fdir = Raster(np.where(flats, fdir_flats, fdir_main),
                viewfinder=fdir.viewfinder)

Maybe you could take a crack at it and figure out what's going wrong.

dankovacek commented 2 years ago

Thanks for your response Matt.

I tried setting up your suggestion from above, and the gradient assignment in flat regions behaves as expected on the (~110km) long lake showing a clear north-south gradient:

In the last step where grad_mask is created, I at first thought zero gradient corresponded to flat areas, but it seems to be the inverse:

grad_mask = (grad == 0)
grad[grad_mask] = grad.max() + 1

The cells corresponding to zero gradient at this step are outside of flat areas identified in _par_get_candidates_numba as (~fdir_defined & ~is_pit) or ~(diff > 0) & ~(diff < 0) ≡ (diff == 0)), and these are set to be greater than the maximum gradient value to avoid affecting flow directions at edges bounding flat areas:

Looking at the adjusted gradient figure above, there are a lot of dark spots that look like pits. I thought maybe the adjustment wasn't being applied in these small areas, but if I change grad_mask to the flats boolean, they show up in yellow meaning they are included in the set of flats:

My next thought is to better understand how queues are constructed in the grad_from_higher_numba and grad_towards_lower_numba. I can't tell if it matters how cells from distinct flat areas are added to the queue and if this affects local gradients. It's interesting to see how these sparse, small flat areas are critical to hydraulic connectivity.

mdbartos commented 2 years ago

Hi Dan, one suggestion to look into:

The resolve_flats algorithm generally works OK when directly raising the elevations of the DEM cells, given a sufficiently small epsilon.

My guess is the problem lies at the interface between the non-flats and the flats---in particular, how the flow direction is established from the non-flat cells to the high-edge cells inside the flats. Might be worth looking into.

lar84 commented 2 years ago

Had similar issues as Dan and James. resolve_flats was leaving (creating?) some pits in elongated flat areas. Bumping up the max_iter and eps solved my problems. Thanks all!

mdbartos / pysheds

[Suggestion] Add tips for default parameters in `resolve_flats` to documentation? #188