mdbartos / pysheds

Simple and fast watershed delineation in Python.
GNU General Public License v3.0
710 stars, 195 forks

Pixels that should be accumulation watercourses are shown as nodata #249

Closed: stev-0 closed this issue 4 months ago

stev-0 commented 6 months ago

I am running a large accumulation over a roughly 15000 x 15000 grid, using a fairly low-resolution DEM (30 arc-second) that covers a large area.

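My code is:

```python
import numpy as np
from pysheds.grid import Grid

# Read the DEM and condition it for flow routing
grid = Grid.from_raster('/../dem.tif')
dem = grid.read_raster('/../dem.tif')
fixed_dem = grid.fill_pits(dem)
fixed_dem = grid.fill_depressions(fixed_dem)
fixed_dem = grid.resolve_flats(fixed_dem)

# Flow directions from the conditioned DEM
fdir = grid.flowdir(fixed_dem)

# Per-cell weights (pollutant use values)
weights = Grid.from_raster('/../weights.tif')
weights_ras = weights.read_raster('/../weights.tif')

# Weighted flow accumulation, written out as float64
acc = grid.accumulation(fdir, weights=weights_ras)
grid.to_raster(acc, 'out.tif', dtype=np.float64)
```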

Mostly the accumulation looks really good, aside from a few phantom rivers. But something else looks odd. In the image below, it looks like there should be two watercourses draining into the V in the bottom middle (at least one of them appears in the Natural Earth rivers layer, shown dashed in green). However, those lines (in white) contain nodata values, as you can see from the red square roughly in the middle and the Identify Results panel in QGIS.

I thought there might be some sort of overflow, because I was getting `lib/python3.11/site-packages/pysheds/sview.py:917: RuntimeWarning: overflow encountered in cast`. But I converted the result to float64, which got rid of the warning, and the input values are all floats and not very large. Is there anything obvious that could be causing this?

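[Image: QGIS screenshot of the accumulation output, with the nodata watercourses in white and the QGIS Identify Results panel]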
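One way to tell the two suspected causes apart is to check what the blank cells actually contain. A value that overflows a float32 cast becomes `inf`, while missing input data shows up as `NaN`, and casting to float64 afterwards turns neither back into a real number. The sketch below is a minimal NumPy illustration, assuming the accumulation can be treated as a NumPy array (pysheds' `Raster` objects behave like NumPy arrays); `nonfinite_summary` is a hypothetical helper, not part of pysheds.

```python
import warnings

import numpy as np


def nonfinite_summary(arr):
    """Count NaN, +inf and -inf cells in an array (hypothetical helper)."""
    a = np.asarray(arr, dtype=np.float64)
    return {
        "nan": int(np.isnan(a).sum()),
        "posinf": int(np.isposinf(a).sum()),
        "neginf": int(np.isneginf(a).sum()),
    }


# A value too large for float32 (max ~3.4e38) becomes +inf when cast down.
# Recent NumPy versions warn "overflow encountered in cast" while doing so.
big = np.array([1.0, 1e39], dtype=np.float64)
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    as_f32 = big.astype(np.float32)

# Casting back up to float64 does not recover the lost value: it stays inf.
back_up = as_f32.astype(np.float64)

# Missing input data is a different failure: NaN is not an overflow,
# and it survives float casts unchanged.
with_nan = np.array([1.0, np.nan], dtype=np.float32).astype(np.float64)

print(nonfinite_summary(back_up))   # {'nan': 0, 'posinf': 1, 'neginf': 0}
print(nonfinite_summary(with_nan))  # {'nan': 1, 'posinf': 0, 'neginf': 0}
```

Running `nonfinite_summary(acc)` on the real output would show which of the two you are dealing with: a nonzero `posinf` count points at overflow, a nonzero `nan` count at missing inputs.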

stev-0 commented 6 months ago

As an update, I have tested a smaller area with the same dataset, and that worked as expected. The unweighted cell accumulation for the original larger area also worked, as did that result multiplied by 4 to simulate values similar in magnitude to the weighted ones.

Finally, I tried converting the weights raster to 64-bit float as well for the original dataset, but this did not help. So I haven't been able to isolate the problem yet.

stev-0 commented 4 months ago

I think I have solved this particular mystery. I was using worldwide pollutant-use data, and some countries have no data available, so some of the values were NaN; Afghanistan, for example, which forms part of the catchment of this river (the Indus). pysheds didn't seem to ignore these values and routed them down the valley, and since NaN plus any number is still NaN, the accumulation along that river could never recover to a real value.
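The reason a single missing value blanks out everything downstream is that flow accumulation is repeated addition, and in IEEE floating point `NaN + x` is `NaN` for every `x`. Below is a toy NumPy sketch of weighted accumulation on a five-cell network; `accumulate` and the network are hypothetical and this is not pysheds' actual algorithm. A NaN weight in one tributary turns the confluence and every cell below it into NaN, while replacing NaN with zero first restores finite totals.

```python
import numpy as np


def accumulate(downstream, weights):
    """Toy weighted flow accumulation (hypothetical, not pysheds' code).

    downstream[i] is the index of the cell that cell i drains to, or -1 for
    an outlet. Cells must be in topological order (every cell listed before
    the cell it drains to), so each total is complete before it is passed on.
    """
    acc = np.array(weights, dtype=np.float64)
    for i, j in enumerate(downstream):
        if j >= 0:
            acc[j] += acc[i]
    return acc


# Hypothetical network: cell 0 (a tributary with no data) and cell 1
# (weight 2) both drain into cell 2, which then drains 2 -> 3 -> 4.
downstream = [2, 2, 3, 4, -1]
weights = np.array([np.nan, 2.0, 1.0, 1.0, 1.0])

raw = accumulate(downstream, weights)
print(raw.tolist())  # [nan, 2.0, nan, nan, nan]

# Assumption: read "no data" as zero before accumulating.
filled = np.nan_to_num(weights, nan=0.0)
print(accumulate(downstream, filled).tolist())  # [0.0, 2.0, 3.0, 4.0, 5.0]
```

Zero-filling is a judgment call rather than a fix: it assumes "no data" means "no pollutant use", so totals downstream of a no-data country would be underestimates.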

I may have been using the wrong options: I saw some improvements to nodata handling in version 0.4, but using that version didn't seem to make a difference. Closing this issue, as I think it's a non-issue, but hopefully this diagnosis helps someone else.
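If zero-filling hides the gap too quietly, one alternative is to accumulate a second raster that counts missing cells upstream, so river cells whose totals are incomplete can be flagged instead of blanked out. This is a hypothetical NumPy-only sketch on a toy five-cell network. With pysheds, the equivalent would presumably be two `grid.accumulation` calls, one on the zero-filled weights and one on a 0/1 missing-data raster passed as `weights`; that usage is an untested assumption, not something confirmed in this thread.

```python
import numpy as np


def accumulate(downstream, weights):
    """Toy weighted flow accumulation (hypothetical, not pysheds' code).

    downstream[i] is the cell that cell i drains to, or -1 for an outlet.
    Cells must be in topological order: each cell before the one it drains to.
    """
    acc = np.array(weights, dtype=np.float64)
    for i, j in enumerate(downstream):
        if j >= 0:
            acc[j] += acc[i]
    return acc


def flag_incomplete(downstream, weights):
    """Zero-fill NaN weights and flag cells with missing data upstream.

    Returns the zero-filled accumulation and a boolean mask that is True
    wherever at least one contributing cell (including the cell itself)
    had a NaN weight.
    """
    w = np.asarray(weights, dtype=np.float64)
    missing = np.isnan(w)
    acc = accumulate(downstream, np.where(missing, 0.0, w))
    n_missing = accumulate(downstream, missing.astype(np.float64))
    return acc, n_missing > 0


# Hypothetical network: cells 0 and 1 drain into 2, then 2 -> 3 -> 4.
# Cell 0 stands in for a region with no data.
downstream = [2, 2, 3, 4, -1]
weights = [np.nan, 2.0, 1.0, 1.0, 1.0]

acc, incomplete = flag_incomplete(downstream, weights)
print(acc.tolist())         # [0.0, 2.0, 3.0, 4.0, 5.0]
print(incomplete.tolist())  # [True, False, True, True, True]
```

The mask keeps the river network visible while making it explicit which downstream totals are lower bounds rather than complete sums.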