mdbartos / pysheds

:earth_americas: Simple and fast watershed delineation in python.
GNU General Public License v3.0
708 stars 191 forks source link

negative z-values on import #173

Closed fluviotect closed 2 years ago

fluviotect commented 2 years ago

Hello,

I'm keen to have a play with pysheds, but am having an issue with some data.

Though it displays fine in QGIS and ARC, the nodata value appears to be skewing the data once in pysheds.

image

The null region plots as a large magnitude negative number on import to pysheds...presumably because the 'no data' value = -3.4028230607370965e+38

here's the metadata (per rasterio) for this data source:

{'driver': 'GTiff', 'dtype': 'float32', 'nodata': -3.4028230607370965e+38, 'width': 5834, 'height': 8329, 'count': 1, 'crs': CRS.from_epsg(2193), 'transform': Affine(4.999719232087755, 0.0, 1799748.6469, 0.0, -4.9998783407372125, 5492807.9139)}

A different data source in the same projection and general area plots OK:

image

In this second data source, the no data values are different and automatically handled by pysheds.

{'driver': 'GTiff', 'dtype': 'float32', 'nodata': None, 'width': 32768, 'height': 24576, 'count': 1, 'crs': CRS.from_epsg(2193), 'transform': Affine(8.0, 0.0, 1703936.0, 0.0, -8.0, 5570560.0)}

C:\Users\Will\miniconda3\envs\CatchDelin\lib\site-packages\pysheds\io.py:142: UserWarning: No nodata value detected. Defaulting to 0. warnings.warn('No nodata value detected. Defaulting to 0.')

Is there an argument or setting within pysheds to accommodate a nodata value other than None? or do I need to handle prior to import to pysheds?

Cheers, Will

mdbartos commented 2 years ago

Hi @fluviotect, the grid.view method is where you can control the output view of your dataset. You can use it to both control the spatial extent of the view, as well as the nodata value for the output. Here's a quick example of the behavior you want using the example data in the ./data directory:

import numpy as np
from pysheds.grid import Grid

# Load data
grid = Grid.from_raster('../data/dem.tif')
dem = grid.read_raster('../data/dem.tif')

# Set edge cells to nodata value
dem[0, :] = dem.nodata
dem[-1, :] = dem.nodata
dem[:, 0] = dem.nodata
dem[:, -1] = dem.nodata

Then, if we output dem:

>>> dem

Raster([[-32768, -32768, -32768, ..., -32768, -32768, -32768],
        [-32768,    210,    207, ...,    176,    176, -32768],
        [-32768,    209,    204, ...,    174,    174, -32768],
        ...,
        [-32768,    262,    263, ...,    217,    217, -32768],
        [-32768,    265,    265, ...,    217,    217, -32768],
        [-32768, -32768, -32768, ..., -32768, -32768, -32768]],
       dtype=int16)

Instead, let's have grid.view mask these values with nan in the output view:

# Set the mask of the grid to mask out the nodata cells
grid.mask = np.where(dem == dem.nodata, False, True)

# Call grid.view with the nodata keyword argument
grid.view(dem, nodata=np.nan)

And the output will return:

>>> grid.view(dem, nodata=np.nan)

Raster([[ nan,  nan,  nan, ...,  nan,  nan,  nan],
        [ nan, 210., 207., ..., 176., 176.,  nan],
        [ nan, 209., 204., ..., 174., 174.,  nan],
        ...,
        [ nan, 262., 263., ..., 217., 217.,  nan],
        [ nan, 265., 265., ..., 217., 217.,  nan],
        [ nan,  nan,  nan, ...,  nan,  nan,  nan]], dtype=float32)

Which looks better for plotting. Note that the dtype was upcast to represent nans.

mdbartos commented 2 years ago

Regarding the second issue, you can always set the nodata value of the Raster after instantiation:

dem.nodata = -9999

The nodata value has to be representable as a numpy numeric or boolean type, though.

fluviotect commented 2 years ago

Cheers!

My exploration was prompted by an error I was getting during flow direction processing re: negative values. So are negative integer values acceptable to flowdir as long as they are numpy numeric or boolean?

fluviotect commented 2 years ago

this fixed my problem:

dem.nodata = np.nan

thanks again

mdbartos commented 2 years ago

My exploration was prompted by an error I was getting during flow direction processing re: negative values. So are negative integer values acceptable to flowdir as long as they are numpy numeric or boolean?

Yes, negative integer or floating point values are acceptable. Every cell value is treated literally as an elevation, unless it is a nodata value, in which case it is set to the maximum dem elevation +1 to prevent any cell from flowing to it:

https://github.com/mdbartos/pysheds/blob/9d8896080fdf9d9af88a57f203177ba1a365946c/pysheds/sgrid.py#L610-L611

Note that masked cells will also be converted to nodata internally before reaching this line. (See the arguments apply_input_mask and apply_output_mask in grid.view for more info).