mdbartos / pysheds

:earth_americas: Simple and fast watershed delineation in python.
GNU General Public License v3.0
702 stars 188 forks source link

HydroSHEDs importing error #217

Closed prajwalp closed 4 months ago

prajwalp commented 1 year ago

Hi

I'm new to using pysheds and I need to work with datasets from HydroSHEDs project. I was following the codes on the README file but I am unable to import the void-DEM from https://www.hydrosheds.org/hydrosheds-core-downloads with grid.from_raster()

Here is the error

---------------------------------------------------------------------------

AssertionError                            Traceback (most recent call last)
File ~/miniconda3/lib/python3.8/site-packages/pysheds/sview.py:87, in Raster.__new__(cls, input_array, viewfinder, metadata)
     86 try:
---> 87     assert np.min_scalar_type(viewfinder.nodata) <= obj.dtype
     88 except:

AssertionError: 

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[55], line 1
----> 1 grid = Grid.from_raster(fi)
      2 dem = grid.read_raster(fi)

File ~/miniconda3/lib/python3.8/site-packages/pysheds/sgrid.py:424, in sGrid.from_raster(cls, data, **kwargs)
    422     return newinstance
    423 elif isinstance(data, str):
--> 424     data = newinstance.read_raster(data, **kwargs)
    425     newinstance.viewfinder = data.viewfinder
    426     return newinstance

File ~/miniconda3/lib/python3.8/site-packages/pysheds/sgrid.py:261, in sGrid.read_raster(self, data, band, window, window_crs, metadata, mask_geometry, **kwargs)
    229 def read_raster(self, data, band=1, window=None, window_crs=None,
    230                 metadata={}, mask_geometry=False, **kwargs):
    231     """
    232     Reads data from a raster file and returns a Raster object.
    233 
   (...)
    259         Raster object containing loaded data.
    260     """
--> 261     return pysheds.io.read_raster(data=data, band=band, window=window,
    262                                   window_crs=window_crs, metadata=metadata,
    263                                   mask_geometry=mask_geometry, **kwargs)

File ~/miniconda3/lib/python3.8/site-packages/pysheds/io.py:148, in read_raster(data, band, window, window_crs, mask_geometry, nodata, metadata, **kwargs)
    146             nodata = data.dtype.type(nodata)
    147 viewfinder = ViewFinder(affine=affine, shape=shape, mask=mask, nodata=nodata, crs=crs)
--> 148 out = Raster(data, viewfinder, metadata=metadata)
    149 return out

File ~/miniconda3/lib/python3.8/site-packages/pysheds/sview.py:89, in Raster.__new__(cls, input_array, viewfinder, metadata)
     87     assert np.min_scalar_type(viewfinder.nodata) <= obj.dtype
     88 except:
---> 89     raise TypeError('`nodata` value not representable in dtype of array.')
     90 # Don't allow original viewfinder and metadata to be modified
     91 viewfinder = viewfinder.copy()

TypeError: `nodata` value not representable in dtype of array.

I am able to import the same file using rasterio's .open() function. I'm trying to find the source of error/how to resolve it but so far, I have been unsuccessful. Any help is appreciated!

PS: The specific dataset I am using to test and learn is n30e110_dem.tif node of Asia DEM data of HydroSHEDs

edsaac commented 1 year ago

I had to re-save the HydroSHEDs DEM using a negative value for the NODATA representation. You can check the value using gdalinfo, for example, this n30w090_dem.tif comes with a NoData Value=32767

>> $ gdalinfo n30w090_dem.tif 
Driver: GTiff/GeoTIFF
Files: n30w090_dem.tif
Size is 12000, 12000
Coordinate System is:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Origin = (-90.000000000000000,40.000000000000000)
Pixel Size = (0.000833333333333,-0.000833333333333)
Metadata:
  AREA_OR_POINT=Area
  DataType=Generic
Image Structure Metadata:
  COMPRESSION=LZW
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  ( -90.0000000,  40.0000000) ( 90d 0' 0.00"W, 40d 0' 0.00"N)
Lower Left  ( -90.0000000,  30.0000000) ( 90d 0' 0.00"W, 30d 0' 0.00"N)
Upper Right ( -80.0000000,  40.0000000) ( 80d 0' 0.00"W, 40d 0' 0.00"N)
Lower Right ( -80.0000000,  30.0000000) ( 80d 0' 0.00"W, 30d 0' 0.00"N)
Center      ( -85.0000000,  35.0000000) ( 85d 0' 0.00"W, 35d 0' 0.00"N)
Band 1 Block=128x128 Type=Int16, ColorInterp=Gray
  Description = n35w085_dem
  NoData Value=32767
  Metadata:
    RepresentationType=THEMATIC
anna24valcarcel commented 6 months ago

I had to re-save the HydroSHEDs DEM using a negative value for the NODATA representation.

Hi @edsaac did you manually replace this value somehow? I am trying to do the same thing, could you possibly attach any code or recommend any software you used for this? Sorry, I am new to TIFF files in general and trying to figure out how to make pysheds package work for my purposes.

anna24valcarcel commented 5 months ago

Hi @prajwalp @edsaac and anyone else having this issue, I figured out how to do this finally.

You do have to change the nodata values but specifically the metadata value for nodata, not the value that appears in the raster.

You can use the following code which will update the metadata value for nodata, rather than values in the raster. You have to do this for pysheds to read the .TIF files from HydroSHEDS, or any other .TIF files like this.

with rasterio.open(tiffname, 'r+') as dataset:
    dataset.nodata = -32767

I believe any negative value will work.

Additionally, you can also edit data values in the raster using the same .open():

dataset = rasterio.open(filename,mode='r+')

# I did something like this:
read_data= dataset .read(1)
for i in range(0,len(read_data)):
     list1 = read_data[i]
     for j in range(0,len(list1)):
         if list1[j] == 32767:
             list1[j] = -32767
     read_data[i] = list1

dataset .write(read_data,1)
dataset .close()

Sorry if that was obvious to everyone but I am new to rasters/.TIF so I did not understand the above solution/process. Thanks.

prajwalp commented 4 months ago

A bit too late for the response, but thanks! I'd solved the issue exactly as you found out as well.