Errors in extraction of thalweg polygons in the Alaska domain

BahramKhazaei-NOAA commented 1 year ago

@cuill After fixing the problem with impi, @SorooshMani-NOAA were trying to run the pyDEM package on a DEM in the Alaska domain which is extracted from GMRT. However, there was a problem with processing the DEM and his pyDEM script failed. We then tried another DEM that has coverage over the Cook Inlet and got another error that seems like is due to existence of missing values in the DEM (it has 'no data' value in some pixels). Here are the error messages in processing of each DEM:

Cook Inlet DEM:

(meshdev) [Soroosh.Mani@compute-0001 Bahram]$ python run_serial.py
./cook_1e7

A Priority-Flood+Epsilon
C Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024

c topology = D8
p Setting up boolean flood array matrix...
p Adding cells to the priority queue...
p Performing Priority-Flood+Epsilon...
t succeeded in 2.66116 s
m Cells processed = 36056748
m Cells in pits = 30266840
W W In assigning negligible gradients to depressions, some depressions rose above the surrounding cells. This implies that a larger storage type should be used. The problem occured for 70 of 36056748.
total time=19.3s: ./cook_1e7
finish processing ./cook_1e7
---------compute watershed------------------------------------
---------sort watershed number--------------------------------
computing river network
Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2387, in write_shapefile
    yi,xi=self.get_coordinates(SF.data[i])
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2304, in get_coordinates
    indy,indx=unravel_index(sind,ds)
  File "<__array_function__ internals>", line 200, in unravel_index
TypeError: only int indices permitted

GMRT DEM:

(meshdev) [Soroosh.Mani@compute-0001 Bahram]$ python run_serial.py
./gmrt_1e7

A Priority-Flood+Epsilon
C Barnes, R., Lehman, C., Mulla, D., 2014. Priority-flood: An optimal depression-filling and watershed-labeling algorithm for digital elevation models. Computers & Geosciences 62, 117–127. doi:10.1016/j.cageo.2013.04.024

c topology = D8
p Setting up boolean flood array matrix...
p Adding cells to the priority queue...
p Performing Priority-Flood+Epsilon...
t succeeded in 7.31966 s
m Cells processed = 60838806
m Cells in pits = 44421818
W W In assigning negligible gradients to depressions, some depressions rose above the surrounding cells. This implies that a larger storage type should be used. The problem occured for 37223 of 60838806.
total time=36.3s: ./gmrt_1e7
finish processing ./gmrt_1e7
---------compute watershed------------------------------------
---------sort watershed number--------------------------------
computing river network
Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2402, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

These errors occur when using the sample code from pyDEM readme (run_serial.py just with updated paths for our DEMs). He tried to perform some pre-processing on both DEMs but even with fixed DEMs he got the exact same errors.

I will send you the GMRT file (too big to attach here). It would be great if you can take a look at this and see if you have a solution/fix for this.

cuill commented 1 year ago

@BahramKhazaei-NOAA The dem file has "nan" values, which may cause this error because pyDEM takes "-9999" as nodata. The gdal command to change nodata value cannot process this file: gdal_calc.py -A dem_with_ocean.tif --calc="numpy.where(A<-100000, -9999, A)" --outfile dem_with_ocen_nodata.tif --NoDataValue=-9999

Do you have any method to change 'nan' to -9999? @BahramKhazaei-NOAA @SorooshMani-NOAA

cuill commented 1 year ago

@BahramKhazaei-NOAA @SorooshMani-NOAA Find a method to do it: https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a/163705#163705

BahramKhazaei-NOAA commented 1 year ago

@BahramKhazaei-NOAA @SorooshMani-NOAA Find a method to do it: https://gis.stackexchange.com/questions/163685/reclassify-a-raster-value-to-9999-and-set-it-to-the-nodata-value-using-python-a/163705#163705

Thank you. I was trying with QGIS, but it crashed. Did that work for you?

cuill commented 1 year ago

@BahramKhazaei-NOAA Yes, just use the last rule:

# Set large negative values to -9999
#Z[Z<-9999]= -9999
# Choose your preference: (comment either rule)
#Z[Z==-9999]= np.nan
# Or 
Z[np.isnan(Z)]= -9999

SorooshMani-NOAA commented 1 year ago

I'll try this out and let you know if we still see issues. Thanks @cuill

BahramKhazaei-NOAA commented 1 year ago

I'll try this out and let you know if we still see issues. Thanks @cuill

Thanks, Soroosh.

SorooshMani-NOAA commented 1 year ago

@cuill using GMRT DEM, I read the raster values and set all nan's to -9999, but I still get the same error as before:

Traceback (most recent call last):
  File "/lustre/meshing/river/extract/pyDEM_Samples/Bahram/run_serial.py", line 38, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/home/Soroosh.Mani/sandbox/RiverMeshTools/pyDEM/pyDEM/dem.py", line 2402, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed```

This is my raster now:

In [5]: with rio.open('./fixed_gmrt.tif', 'r') as ds:
   ...:     v = ds.read(1)
   ...:

In [6]: v
Out[6]:
array([[  -31.025713,   -31.001385,   -30.999046, ...,  1909.2639  ,
         1822.1892  ,  1735.6036  ],
       [  -30.996735,   -31.061066,   -31.002304, ...,  1906.1929  ,
         1839.0338  ,  1778.5045  ],
       [  -30.894072,   -30.888163,   -31.011707, ...,  1883.3633  ,
         1867.0518  ,  1846.1737  ],
       ...,
       [-5467.1333  , -5472.786   , -5457.6455  , ..., -3490.741   ,
        -3495.3052  , -3498.335   ],
       [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
        -9999.      , -9999.      ],
       [-9999.      , -9999.      , -9999.      , ..., -9999.      ,
        -9999.      , -9999.      ]], dtype=float32)

BahramKhazaei-NOAA commented 1 year ago

Thank you Soroosh for testing again. Looks like river[:,0] array is defined as a 2D array in dem.py while a 1D array is passed to the function.

BahramKhazaei-NOAA commented 1 year ago

@cuill, any thoughts on this error message?

cuill commented 1 year ago

@BahramKhazaei-NOAA I just came back from vacation. I'll take a look at this error when I get a chance.

BahramKhazaei-NOAA commented 1 year ago

@BahramKhazaei-NOAA I just came back from vacation. I'll take a look at this error when I get a chance.

Appreciate your help and quick responses as always. Looks like for this error something is not compatible wit the definition of the river array.

cuill commented 1 year ago

@BahramKhazaei-NOAA @SorooshMani-NOAA I cannot reproduce this error but I got another error related to the package GEOS (see issue here https://github.com/shapely/shapely/issues/1345) if using shapely2.0. So, please upgrade GEOS to 3.12.0.

I've pushed the version that worked on my side, please pull it. Here are the thalwegs extracted with a threshold of 1e4.

SorooshMani-NOAA commented 1 year ago

@cuill thanks for your feedback, so you just used the original DEMs and set all the nans to -9999 and ran pyDEM?

cuill commented 1 year ago

@SorooshMani-NOAA correct.

BahramKhazaei-NOAA commented 1 year ago

Hi @cuill I was able to install pyDEM and run the both serial and parellel test cases provided here. However, when I try to run it based on my DEM it gives me the same error as Soroosh. Here is the error:


Traceback (most recent call last):
  File "/work2/noaa/nos-surge/bkhazaei/work/run_dir/Alaska_Modeling/pre_processing/grid/bearing_sea_ocsmesh/pyDEM/run_serial.py", line 47, in <module>
    gdf = S.write_shapefile(npt_smooth=3)
  File "/work2/noaa/nos-surge/bkhazaei/envs/miniconda3/envs/RiverMesh3_10/lib/python3.10/site-packages/pyDEM/dem.py", line 2400, in write_shapefile
    idxs=np.squeeze(np.argwhere(np.isnan(river[:,0])))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I made sure that the nan values in my DEM are replaced with -9999 as we discussed above. I'm using run_serial.py that is provided for the savannah river test case and didn't change anything in that script except for the path to the DEM file. I'm pasting the script here:

#!/usr/bin/env python3
import sys
import glob
import time

from pylib import *
from pyDEM.dem import *

if __name__ == '__main__':
    #input tiff files
    PathToInputDEMs = '/work2/noaa/nos-surge/bkhazaei/work/run_dir/alaska_domain/mesh/ocsmesh/cook_inlet/DEMs_inputs'
    PathToDomainBoundary = PathToInputDEMs+"/GMRTv4_1_1_20230815topo_NaN_removed.tif"
    print('Path to DEM input:', str(PathToInputDEMs))

    files = glob.glob(PathToDomainBoundary)
    files.sort()

    t0 = time.time()

    for fname in files:
        names = [fname]

        #output filename
        sname = f"./{fname.split('.')[0].split('/')[-1]}_1e7"
        print(sname)

        #declaer a dem object
        S=dem()

        #read data
        if not os.path.exists('{}.npz'.format(sname)):
            S.proc_demfile(names,sname=sname,depth_limit=[-100,1000],subdomain_size=2e10)
        S.read_data('{}.npz'.format(sname))

        #compuate watershed information
        S.compute_watershed()

        #extract river network (area_limit: catchment area)
        S.compute_river(acc_limit=1e7)

        #write shapefile for river network
        gdf = S.write_shapefile(npt_smooth=3)
        gdf.to_file(f'{sname}.shp')
        print(f'It took {(time.time() - t0)/60} mins!')

Did you use the same script to get the result you shared above? If not, could you please share that with me?

cuill commented 1 year ago

@BahramKhazaei-NOAA Yes, I used the same script but with a smaller threshold (acc_limit=1e4). As you can see from the thalwegs above, the threshold of 1e4 has a relatively coarse thalwegs. I think your threshold of 1e7 results in zero thalwegs which causes the error. I will change the script to throw the error if this happens. At the time being, you can test with different threshold.

BahramKhazaei-NOAA commented 1 year ago

@BahramKhazaei-NOAA Yes, I used the same script but with a smaller threshold (acc_limit=1e4). As you can see from the thalwegs above, the threshold of 1e4 has a relatively coarse thalwegs. I think your threshold of 1e7 results in zero thalwegs which causes the error. I will change the script to throw the error if this happens. At the time being, you can test with different threshold.

Thank you for your response. I'm going to try with your recommended acc_limit.

BahramKhazaei-NOAA commented 1 year ago

@cuill I was able to reproduce your results with acc_limit=1e4. Thank you for your help.

BahramKhazaei-NOAA commented 1 year ago

@cuill @feiye-vims I'm running the pyDEM serial script for this GMRT DEM in northern pacific and Bering Sea. It does a good job in extraction of main channels over land and nearshore, yet in most of underwater areas I get these segments of straight lines that doesn't seem to be natural. I tried to play with the parameters in the script (acc_limit and depth limits), however, I couldn't change these patterns. Do you have a solution for this?

cuill commented 1 year ago

@BahramKhazaei-NOAA Normally, we don't need thalwegs in the water areas (e.g., lakes, ponds) to guide meshing. We deleted these straight lines with the pre-defined polygons.

cuill commented 1 year ago

@BahramKhazaei-NOAA It was done in either ArcGIS or QGIS.

BahramKhazaei-NOAA commented 1 year ago

@BahramKhazaei-NOAA Normally, we don't need thalwegs in the water areas (e.g., lakes, ponds) to guide meshing. We deleted these straight lines with the pre-defined polygons.

I see. By pre-defined polygons you mean a polygon that covers the area in which the thalwegs are not needed like the under water areas?

cuill commented 1 year ago

@BahramKhazaei-NOAA No, in the opposite way, a polygon that covers areas where thalwegs are needed. Then after clipping, thalwegs outside the polygon are removed.

BahramKhazaei-NOAA commented 1 year ago

@BahramKhazaei-NOAA No, in the opposite way, a polygon that covers areas where thalwegs are needed. Then after clipping, thalwegs outside the polygon are removed.

Sounds good, thank you for your quick response.

feiye-vims commented 1 year ago

When we created the RiverMeshTool, we already had polygons manually made that roughly delineate the shoreline of the ocean and large lakes/estuaries. These waterbodies are considered well resolved so the thalwegs inside them were clipped out.

The target of the RiverMeshTool are rivers that have a clear direction, so we always cut out lakes. However, even if you retain the thalwegs in the lakes, it won't do much harm for the final mesh.

I would always cut out the ocean part though.

BahramKhazaei-NOAA commented 1 year ago

When we created the RiverMeshTool, we already had polygons manually made that roughly delineate the shoreline of the ocean and large lakes/estuaries. These waterbodies are considered well resolved so the thalwegs inside them were clipped out.

The target of the RiverMeshTool are rivers that have a clear direction, so we always cut out lakes. However, even if you retain the thalwegs in the lakes, it won't do much harm for the final mesh.

I would always cut out the ocean part though.

That is reasonable.

As a side note for future developments: by taking into account the open water areas, the tool can be used to inform mesh generation process for important underwater features like ridges and continental slopes. These features might not be of immediate importance for physical models but could be important for local or biological applications.

feiye-vims commented 1 year ago

Saeed also mentioned this and I agree. I'm taking a note on this for future development.

I'm thinking maybe using the location of largest bathymetry gradient to identify the edge of an underwater dredged channel or ridges.

BahramKhazaei-NOAA commented 1 year ago

Saeed also mentioned this and I agree. I'm taking a note on this for future development.

I'm thinking maybe using the location of largest bathymetry gradient to identify the edge of an underwater dredged channel or ridges.

That is probably the best strategy to start with.

schism-dev / RiverMeshTools

Errors in extraction of thalweg polygons in the Alaska domain #8