r-barnes / richdem

High-performance Terrain and Hydrology Analysis
GNU General Public License v3.0

failed assertion in parallel_d8_accum #4

Open theobarnhart-USGS opened 6 years ago

theobarnhart-USGS commented 6 years ago

Hi Rich,

Thank you for helping me with the other problem. Now when I try to run parallel_d8_accum, it fails with the error below. I've also included the gdalinfo output for the file I'm using. Thank you for all your help!


(richDEM) tbarnhart@parallel_d8_accum$ mpirun -n 2 ./parallel_d8_accum.exe one @evict ~/projects/DEM_processing/data/0_0_richDEM_fill.tiff ~/projects/DEM_processing/data/%n_richDEM_accum.tiff
c Program name       = RichDEM v0.0.0
c Script compiled at = 2018-01-04 15:21:55 UTC
c Git hash           = 8d45cc88d1d65384
c Copyright          = Richard Barnes © 2016
a Analysis command   = ./parallel_d8_accum.exe one @evict /home/tbarnhart/projects/DEM_processing/data/0_0_richDEM_fill.tiff /home/tbarnhart/projects/DEM_processing/data/%n_richDEM_accum.tiff
A Parallel Flow Accumulation
C Barnes, R. 2016. "Parallel D8 Flow Accumulation For Trillion Cell Digital Elevation Models On Desktops Or Clusters". In-progress.
c Processes = 2
c Many or one = one
c Input file = /home/tbarnhart/projects/DEM_processing/data/0_0_richDEM_fill.tiff
c Retention strategy = @evict
c Block width = -1
c Block height = -1
c Flip horizontal = 0
c Flip vertical = 0
c Cache compression = FALSE
m Total width =  2401
m Total height = 3601
m Block width =  2401
m Block height = 3601
m Total cells to be processed = 8646001
t Preparer time = 0.008008 s
c Input data type = Int16
m Jobs created = 1
p Jobs remaining = 0
parallel_d8_accum.exe: ../../include/richdem/common/Array2D.hpp:528: richdem::Array2D<T>::i_t richdem::Array2D<T>::getN(richdem::Array2D<T>::i_t, uint8_t) const [with T = unsigned char; richdem::Array2D<T>::i_t = unsigned int; uint8_t = unsigned char]: Assertion `0<=n && n<=8' failed.
[IGSKKBCWLT762:08598] *** Process received signal ***
[IGSKKBCWLT762:08598] Signal: Aborted (6)
[IGSKKBCWLT762:08598] Signal code:  (-6)
[IGSKKBCWLT762:08598] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330) [0x7fd19e7b0330]
[IGSKKBCWLT762:08598] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7fd19e406c37]
[IGSKKBCWLT762:08598] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7fd19e40a028]
[IGSKKBCWLT762:08598] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2fbf6) [0x7fd19e3ffbf6]
[IGSKKBCWLT762:08598] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2fca2) [0x7fd19e3ffca2]
[IGSKKBCWLT762:08598] [ 5] ./parallel_d8_accum.exe() [0x40d878]
[IGSKKBCWLT762:08598] [ 6] ./parallel_d8_accum.exe() [0x430fff]
[IGSKKBCWLT762:08598] [ 7] ./parallel_d8_accum.exe() [0x436fdc]
[IGSKKBCWLT762:08598] [ 8] ./parallel_d8_accum.exe() [0x439b92]
[IGSKKBCWLT762:08598] [ 9] ./parallel_d8_accum.exe() [0x40c83a]
[IGSKKBCWLT762:08598] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fd19e3f1f45]
[IGSKKBCWLT762:08598] [11] ./parallel_d8_accum.exe() [0x40d68b]
[IGSKKBCWLT762:08598] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 8598 on node XXXXX exited on signal 6 (Aborted).
--------------------------------------------------------------------------

Output from gdalinfo for the file I'm trying to run the accumulation process on is:

(richDEM) tbarnhart@parallel_d8_accum$ gdalinfo ~/projects/DEM_processing/data/0_0_richDEM_fill.tiff
Driver: GTiff/GeoTIFF
Files: /home/tbarnhart/projects/DEM_processing/data/0_0_richDEM_fill.tiff
Size is 2401, 3601
Coordinate System is:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0],
    UNIT["degree",0.0174532925199433],
    AUTHORITY["EPSG","4326"]]
Origin = (-113.000416666666666,47.000416666666666)
Pixel Size = (0.000833333333333,-0.000833333333333)
Metadata:
  AREA_OR_POINT=Area
  PROCESSING_HISTORY=2018-01-04 15:32:34 UTC | RichDEM v0.0.0 (hash=8d45cc88d1d65384, compiled=2018-01-04 15:22:25 UTC) | ./parallel_pf.exe one @evict /home/tbarnhart/projects/DEM_processing/data/test_data.tiff /home/tbarnhart/projects/DEM_processing/data/%n_richDEM_fill.tiff
  TIFFTAG_DATETIME=2018-01-04 15:32:34 UTC
  TIFFTAG_SOFTWARE=RichDEM v0.0.0 (hash=8d45cc88d1d65384, compiled=2018-01-04 15:22:25 UTC)
Image Structure Metadata:
  INTERLEAVE=BAND
Corner Coordinates:
Upper Left  (-113.0004167,  47.0004167) (113d 0' 1.50"W, 47d 0' 1.50"N)
Lower Left  (-113.0004167,  43.9995833) (113d 0' 1.50"W, 43d59'58.50"N)
Upper Right (-110.9995833,  47.0004167) (110d59'58.50"W, 47d 0' 1.50"N)
Lower Right (-110.9995833,  43.9995833) (110d59'58.50"W, 43d59'58.50"N)
Center      (-112.0000000,  45.5000000) (112d 0' 0.00"W, 45d30' 0.00"N)
Band 1 Block=2401x1 Type=Int16, ColorInterp=Gray
  NoData Value=-32767

r-barnes commented 6 years ago

@theobarnhart-USGS Sorry about that. I'm a little busy this week, but will look into it as soon as I can.

In the meantime, I've reformatted your post so that the outputs are appropriately formatted. You can do this yourself by highlighting the outputs and pressing the <> button in the formatting bar, which converts them to "code" and makes them much easier to read.

r-barnes commented 6 years ago

This probably has something to do with the file having the Int16 data type.

I should be able to fix this.

In the meantime, do you experience problems on Int32 or Float32/Float64 data?
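
For readers following along, here is a minimal Python sketch of what an assertion like `0<=n && n<=8` guards. It assumes a D8 encoding in which 0 means "no flow" and 1 to 8 select one of the eight neighbouring cells. The offset table and the `get_neighbour` and `invalid_d8_codes` helpers are hypothetical illustrations, not RichDEM's actual `getN` or its neighbour ordering. Since the failure message reports `n` as a `uint8_t`, the `0<=n` half can never fail, so the abort implies that a value greater than 8 was passed in.

```python
# Hypothetical D8 neighbour offsets (dx, dy); index 0 means "no flow".
# The ordering is an assumption for illustration, not RichDEM's table.
DX = [0, -1, -1, 0, 1, 1, 1, 0, -1]
DY = [0, 0, -1, -1, -1, 0, 1, 1, 1]


def get_neighbour(index: int, n: int, width: int) -> int:
    """Return the flat index of neighbour n of a cell in a row-major grid."""
    # Same kind of guard as the failing assertion: n must be a D8 code.
    assert 0 <= n <= 8, f"n={n} is not a valid D8 direction"
    return index + DY[n] * width + DX[n]


def invalid_d8_codes(values):
    """Collect the distinct values that are not valid D8 codes (0..8)."""
    return sorted({v for v in values if not 0 <= v <= 8})


if __name__ == "__main__":
    width = 5
    # Neighbour to the right of cell 12 in a grid that is 5 cells wide.
    print(get_neighbour(12, 5, width))
    # Values outside 0..8 would trip the guard if used as directions.
    print(invalid_d8_codes([0, 3, 8, 17, 200, 3]))
```

Running a check like `invalid_d8_codes` over the cell values of an input raster is one way to see whether it contains anything other than direction codes before handing it to a tool that treats cells as D8 directions.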

johnniehard commented 5 years ago

I'm getting the same error, except that it seems to run some of the jobs before failing:

mpirun -n 4 ./parallel_d8_accum.exe one @evict ~/kod/flodesapp/localdata/geodata/DEM_LM/riks_50m_nowater.tif ./out/%n.tif --bwidth 500 --bheight 500
c Program name       = RichDEM v2.2.9
c Script compiled at = 2019-10-09 11:09:19 UTC
c Git hash           = abc04d81216d7cf5
c Copyright          = Richard Barnes © 2018
a Analysis command   = ./parallel_d8_accum.exe one @evict /home/johnnie/kod/flodesapp/localdata/geodata/DEM_LM/riks_50m_nowater.tif ./out/%n.tif --bwidth 500 --bheight 500 
A Barnes (2017) Parallel Non-divergent Flow Accumulation
C Barnes, R., 2017. Parallel non-divergent flow accumulation for trillion cell digital elevation models on desktops or clusters. Environmental Modelling & Software 92, 202-212. doi:10.1016/j.envsoft.2017.02.022
c Processes = 4
c Many or one = one
c Input file = /home/johnnie/kod/flodesapp/localdata/geodata/DEM_LM/riks_50m_nowater.tif
c Retention strategy = @evict
c Block width = 500
c Block height = 500
c Flip horizontal = 0
c Flip vertical = 0
c Cache compression = FALSE
m Total width =  16000
m Total height = 32000
m Block width =  500
m Block height = 500
m Total cells to be processed = 512000000
t Preparer time = 0.0128314 s
c Input data type = Float32
m Jobs created = 2048
p Jobs remaining = 2047
p Jobs remaining = 2046
p Jobs remaining = 2045
p Jobs remaining = 2044
p Jobs remaining = 2043
p Jobs remaining = 2042
p Jobs remaining = 2041
p Jobs remaining = 2040
p Jobs remaining = 2039
p Jobs remaining = 2038
p Jobs remaining = 2037
p Jobs remaining = 2036
p Jobs remaining = 2035
p Jobs remaining = 2034
p Jobs remaining = 2033
p Jobs remaining = 2032
p Jobs remaining = 2031
p Jobs remaining = 2030
p Jobs remaining = 2029
p Jobs remaining = 2028
p Jobs remaining = 2027
p Jobs remaining = 2026
p Jobs remaining = 2025
p Jobs remaining = 2024
p Jobs remaining = 2023
p Jobs remaining = 2022
p Jobs remaining = 2021
p Jobs remaining = 2020
p Jobs remaining = 2019
p Jobs remaining = 2018
p Jobs remaining = 2017
p Jobs remaining = 2016
p Jobs remaining = 2015
p Jobs remaining = 2014
p Jobs remaining = 2013
p Jobs remaining = 2012
p Jobs remaining = 2011
p Jobs remaining = 2010
p Jobs remaining = 2009
p Jobs remaining = 2008
p Jobs remaining = 2007
p Jobs remaining = 2006
p Jobs remaining = 2005
p Jobs remaining = 2004
p Jobs remaining = 2003
p Jobs remaining = 2002
p Jobs remaining = 2001
p Jobs remaining = 2000
p Jobs remaining = 1999
p Jobs remaining = 1998
parallel_d8_accum.exe: ../../include/richdem/common/Array2D.hpp:552: richdem::Array2D< <template-parameter-1-1> >::i_t richdem::Array2D< <template-parameter-1-1> >::getN(richdem::Array2D< <template-parameter-1-1> >::i_t, uint8_t) const [with T = unsigned char; richdem::Array2D< <template-parameter-1-1> >::i_t = unsigned int; uint8_t = unsigned char]: Assertion `0<=n && n<=8' failed.
[gib:28256] *** Process received signal ***
[gib:28256] Signal: Aborted (6)
[gib:28256] Signal code:  (-6)
[gib:28256] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f2e4a729890]
[gib:28256] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f2e4a364e97]
[gib:28256] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f2e4a366801]
[gib:28256] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f2e4a35639a]
[gib:28256] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f2e4a356412]
[gib:28256] [ 5] ./parallel_d8_accum.exe(+0x2c96d)[0x55dac951296d]
[gib:28256] [ 6] ./parallel_d8_accum.exe(+0x30b62)[0x55dac9516b62]
[gib:28256] [ 7] ./parallel_d8_accum.exe(+0x35f40)[0x55dac951bf40]
[gib:28256] [ 8] ./parallel_d8_accum.exe(+0x406a1)[0x55dac95266a1]
[gib:28256] [ 9] ./parallel_d8_accum.exe(+0x10391)[0x55dac94f6391]
[gib:28256] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f2e4a347b97]
[gib:28256] [11] ./parallel_d8_accum.exe(+0x10d0a)[0x55dac94f6d0a]
[gib:28256] *** End of error message ***
p Jobs remaining = 1997
p Jobs remaining = 1996
p Jobs remaining = 1995
parallel_d8_accum.exe: ../../include/richdem/common/Array2D.hpp:552: richdem::Array2D< <template-parameter-1-1> >::i_t richdem::Array2D< <template-parameter-1-1> >::getN(richdem::Array2D< <template-parameter-1-1> >::i_t, uint8_t) const [with T = unsigned char; richdem::Array2D< <template-parameter-1-1> >::i_t = unsigned int; uint8_t = unsigned char]: Assertion `0<=n && n<=8' failed.
[gib:28257] *** Process received signal ***
[gib:28257] Signal: Aborted (6)
[gib:28257] Signal code:  (-6)
[gib:28257] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f3e32a85890]
[gib:28257] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f3e326c0e97]
[gib:28257] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f3e326c2801]
[gib:28257] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f3e326b239a]
[gib:28257] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f3e326b2412]
[gib:28257] [ 5] ./parallel_d8_accum.exe(+0x2c96d)[0x55ab9b04a96d]
[gib:28257] [ 6] ./parallel_d8_accum.exe(+0x30b62)[0x55ab9b04eb62]
[gib:28257] [ 7] ./parallel_d8_accum.exe(+0x35f40)[0x55ab9b053f40]
[gib:28257] [ 8] ./parallel_d8_accum.exe(+0x406a1)[0x55ab9b05e6a1]
[gib:28257] [ 9] ./parallel_d8_accum.exe(+0x10391)[0x55ab9b02e391]
[gib:28257] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f3e326a3b97]
[gib:28257] [11] ./parallel_d8_accum.exe(+0x10d0a)[0x55ab9b02ed0a]
[gib:28257] *** End of error message ***
p Jobs remaining = 1994
p Jobs remaining = 1993
p Jobs remaining = 1992
p Jobs remaining = 1991
p Jobs remaining = 1990
p Jobs remaining = 1989
p Jobs remaining = 1988
parallel_d8_accum.exe: ../../include/richdem/common/Array2D.hpp:552: richdem::Array2D< <template-parameter-1-1> >::i_t richdem::Array2D< <template-parameter-1-1> >::getN(richdem::Array2D< <template-parameter-1-1> >::i_t, uint8_t) const [with T = unsigned char; richdem::Array2D< <template-parameter-1-1> >::i_t = unsigned int; uint8_t = unsigned char]: Assertion `0<=n && n<=8' failed.
p Jobs remaining = 1987
p Jobs remaining = 1986
p Jobs remaining = 1985
[gib:28255] *** Process received signal ***
[gib:28255] Signal: Aborted (6)
[gib:28255] Signal code:  (-6)
[gib:28255] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7fb16bf4e890]
[gib:28255] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7fb16bb89e97]
[gib:28255] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7fb16bb8b801]
[gib:28255] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7fb16bb7b39a]
[gib:28255] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7fb16bb7b412]
[gib:28255] [ 5] ./parallel_d8_accum.exe(+0x2c96d)[0x561d86f2b96d]
[gib:28255] [ 6] ./parallel_d8_accum.exe(+0x30b62)[0x561d86f2fb62]
[gib:28255] [ 7] ./parallel_d8_accum.exe(+0x35f40)[0x561d86f34f40]
[gib:28255] [ 8] ./parallel_d8_accum.exe(+0x406a1)[0x561d86f3f6a1]
[gib:28255] [ 9] ./parallel_d8_accum.exe(+0x10391)[0x561d86f0f391]
[gib:28255] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fb16bb6cb97]
[gib:28255] [11] ./parallel_d8_accum.exe(+0x10d0a)[0x561d86f0fd0a]
[gib:28255] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 0 on node gib exited on signal 6 (Aborted).

I've tried digging around the code, but I've never worked with C++ so I'm a bit lost.
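
As a sanity check on the log above, the `Jobs created = 2048` line is consistent with the raster being cut into 500x500 tiles. The sketch below reproduces that count under two assumptions that are not taken from the RichDEM source: tiles are counted with a ceiling division so partial edge tiles count as jobs, and a non-positive block size (the first log shows `-1`) means the whole raster is one block. The `jobs_created` function is a hypothetical name used only for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive b."""
    return -(-a // b)


def jobs_created(total_width, total_height, block_width, block_height):
    """Number of tiles needed to cover the raster, counting partial edge tiles."""
    # Assumption: a non-positive block size means "use the full extent".
    if block_width <= 0:
        block_width = total_width
    if block_height <= 0:
        block_height = total_height
    return ceil_div(total_width, block_width) * ceil_div(total_height, block_height)


if __name__ == "__main__":
    # Second log: 16000 x 32000 cells with --bwidth 500 --bheight 500.
    print(jobs_created(16000, 32000, 500, 500))
    # First log: 2401 x 3601 cells with block sizes reported as -1.
    print(jobs_created(2401, 3601, -1, -1))
    # Total cells reported in each log.
    print(16000 * 32000, 2401 * 3601)
```

With these assumptions the two calls give 2048 and 1, matching the `Jobs created` lines in both logs, and the products match the reported `Total cells to be processed` values.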

johnniehard commented 5 years ago

@r-barnes here's the DEM if you have the time to have a look: https://drive.google.com/open?id=1-NarHegMQiH6WmhfjR_4NlvB6tuEEsIb

johnniehard commented 5 years ago

@r-barnes just want to let you know that we're working on this and making progress. It mostly turned out to be mismatches between code and documentation. PR coming up.