lautenberger / elmfire

Eulerian Level set Model of FIRE spread
https://elmfire.io
Eclipse Public License 2.0

Error occurred when running with spotting model #54

Closed by yqin123 3 months ago

yqin123 commented 5 months ago

Hi Chris,

I updated the spotting model on my end and tried incorporating the HRR_transient into it.

However, when I run a test case with the Eulerian spotting model activated (using the command elmfire elmfire.data), it prints "segmentation fault ..." while writing results. The results that are written are correct regardless.

Meanwhile, when I run the same case with elmfire_debug elmfire.data, it reports a problem at lines 5 and 58 of "elmfire.f90" ("Floating-point exception - erroneous arithmetic operation.") and writes no results.

This problem only occurs when I run with my spotting model.

Could you please help me check it out or give me some suggestions on finding out what is wrong? I have attached the test version of ELMFIRE and my test input files here. Please let me know if you need any further clarification.

Attachments: elmfire_public-UMD_SPOTTING_MODEL.zip, spotting-test-inputs.zip

Thanks, Yiren

lautenberger commented 5 months ago

Thank you, Yiren - confirming receipt. I will take a look! I could probably figure this out by diffing the source code, but is the source in the archive elmfire_public-UMD_SPOTTING_MODEL.zip a local fork / test version that you have created yourself, i.e. it's not in the public repo? Thanks!

yqin123 commented 5 months ago

Hi Chris, thanks for your help! elmfire_public-UMD_SPOTTING_MODEL.zip is a local branch, but it is based on the most recent public version (my fork was updated days ago).

lautenberger commented 5 months ago

Thank you, Yiren. A diagnostic array was being allocated only when RANDOM_IGNITIONS=.TRUE. but it should be allocated for all cases. Commit https://github.com/lautenberger/elmfire/commit/7aeea6a7ec528a87a77581145095334d2e3bb2b2 resolves this issue. Basically, in elmfire.f90 delete line 270 and at line 271 add this line:

IF (MODE .NE. 2 .AND. NUM_MONTE_CARLO_VARIABLES .GT. 0) ALLOCATE(COEFFS_UNSCALED_BY_CASE(1:NUM_CASES_TOTAL,1:NUM_MONTE_CARLO_VARIABLES))

I think this will resolve your issue but if not let me know. Thanks for the report!

yqin123 commented 5 months ago

Hi Chris, thank you very much! I modified the lines you mentioned, and it now works with elmfire elmfire.data. However, elmfire_debug elmfire.data still fails with the error below. The problem can be reproduced with the updated version, even without activating spotting on my end.

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fd2ffcfb6a0 in ???
#1  0x7fd2ffcfa8d5 in ???
#2  0x7fd2fef61b4f in ???
#3  0x7fd2fe43bf70 in ???
#4  0x7fd2fe43f8a9 in ???
#5  0x7fd2fe44040e in ???
#6  0x7fd2fe4409ec in ???
#7  0x7fd2fe440be3 in ???
#8  0x7fd2fe457386 in ???
#9  0x7fd2fe43fafa in ???
#10  0x7fd2fe4409d1 in ???
#11  0x7fd2fe440be3 in ???
#12  0x7fd2fe451aae in ???
#13  0x7fd2fe451de6 in ???
#14  0x7fd2fe451e8a in ???
#15  0x7fd2fe45abb4 in ???
#16  0x7fd2fe45c17b in ???
#17  0x7fd2fe43fafa in ???
#18  0x7fd2fe4409d1 in ???
#19  0x7fd2fe440be3 in ???
#20  0x7fd2fe451aae in ???
#21  0x7fd2fe451de6 in ???
#22  0x7fd2fe451e8a in ???
#23  0x7fd2fe4522d0 in ???
#24  0x7fd2fe43fafa in ???
#25  0x7fd2fe4409d1 in ???
#26  0x7fd2fe440be3 in ???
#27  0x7fd2fe4417c6 in ???
#28  0x7fd2fe431097 in ???
#29  0x7fd2fe4312cd in ???
#30  0x7fd2fe4978ec in ???
#31  0x7fd2fe41afd4 in ???
#32  0x7fd2fe41b773 in ???
#33  0x7fd2fe41bbed in ???
#34  0x7fd30037f5c2 in ???
#35  0x7fd30037f7be in ???
#36  0x7fd3003cc23f in ???
#37  0x7fd300220f9e in ???
#38  0x7fd3006d04b7 in ???
#39  0x7fd300b16f40 in ???
#40  0x4c6eb0 in elmfire
    at ../../source/elmfire.f90:58
#41  0x4d988f in main
    at ../../source/elmfire.f90:5

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 1409338 RUNNING AT login-2.zaratan.umd.edu
=   KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================

FYI, here is a part of the ~/.bashrc file, showing a list of the environment variables relevant to ELMFIRE:

# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
export PATH

module load openmpi/4.1.1/gcc/
module load cuda/gcc/9.4.0/
module load curl/gcc/9.4.0/
module load gdal/3.4.0/gcc/
module load python

export PATH=$PATH:/home/yqin123/software/bin/elmfire

# Environment variables for ELMFIRE
# export GDALROOT=/cvmfs/hpcsw.umd.edu/spack-software/2022.06.15/linux-rhel8-zen2/gcc-9.4.0/gdal-3.4.0-2lfwxvv4yfehlamhc5zraivmeu4hbxdi/bin/
export GDALROOT=/home/yqin123/software/bin/gdal/bin
export BUILD_ELMFIRE_COMPANION_PROGRAMS=yes
export BUILD_ELMFIRE_DEBUG_BINARIES=yes
export ELMFIRE_FCOMPL_SERIAL_INTEL=ifort
export ELMFIRE_FCOMPL_MPI_INTEL=mpifort

export ELMFIRE_FCOMPL_SERIAL_GNU=gfortran
export ELMFIRE_FCOMPL_MPI_GNU=mpifort
export ELMFIRE_INSTALL_DIRECTORY=/home/yqin123/software/bin/elmfire

export ELMFIRE_SCRATCH_BASE=/home/yqin123/scratch/elmfire_scratch
export ELMFIRE_BASE_DIR=/home/yqin123/software/elmfire_public/elmfire_public/
export ELMFIRE_INSTALL_DIR=/home/yqin123/software/bin/elmfire_public/
export CLOUDFIRE_SERVER=172.92.17.198
export PATH=$PATH:$ELMFIRE_INSTALL_DIR:$ELMFIRE_BASE_DIR/cloudfire
export GDAL_PAM_ENABLED=YES

Could there be some problem related to the compilation? Let me know if you need other information. Thanks for your help again!
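
Since the question here is whether the build environment is at fault, one quick sanity check is whether each compiler command named in the ELMFIRE_FCOMPL_* variables actually resolves on PATH once the module load lines have run. The sketch below is not part of ELMFIRE: the function name check_elmfire_compilers is hypothetical, and it relies only on the variable names shown in the .bashrc excerpt above.

```shell
# Hypothetical helper (not part of ELMFIRE): report whether each compiler
# command named in the ELMFIRE_FCOMPL_* variables can be found on PATH.
check_elmfire_compilers() {
  missing=0
  for var in ELMFIRE_FCOMPL_SERIAL_INTEL ELMFIRE_FCOMPL_MPI_INTEL \
             ELMFIRE_FCOMPL_SERIAL_GNU ELMFIRE_FCOMPL_MPI_GNU; do
    # Look up the value of the variable whose name is stored in $var.
    eval "cmd=\${$var:-}"
    if [ -z "$cmd" ]; then
      echo "$var: unset"
    elif command -v "$cmd" >/dev/null 2>&1; then
      echo "$var=$cmd: found at $(command -v "$cmd")"
    else
      echo "$var=$cmd: NOT FOUND"
      missing=1
    fi
  done
  # Non-zero exit status if any named compiler could not be located.
  return "$missing"
}
```

Running check_elmfire_compilers in a fresh login shell shows which wrappers the build would pick up. A NOT FOUND entry for the Intel variables may be harmless if only the GNU toolchain is used; the thread does not say which toolchain the build script selects.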

yqin123 commented 4 months ago

Hi Chris,

It turned out to be a compilation problem on the server here at UMD. I tested on another Linux machine, and both commands work well there. I later made a pull request with some modifications and included detailed descriptions there.

I will leave this issue open and see whether the compilation problem can be resolved.

Yiren

lautenberger commented 4 months ago

Hmmm, that is quite strange. Could you try running ELMFIRE's debug executable to see if it gives a line number where the erroneous arithmetic operation is occurring?

yqin123 commented 4 months ago

I recently tried running the debug executable without an input file. It showed that the error occurred at lines 58 and 5 in file elmfire.f90:

#40  0x4c6eb0 in elmfire
    at ../../source/elmfire.f90:58
#41  0x4d988f in main
    at ../../source/elmfire.f90:5
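
In the backtrace, only frames #40 and #41 carry a source location; the deeper frames print ???, presumably because they sit in libraries built without debug information. Lines 58 and 5 therefore mark where the main program enters the failing call chain, not necessarily where the arithmetic goes wrong. As a small aid for long traces, the sketch below keeps only the frames that were resolved to a file and line. The function name resolved_frames is hypothetical, and it assumes the backtrace layout shown in this thread.

```shell
# Hypothetical helper: read a gfortran-style backtrace on stdin and print
# only the frames resolved to a source location, as "frame name file:line".
resolved_frames() {
  awk '
    # Frame line, e.g. "#40  0x4c6eb0 in elmfire": remember frame and name.
    /^#[0-9]+/ { frame = $1; name = $NF; next }
    # Location line, e.g. "    at ../../source/elmfire.f90:58".
    $1 == "at" { print frame, name, $2 }
  '
}
```

One way to use it is elmfire_debug elmfire.data 2>&1 | resolved_frames, which for the trace above would leave just the elmfire and main frames.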

lautenberger commented 3 months ago

Hi @yqin123 - just confirming that this issue has been resolved with the change in compiler flags that we figured out earlier this week?

yqin123 commented 3 months ago

Yes, thank you very much! @lautenberger