Closed yqin123 closed 7 months ago
Thank you, Yiren - confirming receipt. I will take a look! I could probably figure this out by diffing the source code, but is the source in the archive elmfire_public-UMD_SPOTTING_MODEL.zip a local fork / test version that you created yourself, i.e. one that is not in the public repo? Thanks!
Hi Chris, Thanks for your help! elmfire_public-UMD_SPOTTING_MODEL.zip is a local branch, but it is based on the most recent public version (my fork was updated a few days ago).
Thank you, Yiren. A diagnostic array was being allocated only when RANDOM_IGNITIONS=.TRUE., but it should be allocated for all cases. Commit https://github.com/lautenberger/elmfire/commit/7aeea6a7ec528a87a77581145095334d2e3bb2b2 resolves this issue. Basically, in elmfire.f90, delete line 270 and at line 271 add this line:
IF (MODE .NE. 2 .AND. NUM_MONTE_CARLO_VARIABLES .GT. 0) ALLOCATE(COEFFS_UNSCALED_BY_CASE(1:NUM_CASES_TOTAL,1:NUM_MONTE_CARLO_VARIABLES))
I think this will resolve your issue, but if not, let me know. Thanks for the report!
Hi Chris, Thank you very much! I modified the lines you mentioned, and it now works with elmfire elmfire.data; however, it still shows errors when I run elmfire_debug elmfire.data. This problem can be reproduced using the updated version, even without activating spotting on my end.
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7fd2ffcfb6a0 in ???
#1 0x7fd2ffcfa8d5 in ???
#2 0x7fd2fef61b4f in ???
#3 0x7fd2fe43bf70 in ???
#4 0x7fd2fe43f8a9 in ???
#5 0x7fd2fe44040e in ???
#6 0x7fd2fe4409ec in ???
#7 0x7fd2fe440be3 in ???
#8 0x7fd2fe457386 in ???
#9 0x7fd2fe43fafa in ???
#10 0x7fd2fe4409d1 in ???
#11 0x7fd2fe440be3 in ???
#12 0x7fd2fe451aae in ???
#13 0x7fd2fe451de6 in ???
#14 0x7fd2fe451e8a in ???
#15 0x7fd2fe45abb4 in ???
#16 0x7fd2fe45c17b in ???
#17 0x7fd2fe43fafa in ???
#18 0x7fd2fe4409d1 in ???
#19 0x7fd2fe440be3 in ???
#20 0x7fd2fe451aae in ???
#21 0x7fd2fe451de6 in ???
#22 0x7fd2fe451e8a in ???
#23 0x7fd2fe4522d0 in ???
#24 0x7fd2fe43fafa in ???
#25 0x7fd2fe4409d1 in ???
#26 0x7fd2fe440be3 in ???
#27 0x7fd2fe4417c6 in ???
#28 0x7fd2fe431097 in ???
#29 0x7fd2fe4312cd in ???
#30 0x7fd2fe4978ec in ???
#31 0x7fd2fe41afd4 in ???
#32 0x7fd2fe41b773 in ???
#33 0x7fd2fe41bbed in ???
#34 0x7fd30037f5c2 in ???
#35 0x7fd30037f7be in ???
#36 0x7fd3003cc23f in ???
#37 0x7fd300220f9e in ???
#38 0x7fd3006d04b7 in ???
#39 0x7fd300b16f40 in ???
#40 0x4c6eb0 in elmfire
at ../../source/elmfire.f90:58
#41 0x4d988f in main
at ../../source/elmfire.f90:5
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1409338 RUNNING AT login-2.zaratan.umd.edu
= KILLED BY SIGNAL: 8 (Floating point exception)
===================================================================================
FYI, here is a part of the ~/.bashrc file, showing a list of the environment variables relevant to ELMFIRE:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
export PATH
module load openmpi/4.1.1/gcc/
module load cuda/gcc/9.4.0/
module load curl/gcc/9.4.0/
module load gdal/3.4.0/gcc/
module load python
export PATH=$PATH:/home/yqin123/software/bin/elmfire
# Enviroment variables for ELMFIRE
# export GDALROOT=/cvmfs/hpcsw.umd.edu/spack-software/2022.06.15/linux-rhel8-zen2/gcc-9.4.0/gdal-3.4.0-2lfwxvv4yfehlamhc5zraivmeu4hbxdi/bin/
export GDALROOT=/home/yqin123/software/bin/gdal/bin
export BUILD_ELMFIRE_COMPANION_PROGRAMS=yes
export BUILD_ELMFIRE_DEBUG_BINARIES=yes
export ELMFIRE_FCOMPL_SERIAL_INTEL=ifort
export ELMFIRE_FCOMPL_MPI_INTEL=mpifort
export ELMFIRE_FCOMPL_SERIAL_GNU=gfortran
export ELMFIRE_FCOMPL_MPI_GNU=mpifort
export ELMFIRE_INSTALL_DIRECTORY=/home/yqin123/software/bin/elmfire
export ELMFIRE_SCRATCH_BASE=/home/yqin123/scratch/elmfire_scratch
export ELMFIRE_BASE_DIR=/home/yqin123/software/elmfire_public/elmfire_public/
export ELMFIRE_INSTALL_DIR=/home/yqin123/software/bin/elmfire_public/
export CLOUDFIRE_SERVER=172.92.17.198
export PATH=$PATH:$ELMFIRE_INSTALL_DIR:$ELMFIRE_BASE_DIR/cloudfire
export GDAL_PAM_ENABLED=YES
Could there be some problem related to the compilation? Let me know if you need other information. Thanks for your help again!
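Before suspecting the compiler itself, it can help to confirm that each ELMFIRE-related path from the .bashrc excerpt exists and to see which compilers are actually on PATH. The following is only a minimal sketch under those assumptions: check_dir is a hypothetical helper, not part of ELMFIRE, and the variable names are the ones exported in the excerpt above.

```shell
#!/bin/sh
# Report whether the directory named by an environment variable exists.
# check_dir is a hypothetical helper, not part of ELMFIRE.
check_dir() {
    name=$1
    # Indirectly read the variable whose name is in $name (empty if unset).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name: unset"
    elif [ -d "$value" ]; then
        echo "$name: ok ($value)"
    else
        echo "$name: missing ($value)"
    fi
}

# Variable names taken from the .bashrc excerpt in this thread.
for var in ELMFIRE_BASE_DIR ELMFIRE_INSTALL_DIR ELMFIRE_INSTALL_DIRECTORY \
           ELMFIRE_SCRATCH_BASE GDALROOT; do
    check_dir "$var"
done

# Show which compilers a build would pick up, if they are on PATH.
for tool in gfortran mpifort; do
    if command -v "$tool" >/dev/null 2>&1; then
        "$tool" --version | head -n 1
    else
        echo "$tool: not found on PATH"
    fi
done
```

Note that the excerpt sets both ELMFIRE_INSTALL_DIRECTORY (/home/yqin123/software/bin/elmfire) and ELMFIRE_INSTALL_DIR (/home/yqin123/software/bin/elmfire_public/), which point to different directories. Whether that matters here is not established in this thread, but a check like this makes such differences visible.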
Hi Chris,
It turned out to be a compilation problem on the server here at UMD. I tested on another Linux machine, and both commands work well now. I later made a pull request with some modifications and included detailed descriptions there.
I will leave this issue open and see whether the compilation problem can be resolved.
Yiren
Hmmm, that is quite strange. Could you try running ELMFIRE's debug executable to see if it gives a line number where the erroneous arithmetic operation is occurring?
I recently tried running the debug executable without an input file, and it showed that the error occurred at lines 58 and 5 of elmfire.f90:
#40 0x4c6eb0 in elmfire
at ../../source/elmfire.f90:58
#41 0x4d988f in main
at ../../source/elmfire.f90:5
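Only the last two frames of the backtrace are resolved to source lines; the rest are printed as ???. If it is useful to look up the resolved frames again with binutils' addr2line, their addresses can be pulled out of a saved backtrace first. This is a rough sketch, not a procedure from the ELMFIRE documentation: frame_addresses is a hypothetical helper, and it assumes the gfortran backtrace layout shown in this thread.

```shell
#!/bin/sh
# Print the addresses of backtrace frames whose symbol name equals $1.
# Reads a gfortran-style backtrace on stdin; frame_addresses is a
# hypothetical helper written for this example.
frame_addresses() {
    awk -v sym="$1" '$1 ~ /^#[0-9]+$/ && $3 == "in" && $4 == sym { print $2 }'
}

# Demo on an excerpt copied from the backtrace in this thread.
frame_addresses elmfire <<'EOF'
#39 0x7fd300b16f40 in ???
#40 0x4c6eb0 in elmfire
at ../../source/elmfire.f90:58
#41 0x4d988f in main
at ../../source/elmfire.f90:5
EOF
# prints 0x4c6eb0
```

With the full backtrace saved to a file (backtrace.txt is an assumed name), the addresses could then be passed to addr2line, for example: frame_addresses elmfire < backtrace.txt | xargs addr2line -f -e "$(command -v elmfire_debug)". This only helps for frames inside the executable itself and requires a binary built with debug symbols.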
Hi @yqin123 - just confirming that this issue has been resolved with the change in compiler flags that we figured out earlier this week?
Yes, thank you very much! @lautenberger
Hi Chris,
I updated the spotting model on my end and tried incorporating the HRR_transient into it.
However, when I run a test case with the Eulerian spotting model activated (using the command elmfire elmfire.data), it shows "segmentation fault ..." while outputting results. The results are correct regardless.
Meanwhile, when I run the case using the command elmfire_debug elmfire.data, it reports that something went wrong at lines 5 and 58 of "elmfire.f90" ("Floating-point exception - erroneous arithmetic operation.") and does not output any results. This problem only occurs when I run with my spotting model.
Could you please help me check it out or give me some suggestions on finding out what is wrong? I have attached the test version of ELMFIRE and my test input files here. Please let me know if you need any further clarification. elmfire_public-UMD_SPOTTING_MODEL.zip spotting-test-inputs.zip
Thanks, Yiren