Closed mkavulich closed 3 years ago
Did you see my Google Chat message yesterday? I have had to fix two such "illegal instruction" errors with the same underlying cause in the past: one in EMC_post and another in ccpp-physics (sfcsub.F).
@climbfuji Thanks for the reply; I did see your suggestion yesterday and the link (https://github.com/NOAA-EMC/EMC_post/pull/81/files), but I didn't see any character definitions in the beginning of that subroutine. The error must be occurring before the print statement, so it has to be one of these lines unless I'm completely off-base:
1 GLAT,IM,JM,IMN,JMN,lon_c,lat_c)
implicit none
real, parameter :: D2R = 3.14159265358979/180.
integer, parameter :: MAXSUM=20000000
real hgt_1d(MAXSUM)
integer IM, JM, IMN, JMN
real GLAT(JMN), GLON(IMN)
INTEGER ZAVG(IMN,JMN),ZSLM(IMN,JMN)
real land_frac(IM,JM)
real ORO(IM,JM),SLM(IM,JM),VAR(IM,JM),VAR4(IM,JM)
integer IST,IEN,JST, JEN
real lon_c(IM+1,JM+1), lat_c(IM+1,JM+1)
INTEGER mskocn,isave
LOGICAL FLAG, DEBUG
real LONO(4),LATO(4),LONI,LATI
real HEIGHT
integer JM1,i,j,nsum,ii,jj,i1,numx,i2
integer ilist(IMN)
real DELXN,XNSUM,XLAND,XWATR,XL1,XS1,XW1,XW2,XW4
!jaa
real :: xnsum_j,xland_j,xwatr_j
logical inside_a_polygon
As far as I can tell, all of that is pretty standard (if very messy) Fortran variable declarations. The only weird thing I noticed about the beginning of that subroutine was the use of "1" as a continuation character, but I changed it to "&" and still got the same error. (And from further reading, that is apparently perfectly fine by Fortran 77 standards.)
I'm curious if you have more info about the exact conditions that caused the errors you fixed; from what I can tell it's related to setting variables as allocatable by using "*"...is that correct?
I guess it has to do with explicit dimensions instead of allocating those arrays.
Question: do you use ulimit -S -s unlimited on your Mac? That bumps the stack up from 8MB to 65MB (the maximum allowed).
If I had to guess I'd say it is the line
real hgt_1d(MAXSUM)
but you could try to make all these array definitions allocatable arrays, and if this fixes the problem revert one by one until it breaks again.
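For illustration, here is a minimal sketch of what that change could look like for the one suspect line. It is a standalone free-form program, not the actual orog source; the program name and the error handling are made up for the example. The arithmetic behind the guess: 20,000,000 default reals at 4 bytes each is about 80 MB (assuming the build does not promote reals to 8 bytes), which is larger than either stack limit mentioned above if the compiler does place the array on the stack.

```fortran
program heap_array_sketch
  implicit none
  integer, parameter :: MAXSUM = 20000000

  ! Before (fixed-size local array, as in the declarations quoted above):
  !   real hgt_1d(MAXSUM)
  ! After: an allocatable array, which is placed on the heap when allocated.
  real, allocatable :: hgt_1d(:)
  integer :: ierr

  ! Request the same number of elements at run time and check the status.
  allocate(hgt_1d(MAXSUM), stat=ierr)
  if (ierr /= 0) then
     print *, 'allocation of hgt_1d failed, stat = ', ierr
     stop 1
  end if

  ! The array is used exactly as before once it has been allocated.
  hgt_1d = 0.0
  print *, 'elements allocated: ', size(hgt_1d)

  ! Release the memory explicitly when the work is done.
  deallocate(hgt_1d)
end program heap_array_sketch
```

The rest of the routine should not need to change, since an allocated array is indexed the same way as a fixed-size one.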
@climbfuji Great guess, changing that one line seems to have fixed the issue! I'll open a PR once I am sure there are no other areas that need fixing in UFS_UTILS.
Hah! Thanks for trying. Since the original code is valid Fortran, you may want to use CPP directives to use the allocate syntax only for macOS, but it is best to discuss this with @GeorgeGayno-NOAA.
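A rough sketch of how such a guard might look, under some assumptions: the file is run through the preprocessor (for example a .F90 suffix or the -cpp flag with gfortran), and the compiler predefines __APPLE__ on macOS. If it does not, a macro passed with -D at build time would serve the same purpose. The program and subroutine names are hypothetical, and the sketch is free-form rather than the fixed-form layout of the original source.

```fortran
! Needs preprocessing: save with a .F90 suffix or compile with -cpp.
program cpp_guard_sketch
  implicit none
  call fill_heights()
contains
  subroutine fill_heights()
    integer, parameter :: MAXSUM = 20000000
    ! Assumption: __APPLE__ is predefined by the compiler on macOS.
#ifdef __APPLE__
    real, allocatable :: hgt_1d(:)   ! heap allocation on macOS only
#else
    real :: hgt_1d(MAXSUM)           ! original fixed-size declaration elsewhere
#endif

#ifdef __APPLE__
    allocate(hgt_1d(MAXSUM))
#endif

    ! Code below the guards is identical on every platform.
    hgt_1d = 0.0
    print *, 'elements available: ', size(hgt_1d)

#ifdef __APPLE__
    deallocate(hgt_1d)
#endif
  end subroutine fill_heights
end program cpp_guard_sketch
```

The trade-off is two code paths to maintain; using the allocatable form on all platforms would avoid the guards entirely, which is part of what would need discussing.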
@kgerheiser can you add a macOS build to the GitHub Actions? That will be the beginning of testing this...
I have been trying to debug this problem for a while but have had no success. Running orog on macOS Catalina (10.15.7) compiled with gfortran 9.3.0 (GNU Fortran (MacPorts gcc9 9.3.0_4) 9.3.0) fails with the message "Illegal instruction: 4". This issue occurs with or without DEBUG settings turned on (there seems to be no difference in behavior regardless of compilation flags). This is the end of the output leading up to failure:
Based on the output, the failure appears to be in the calling or initialization of the MAKEMT2 subroutine, but I cannot tease out any more debugging info: the executable does not print a traceback or any other information aside from "Illegal instruction: 4".
Here is the full log, everything looks normal until the sudden failure.
I'm at a loss as to what to try next, so I'm hoping someone else can offer some suggestions, or better yet try this themselves and try to reproduce the issue. Let me know if you need any more info.