hafs-community / HAFS

Hurricane Analysis and Forecast System

code stability test failed in exec/hafs_gsi.x #248

Open BinLiu-NOAA opened 9 months ago

BinLiu-NOAA commented 9 months ago

Description

From NCO SPA: During the HAFS v1.0 code stability test, the HAFS code was recompiled with the '-check all' and '-ftrapuv' flags, or with '-check bounds' only.

The code stability test of exec/gsi.x failed with both '-check all' and '-check bounds'. Please investigate this failure and address it at the next upgrade.

Here is the detailed info:
/lfs/h1/ops/test/packages/hafs.v1.0.3/sorc.chkall - builds with '-check all'
/lfs/h1/ops/test/packages/hafs.v1.0.3/exec.chk.all - exec with '-check all'
/lfs/h1/ops/test/packages/hafs.v1.0.3/sorc.chk.bounds - builds with '-check bounds'
/lfs/h1/ops/test/packages/hafs.v1.0.3/exec.chk.bounds - exec with '-check bounds'

Failed with '-check all' - hfsb1_analysis_12_v1.0.1.o66742417:
forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable CLOUD when it is not allocated
Image              PC                Routine            Line     Source
hafs_gsi.x         00000000078380FF  Unknown            Unknown  Unknown
hafs_gsi.x         00000000057D3A73  updateguess        290      update_guess.f90
hafs_gsi.x         000000000466E5FF  pcgsoimod_mp_pcgs  619      pcgsoi.f90
hafs_gsi.x         0000000003CB6C95  glbsoi             371      glbsoi.f90
hafs_gsi.x         0000000000E9D98C  gsisub             200      gsisub.F90
hafs_gsi.x         000000000042B31C  gsimod_mp_gsimain  2230     gsimod.F90
hafs_gsi.x         000000000041393B  MAIN__             631      gsimain.f90

Failed with '-check bounds' - /lfs/h1/ops/test/output/20230612/hfsa1_analysis_00_NHC_09L_IAN_2022092400.o62848309:
forrtl: error (65): floating invalid
Image              PC                Routine            Line     Source
hafs_gsi.x         000000000674808B  Unknown            Unknown  Unknown
libpthread-2.31.s  0000147535AB28C0  Unknown            Unknown  Unknown
hafs_gsi.x         000000000189E6E1  read_radar_l2rw_n  3410     read_radar.f90
hafs_gsi.x         00000000016F47FD  read_obsmod_mp_re  1601     read_obs.F90
hafs_gsi.x         00000000012B7193  observermod_mp_se  331      observer.F90
hafs_gsi.x         0000000003473039  glbsoi             222      glbsoi.f90
hafs_gsi.x         0000000000D278C8  gsisub             200      gsisub.F90
hafs_gsi.x         0000000000429FE7  gsimod_mp_gsimain  2230     gsimod.F90
hafs_gsi.x         0000000000413764  MAIN__             631      gsimain.f90

BijuThomas-NOAA commented 8 months ago

Floating invalid error with the debug build on WCOSS2, at clw_mod.f90 line 987:
              pred_var_clw(1) = log(tb_use(7) - tb_use(8))
The argument of the log becomes negative.

From Xu:

This likely indicates that the bias correction is not working properly. I'm not familiar with the AMSR2 dataset, but it looks like the sys_bias for channel 7 is larger than for channel 8 (L961 in clw_mod.f90), so similar values in these two channels can yield a negative difference after the bias correction. It would be helpful to output tb(7) and tb(8) before the bias correction. If this only happens occasionally for certain data points, we could just skip those points, as is done around L996 of clw_mod.f90. However, if the majority of the dataset has the problem, we may need to ask the bias correction provider for help.
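
To illustrate the "skip the data point" option, here is a minimal Python sketch (not the GSI Fortran, and not the actual code around L996 of clw_mod.f90). The function names `clw_predictor` and `screen_points` are hypothetical; the sketch only assumes that the predictor is log(tb7 - tb8) and that a non-positive difference should cause the point to be skipped and reported for diagnosis.

```python
import math


def clw_predictor(tb7, tb8):
    """Return log(tb7 - tb8), or None when the difference is not positive.

    Hypothetical helper: mirrors the shape of
    pred_var_clw(1) = log(tb_use(7) - tb_use(8)) with a guard added.
    """
    diff = tb7 - tb8
    if diff <= 0.0:
        return None  # log would be invalid; caller decides what to do
    return math.log(diff)


def screen_points(points):
    """Split (tb7, tb8) pairs into usable predictors and skipped pairs."""
    kept, skipped = [], []
    for tb7, tb8 in points:
        value = clw_predictor(tb7, tb8)
        if value is None:
            skipped.append((tb7, tb8))  # keep the raw values for inspection
        else:
            kept.append(value)
    return kept, skipped


if __name__ == "__main__":
    kept, skipped = screen_points([(200.0, 199.0), (199.0, 200.0)])
    print("kept:", kept)
    print("skipped:", skipped)
```

Counting the skipped pairs against the total is one way to tell an occasional bad point from a dataset-wide bias correction problem.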

BijuThomas-NOAA commented 8 months ago

An array bound error was found in setuprad.f90:

forrtl: severe (408): fort: (2): Subscript #1 of the array CBIAS has value 252 which is greater than the upper bound of 250

Changing maxscan=250 to maxscan=252 in radinfo.f90 fixed this issue.
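
As a rough illustration of what the bounds check is reporting, the Python sketch below mimics a 1-based, bounds-checked lookup. The helper `lookup_cbias` is hypothetical and the array contents are placeholders; only the sizes 250 and 252 and the subscript 252 come from the error message and fix above.

```python
def lookup_cbias(cbias, scan_pos):
    """1-based lookup that rejects subscripts outside 1..len(cbias)."""
    upper = len(cbias)
    if not 1 <= scan_pos <= upper:
        raise IndexError(f"subscript {scan_pos} is outside bounds 1..{upper}")
    return cbias[scan_pos - 1]


if __name__ == "__main__":
    for maxscan in (250, 252):
        cbias = [0.0] * maxscan  # placeholder values
        try:
            lookup_cbias(cbias, 252)
            print(f"maxscan={maxscan}: subscript 252 is in bounds")
        except IndexError as err:
            print(f"maxscan={maxscan}: {err}")
```

Without bounds checking, the out-of-range read would go unnoticed rather than stop the run, which is why the error only appeared in the '-check' builds.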

BijuThomas-NOAA commented 7 months ago

A potential bug was identified in stpcalc.f90: an array bound error for the stp variable. Xu Lu provided a fix that we are testing.

From Xu Lu:

This part is supposed to find the historical minimum outpen by looping i from 1 to nsteptot, where nsteptot can be greater than istp_iter (the maximum possible is 3*istp_iter+1, depending on how many times it tried to search for new stepsize directions).
When it stores the stepsize with the minimum outpen at L846:
              stp(ii)=outstp(i)
it uses ii, which is istp_iter according to the if statement at L840.
The istp_use at L848 should then also be ii for consistency, but the code uses i instead, which is why it exceeds the array bounds.
The if statement at L851 should also use nsteptot instead of istp_iter.
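
The indexing issue can be sketched in Python as follows. This is an illustration built only from the description above, not from the stpcalc.f90 source: the function `pick_best_step` is hypothetical, the arrays are 0-based here, and stp is assumed to have istp_iter slots while outpen and outstp have nsteptot entries.

```python
def pick_best_step(outpen, outstp, istp_iter):
    """Store the stepsize with the smallest penalty in the last slot of stp.

    Returns (stp, istp_use), where istp_use indexes into stp.
    """
    nsteptot = len(outpen)   # may exceed istp_iter
    stp = [0.0] * istp_iter  # only istp_iter slots are available
    ii = istp_iter - 1       # last valid slot (0-based)

    # Search all nsteptot trial steps for the minimum penalty.
    best = min(range(nsteptot), key=lambda i: outpen[i])

    stp[ii] = outstp[best]
    istp_use = ii            # using `best` here could point past the end of stp
    return stp, istp_use


if __name__ == "__main__":
    outpen = [5.0, 3.0, 4.0, 1.0, 2.0]
    outstp = [0.1, 0.2, 0.3, 0.4, 0.5]
    stp, istp_use = pick_best_step(outpen, outstp, istp_iter=2)
    print(stp[istp_use])
```

In this example the minimum penalty sits at position 3 of a 5-entry search, while stp has only 2 slots, so indexing stp with the search index would fall outside the array.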

BinLiu-NOAA commented 4 months ago

This issue has been resolved by this GSI commit.