Closed uturuncoglu closed 1 week ago
@pvelissariou1 @saeed-moghimi-noaa I think we need to get some help for WW3. At this point, the new cap (also used by all the wave configurations under the UFS Weather Model except one single test that uses NUOPC connectors) has an issue with atm2sch2wav. I was able to run atm2sch and atm2wav without any issue. The coupling through wav2sch uses radiation stresses, which were not available through the new cap, but I activated them. At this point, the call that calculates the radiation stresses (`call CalcRadstr2D(va, sxxn, sxyn, syyn)`) returns all zeros, and when I checked `va` (the input to the call), it is all zero too. There could be some configuration issue here that needs to be fixed.
@pvelissariou1 @saeed-moghimi-noaa It seems that the S2S application is able to provide those fields without any issue. At this point we think there is some option in ww3_grid.inp that prevents non-zero radiation stresses. Please see the discussion at https://github.com/NOAA-EMC/WW3/issues/1110. I think if we find that difference and fix the configuration, we should be able to couple WW3 with SCHISM using the new mesh cap.
Hi @aliabdolali @AliS-Noaa ,
Happy New Year!
I have an issue running WW3 on Hera and Hercules. My Atlantic case has about ~5M nodes (a subset of a 120 m ADCIRC mesh). I was using 800 cores for 8 hours to get 2 days of simulation results on Hera, so I would need about 5000 cores to finish a 1-month simulation within 8 hours.
Its timestep setting is as follows:
$ Set time steps ----------------------------------------------------- $
$ - Time step information (this information is always read)
$ maximum global time step, maximum CFL time step for x-y and
$ k-theta, minimum source term time step (all in seconds).
$
$
100. 100. 100. 100.
The test case is at /scratch2/STI/coastal/Yunfang.Sun/ww3_hera/ian_noobc_1.
Do you have any suggestions on how to increase the simulation speed?
Thank you very much!
Best,
Yunfang
Hi @yunfangsun
Do you know if you are doing explicit or implicit runs? Could you please paste the whole main inp file where you define the parameters?
Thanks
Hi Saeed,
I was using the implicit scheme; the scheme selections are as follows:
EXPFSN = F,
EXPFSPSI = F,
EXPFSFCT = F,
IMPFSN = F,
EXPTOTAL = F,
IMPTOTAL = T,
IMPREFRACTION = T,
IMPFREQSHIFT = T,
IMPSOURCE = T,
And the whole grd.inp is as follows:
$ -------------------------------------------------------------------- $
$ WAVEWATCH III Grid preprocessor input file $
$ -------------------------------------------------------------------- $
$ Grid name (C*30, in quotes)
$
'atlantic'
$
$ Frequency increment factor and first frequency (Hz) ---------------- $
$ number of frequencies (wavenumbers) and directions, relative offset
$ of first direction in terms of the directional increment [-0.5,0.5].
$ In versions 1.18 and 2.22 of the model this value was by definition 0,
$ it is added to mitigate the GSE for a first order scheme. Note that
$ this factor is IGNORED in the print plots in ww3_outp.
$
1.10 0.05 32 36 0.
$
$ Set model flags ---------------------------------------------------- $
$ - FLDRY Dry run (input/output only, no calculation).
$ - FLCX, FLCY Activate X and Y component of propagation.
$ - FLCTH, FLCK Activate direction and wavenumber shifts.
$ - FLSOU Activate source terms.
$
F T T T T T
$
$ Set time steps ----------------------------------------------------- $
$ - Time step information (this information is always read)
$ maximum global time step, maximum CFL time step for x-y and
$ k-theta, minimum source term time step (all in seconds).
$
$
100. 100. 100. 100.
101. $ Start of namelist input section ------------------------------------ $
$ Starting with WAVEWATCH III version 2.00, the tunable parameters
$ for source terms, propagation schemes, and numerics are read using
$ namelists. Any namelist found in the following sections up to the
$ end-of-section identifier string (see below) is temporarily written
$ to ww3_grid.scratch, and read from there if necessary. Namelists
$ not needed for the given switch settings will be skipped
$ automatically, and the order of the namelists is immaterial.
$
$ This is TEST405
$
&SIN4 BETAMAX = 1.55, ZALP=0.006, ZWND = 5.,
Z0MAX = 0.0020, SINTHP=2.0, SWELLFPAR = 3, SWELLF = 0.80,
TAUWSHELTER = 0.0, SWELLF2=-0.018, SWELLF3= 0.015, Z0RAT = 0.04,
SWELLF4 = 100000, SWELLF5 = 1.2 /
$&SDS4 SDSBCHOICE = 1.0, SDSC2 = -0.2200E-04, SDSCUM = -0.40,
$ SDSC4 = 1.00, SDSC5 = 0.0000E+00, SDSC6 = 0.3000E+00,
$ WNMEANP =0.50, FXPM3 =4.00, FXFM3 = 2.5, FXFMAGE = 0.000,
$ SDSBINT = 0.3000E+00, SDSBCK = 0.0000E+00, SDSABK = 1.500, SDSPBK = 4.000,
$ SDSHCK = 1.50, SDSBR = 0.9000E-03, SDSSTRAIN = 0.0, SDSSTRAINA =15.0, SDSSTRAIN2 = 0.0,
$ SDSBT = 0.00, SDSP = 2.00, SDSISO = 2, SDSCOS =2.0, SDSDTH = 80.0,
$ SDSBRF1 = 0.50, SDSBRFDF = 0,
$ SDSBM0 = 1.00, SDSBM1 = 0.00, SDSBM2 = 0.00, SDSBM3 = 0.00, SDSBM4 = 0.00,
$ SPMSS = 0.50, SDKOF = 3.00, SDSMWD = 0.90, SDSFACMTF =400.0,
$ SDSMWPOW =1.5, SDSNMTF = 1.00, SDSCUMP =2.0, SDSNUW =.000E+00,
$ WHITECAPWIDTH = 0.30 WHITECAPDUR = 0.56 /
$
&OUTS E3D = 1, TH1MF = 1, STH1MF = 1 /
&UNST UGOBCAUTO = F,
UGOBCDEPTH= -10.,
EXPFSN = F,
EXPFSPSI = F,
EXPFSFCT = F,
IMPFSN = F,
EXPTOTAL = F,
IMPTOTAL = T,
IMPREFRACTION = T,
IMPFREQSHIFT = T,
IMPSOURCE = T,
SETUP_APPLY_WLV = F,
SOLVERTHR_SETUP=1E-14,
CRIT_DEP_SETUP=0.1,
JGS_USE_JACOBI = T,
JGS_BLOCK_GAUSS_SEIDEL = T,
JGS_TERMINATE_MAXITER = T,
JGS_MAXITER = 1000,
JGS_TERMINATE_NORM = F,
JGS_TERMINATE_DIFFERENCE = T,
JGS_DIFF_THR = 1.E-8,
JGS_PMIN = 3.0,
JGS_LIMITER = F,
JGS_NORM_THR = 1.E-20 /
$
$ Bottom friction - - - - - - - - - - - - - - - - - - - - - - - - - -
Happy New Year, NOS and NOAA team.
First, why did you pick 100 s for your time step? What is the CFL number based on the minimum resolution? This is the key here. You need to calculate the resolution of your entire mesh in physical distance (not in degrees), then calculate the group velocity based on the minimum frequency, and then the time step can be chosen with CFL = 5-10. VERY CLASSIC.
Second, once an optimum time step is chosen, you can change the number of iterations and the relative threshold to speed it up (I do not recommend it, as we spent a considerable amount of time fine-tuning them).
Third, what version of WW3 are you using? If it is the most recent one, it should be fast enough: a 2-week storm simulation on a 5M-node mesh can be done in 8 hrs or so on ~1000-2000 CPUs. If you are using the old version of WW3 (the one which is now 2-3 years old), I'd recommend not worrying about the speed, as it is temporary and you will gain it back once you switch to the most recent version of WW3.
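The CFL estimate described above can be sketched numerically. This is a minimal deep-water example: the ~120 m resolution and 0.05 Hz first frequency come from this thread, while the deep-water group-velocity formula cg = g/(4*pi*f) is an illustrative assumption (shallow-water corrections would lower cg):

```python
import math

def max_cfl_timestep(dx_m, f_min_hz, cfl=5.0, g=9.81):
    """Estimate the maximum propagation time step from the CFL condition.

    Uses the deep-water group velocity cg = g / (4*pi*f) at the lowest
    spectral frequency, which propagates fastest and therefore limits dt.
    """
    cg = g / (4.0 * math.pi * f_min_hz)  # deep-water group velocity (m/s)
    return cfl * dx_m / cg               # allowable time step (s)

# Values from this thread: ~120 m minimum resolution, first frequency 0.05 Hz
dt = max_cfl_timestep(dx_m=120.0, f_min_hz=0.05, cfl=5.0)
print(round(dt, 1))  # about 38 s, well below the 100 s used above
```

With CFL = 10 the same numbers give about 77 s, so the 100 s step above sits slightly beyond the suggested range.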
@sbanihash @AliS-Noaa @saeed-moghimi-noaa @pvelissariou1
Hi Ali @aliabdolali
Thank you very much! I will use the CFL condition to choose the time step.
Could you tell me where I can change the number of iterations and the relative threshold in the namelist? I am not very familiar with it.
The WW3 I am using is the version in UFS-Coastal, commit 02693d837f2cd99d20ed08515878c2b5e9525e64 (modified 3 months ago). Is this version slower than the most recent one?
Thank you very much!
Best,
Yunfang
The definitions are all listed here: https://github.com/erdc/WW3/blob/develop/model/inp/ww3_grid.inp, but as I said, I'd recommend not changing them.
Code from 3 months ago is good enough.
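For reference, the two knobs discussed above (number of solver iterations and the relative termination threshold) appear in the &UNST namelist already posted in this thread; a hedged pointer, with values copied from that grid file rather than recommended ones:

```
&UNST
  ...
  JGS_MAXITER  = 1000,   ! maximum number of solver iterations
  JGS_DIFF_THR = 1.E-8,  ! relative difference threshold, used when
                         ! JGS_TERMINATE_DIFFERENCE = T
  ...
/
```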
@yunfangsun I'd suggest creating a new issue for your initial timestepping question, assigning it to yourself, and adding it to SurgeTeamCoordinationProject.
@janahaddad I have done it as you suggested
@uturuncoglu @pvelissariou1,
Hi Ufuk,
For the ATM+WW3 case, it starts but only produces 20220915.000000.out_grd.ww3.nc and 20220915.010000.out_grd.ww3.nc; after that the job hangs and does not move on. The log.ww3 stopped at
0| 1| 2022/09/15 00:00:00 | F | X |
36| 1| 01:00:00 | X | X |
--------+------+---------------------+-----------------------+------------------+
And the PET0960.ESMF_LogFile file stopped at the following
20240112 160941.197 INFO PET0960 (wav_comp_nuopc:wavinit_ufs) call w3init
20240112 161106.748 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = cpl_scalars is connected on root pe
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_z0 is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_wavsuu is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_wavsuv is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_wavsvv is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_pstokes_x is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Export Field = Sw_pstokes_y is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = Si_ifrac is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = So_u is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = So_v is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = So_t is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = Sa_tbot is not connected.
20240112 161106.761 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = Sa_u10m is connected using mesh
20240112 161106.762 INFO PET0960 (wav_import_export:fldlist_realize)(wav_import_export:realize_fields):WW3Import Field = Sa_v10m is connected using mesh
20240112 161110.554 DEBUG PET0960 about to destroy Mesh: 0x6118290
20240112 161121.543 INFO PET0960 (wav_comp_nuopc):(ModelSetRunClock) called
20240112 161121.543 INFO PET0960 (wav_comp_nuopc):(ModelSetRunClock)setting alarms for WAV
The datm.log stopped at
(shr_strdata_readstrm) opening : era5/download_inv_fix.nc
(shr_strdata_readstrm) setting pio descriptor : era5/download_inv_fix.nc
(shr_strdata_set_stream_iodesc) setting iodesc for : u10 with dimlens(1), dimlens(2) = 1440 721 variable as time dimension time
(shr_strdata_readstrm) reading file lb: era5/download_inv_fix.nc 337
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 338
atm : model date 20220915 0
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 339
atm : model date 20220915 3600
The mediator.log stopped at
(med_time_alarmInit): creating alarm alarm_history_inst_all
(med_phases_history_write) initialized history alarm alarm_history_inst_all with option nhours and frequency 1
(med_phases_history_write) : history alarmname alarm_history_inst_all is ringing, interval length is 3600
(med_phases_history_write) : mclock currtime = 2022-09-15-00000 mclock nexttime = 2022-09-15-03600
(med_phases_history_set_timeinfo) writing mediator history file ufs.cpld.cpl.hi.2022-09-15-03600.nc
(med_phases_history_set_timeinfo) currtime = 2022-09-15-00000 nexttime = 2022-09-15-03600
(med_io_wopen) creating file ufs.cpld.cpl.hi.2022-09-15-03600.nc
I have tried a few times; the job never dies, but it also will not continue, so I have to kill it. My folder is located at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_15740_atm_ww/coastal_ian_atm2ww3_intel_1 on Hercules.
Do you have any suggestions?
Thank you!
@yunfangsun I have just run the coastal_ike_shinnecock_atm2ww3 case and it is running without any issue. Since you are running a very high-resolution application, it might take time to calculate the required route handles on the ESMF side. How many PETs are you assigning to each component? (I have no permission to access your folder.) So keep the run in the queue and see what happens. If that does not work, we could try attaching gdb to the processes and collecting a backtrace to see where the issue is.
@yunfangsun Please post the _petlist_bounds variables and their values. You could also try removing MED med_phases_history_write and the restart phase from the run sequence to see if that helps. We might need to play with PIO (the parallel I/O library used by the mediator) settings to make it more efficient.
@uturuncoglu I have changed the permission of my folder, you should be able to get access to it at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_15740_atm_ww/coastal_ian_atm2ww3_intel_1
For the cores, I am using
# EARTH #
EARTH_component_list: ATM WAV MED
EARTH_attributes::
Verbosity = 0
::
# MED #
MED_model: cmeps
MED_petlist_bounds: 0 100
MED_omp_num_threads: 1
MED_attributes::
ATM_model = datm
WAV_model = ww3
history_n = 1
history_option = nhours
history_ymd = -999
coupling_mode = coastal
::
# ATM #
ATM_model: datm
ATM_petlist_bounds: 0 100
ATM_omp_num_threads: 1
ATM_attributes::
Verbosity = 0
DumpFields = false
ProfileMemory = false
OverwriteSlice = true
::
# WAV #
WAV_model: ww3
WAV_petlist_bounds: 101 4999
WAV_omp_num_threads: 1
WAV_attributes::
Verbosity = 0
DumpFields = false
ProfileMemory = false
merge_import = .false.
mesh_wav = atlantic_ESMFmesh.nc
multigrid = false
gridded_netcdfout = true
diro = "."
logfile = wav.log
::
# Run Sequence #
runSeq::
@3600
MED med_phases_prep_atm
MED med_phases_prep_wav_accum
MED med_phases_prep_wav_avg
MED -> ATM :remapMethod=redist
MED -> WAV :remapMethod=redist
ATM
WAV
ATM -> MED :remapMethod=redist
WAV -> MED :remapMethod=redist
MED med_phases_post_atm
MED med_phases_post_wav
MED med_phases_restart_write
MED med_phases_history_write
@
::
ALLCOMP_attributes::
ScalarFieldCount = 3
ScalarFieldIdxGridNX = 1
ScalarFieldIdxGridNY = 2
ScalarFieldIdxNextSwCday = 3
ScalarFieldName = cpl_scalars
start_type = startup
restart_dir = RESTART/
case_name = ufs.cpld
restart_n = 12
restart_option = nhours
restart_ymd = -999
orb_eccen = 1.e36
orb_iyear = 2000
orb_iyear_align = 2000
orb_mode = fixed_year
orb_mvelp = 1.e36
orb_obliq = 1.e36
stop_n = 36
stop_option = nhours
stop_ymd = -999
::
@yunfangsun Please remove history and restart write from the run sequence and then increase the number of cores for the mediator: set it to something like 0 4999 so the mediator runs on all the processors. If this helps the case run, then try adding the mediator history and restart back to the run sequence to see what happens. I think you don't need those in your run sequence, but at least it would be nice to have the restart ones. We might also look at PIO settings; tuning them could improve the I/O performance.
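As a sanity check on the suggested layout, here is simple arithmetic on the PET bounds quoted in this thread (the component names come from the configuration fragments above):

```python
# Inclusive PET bounds from the configuration fragments in this thread
med = range(0, 5000)    # MED_petlist_bounds: 0 4999 (suggested change)
atm = range(0, 101)     # ATM_petlist_bounds: 0 100
wav = range(101, 5000)  # WAV_petlist_bounds: 101 4999

# ATM and WAV occupy disjoint PETs, and together they cover exactly the
# PETs the mediator runs on, matching the 5000 MPI ranks in `srun -n 5000`.
assert set(atm).isdisjoint(wav)
assert set(atm) | set(wav) == set(med)
print(len(med))  # 5000
```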
@uturuncoglu I have changed it to:
# MED #
MED_model: cmeps
MED_petlist_bounds: 0 4999
MED_omp_num_threads: 1
MED_attributes::
ATM_model = datm
WAV_model = ww3
history_n = 1
history_option = nhours
history_ymd = -999
coupling_mode = coastal
::
# ATM #
ATM_model: datm
ATM_petlist_bounds: 0 100
ATM_omp_num_threads: 1
ATM_attributes::
Verbosity = 0
DumpFields = false
ProfileMemory = false
OverwriteSlice = true
::
# WAV #
WAV_model: ww3
WAV_petlist_bounds: 101 4999
WAV_omp_num_threads: 1
WAV_attributes::
Verbosity = 0
DumpFields = false
ProfileMemory = false
merge_import = .false.
mesh_wav = atlantic_ESMFmesh.nc
multigrid = false
gridded_netcdfout = true
diro = "."
logfile = wav.log
::
# Run Sequence #
runSeq::
@3600
MED med_phases_prep_atm
MED med_phases_prep_wav_accum
MED med_phases_prep_wav_avg
MED -> ATM :remapMethod=redist
MED -> WAV :remapMethod=redist
ATM
WAV
ATM -> MED :remapMethod=redist
WAV -> MED :remapMethod=redist
MED med_phases_post_atm
MED med_phases_post_wav
MED med_phases_restart_write
@
::
ALLCOMP_attributes::
ScalarFieldCount = 3
ScalarFieldIdxGridNX = 1
ScalarFieldIdxGridNY = 2
ScalarFieldIdxNextSwCday = 3
ScalarFieldName = cpl_scalars
start_type = startup
restart_dir = RESTART/
case_name = ufs.cpld
restart_n = 12
restart_option = nhours
restart_ymd = -999
orb_eccen = 1.e36
orb_iyear = 2000
orb_iyear_align = 2000
orb_mode = fixed_year
orb_mvelp = 1.e36
orb_obliq = 1.e36
stop_n = 36
stop_option = nhours
stop_ymd = -999
::
Is the modification correct to your suggestion?
@yunfangsun Yes. That is correct. Please also remove history and restart phases from run sequence.
Hi @uturuncoglu
::
# Run Sequence #
runSeq::
@3600
MED -> ATM :remapMethod=redist
MED -> WAV :remapMethod=redist
ATM
WAV
ATM -> MED :remapMethod=redist
WAV -> MED :remapMethod=redist
@
::
Is this one correct?
@yunfangsun You need to use the following:
# Run Sequence #
runSeq::
@3600
MED med_phases_prep_atm
MED med_phases_prep_wav_accum
MED med_phases_prep_wav_avg
MED -> ATM :remapMethod=redist
MED -> WAV :remapMethod=redist
ATM
WAV
ATM -> MED :remapMethod=redist
WAV -> MED :remapMethod=redist
MED med_phases_post_atm
MED med_phases_post_wav
MED med_phases_restart_write
MED med_phases_history_write
@
::
and you could remove
MED med_phases_restart_write
MED med_phases_history_write
from it and test it. If that works, try adding MED med_phases_restart_write back.
@uturuncoglu
I should firstly try
# Run Sequence #
runSeq::
@3600
MED med_phases_prep_atm
MED med_phases_prep_wav_accum
MED med_phases_prep_wav_avg
MED -> ATM :remapMethod=redist
MED -> WAV :remapMethod=redist
ATM
WAV
ATM -> MED :remapMethod=redist
WAV -> MED :remapMethod=redist
MED med_phases_post_atm
MED med_phases_post_wav
@
::
Is my understanding correct?
@yunfangsun Yes.
@uturuncoglu Thank you! I have just submitted it
@uturuncoglu
Now it ran for 40 hours; it stopped at 09-16 16:00, and the mediator.log shows:
Add wevap to budgets with index 20
Add wrunoff to budgets with index 21
Add wfrzrof to budgets with index 22
Add saltf to budgets with index 23
Add inst to budgets with index 1
Add all_time to budgets with index 2
(med.F90:DataInitialize) read_restart = F
(med_time_alarmInit): creating alarm med_profile_alarm
(med_time_alarmInit): creating alarm alarm_stop
and log.ww3 shows:
1332| 37| 13:00:00 | X | X |
--------+------+---------------------+-----------------------+------------------+
1368| 38| 14:00:00 | X | X |
--------+------+---------------------+-----------------------+------------------+
1404| 39| 15:00:00 | X | X |
--------+------+---------------------+-----------------------+------------------+
1440| 40| 16:00:00 | X | X |
--------+------+---------------------+-----------------------+------------------+
ymd2date currTime wav_comp_nuopc hh,mm,ss,ymd 16 0 0 20220916
Do you have any suggestions?
@yunfangsun Is there anything in the other log files, such as err, out, and datm.log? If you don't mind, could you submit the job again and see whether it fails in the same place? If that does not help, then please send me all the information I need to reproduce the run on my end.
@uturuncoglu the datm.log seems to show no problem:
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 370
atm : model date 20220916 28800
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 371
atm : model date 20220916 32400
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 372
atm : model date 20220916 36000
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 373
atm : model date 20220916 39600
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 374
(dshr_restart_write) writing ufs.cpld.datm.r.2022-09-16-43200.nc20220916 43200
atm : model date 20220916 43200
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 375
atm : model date 20220916 46800
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 376
atm : model date 20220916 50400
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 377
atm : model date 20220916 54000
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 378
atm : model date 20220916 57600
(shr_strdata_readstrm) reading file ub: era5/download_inv_fix.nc 379
atm : model date 20220916 61200
The out file is also normal
101: No. of solver iterations 20 2379724 3.65856522059484
101: 3.00000000000000
101: No. of solver iterations 30 2435127 1.41561414261967
101: 3.00000000000000
101: No. of solver iterations 0 1720715 30.3380762027680
101: 3.00000000000000
101: No. of solver iterations 10 2163503 12.4121187290848
101: 3.00000000000000
101: No. of solver iterations 20 2379713 3.65901054777672
101: 3.00000000000000
101: No. of solver iterations 30 2435206 1.41241588376799
101: 3.00000000000000
101: No. of solver iterations 0 1720715 30.3380762027680
101: 3.00000000000000
101: No. of solver iterations 10 2164108 12.3876257340814
101: 3.00000000000000
101: No. of solver iterations 20 2379921 3.65058981561026
101: 3.00000000000000
101: No. of solver iterations 30 2435584 1.39711282242700
101: 3.00000000000000
101: No. of solver iterations 0 1720715 30.3380762027680
101: 3.00000000000000
101: No. of solver iterations 10 2164606 12.3674645580290
101: 3.00000000000000
101: No. of solver iterations 20 2380067 3.64467910937802
101: 3.00000000000000
101: No. of solver iterations 30 2435578 1.39735572816257
101: 3.00000000000000
101: No. of solver iterations 0 1720715 30.3380762027680
101: 3.00000000000000
101: No. of solver iterations 10 2164854 12.3574244542920
101: 3.00000000000000
101: No. of solver iterations 20 2379836 3.65403098019751
101: 3.00000000000000
The error file is
+ ESMF_RUNTIME_PROFILE=ON
+ export ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
+ ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY
+ [[ intel == gnu ]]
+ sync
+ sleep 1
+ srun --label -n 5000 ./fv3.exe
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
4196: forrtl: error (78): process killed (SIGTERM)
4196: Image PC Routine Line Source
4196: libc.so.6 000014E95E53CD90 Unknown Unknown Unknown
4196: fv3.exe 0000000001D474CA pdlib_w3profsmd_m 5922 w3profsmd_pdlib.F90
4196: fv3.exe 0000000001D3D635 pdlib_w3profsmd_m 2796 w3profsmd_pdlib.F90
4196: fv3.exe 0000000001BE4E68 w3wavemd_mp_w3wav 1843 w3wavemd.F90
4196: fv3.exe 00000000019E9D62 wav_comp_nuopc_mp 1126 wav_comp_nuopc.F90
4196: fv3.exe 0000000000C37EB8 Unknown Unknown Unknown
4196: fv3.exe 0000000000C37E27 Unknown Unknown Unknown
4196: fv3.exe 0000000000C36A03 Unknown Unknown Unknown
4196: fv3.exe 0000000000433182 Unknown Unknown Unknown
4196: fv3.exe 000000000205FCDD Unknown Unknown Unknown
4196: fv3.exe 0000000000B36B84 Unknown Unknown Unknown
The whole run folder is located on Hercules/Orion at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_15740_atm_ww/coastal_ian_atm2ww3_intel_1, and the compilation I am using is the coastal_ike_shinnecock_atm2ww3_intel regression test.
I have just resubmitted it, and will let you know when it fails
Hi @uturuncoglu I have tried twice by submitting job_card, and it stopped at different times (36 hours and 37 hours). Then I tried the same setting, but with wind from wind.ww3 (interpolated from the same ERA5 data using ww3_prnc), running stand-alone WW3 by submitting xmodel_slurm.job; it didn't break and kept running. All the files are located at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_15740_atm_ww/coastal_ian_atm2ww3_intel_1
@yunfangsun It seems the issue is on the WW3 side. Maybe it is not getting fields from DATM. Let me check.
@yunfangsun It seems you are using the inp format for configuration, but in the RT I am using nml, and when you couple WW3 you need to change D to C for the wind; I am not seeing that kind of definition in your configuration file. So, could you use the ww3_shel.nml file from the RT, modify its simulation date, and run again to see what happens?
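For what it's worth, in the nml format the wind-forcing switch lives in the &input_nml block of ww3_shel.nml; a minimal sketch of the relevant line (based on WW3's documented forcing flags; check the RT file rather than copying this verbatim):

```
&input_nml
  input%forcing%winds = 'C'   ! 'C' = coupled via the mediator; 'T' = external forcing file
/
```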
@uturuncoglu I removed ww3_shel.inp and replaced it with ww3_shel.nml; the job stopped again after 40 hours.
@yunfangsun I could run your case on Hercules with Intel. At this point the run is at 55 hours and still going. I just compiled the case with the latest version of ufs-coastal using the following command:
./compile.sh "hercules" "-DAPP=CSTLW -DPDLIB=ON" coastal intel NO NO
Other than using Intel (I am not sure, but maybe you are using GNU), I did not change anything in the configuration. Anyway, I'll let you know if it fails, but you might try with Intel on your end and let me know how it goes.
@yunfangsun The performance of the model is something like 55 simulated hours in 44 minutes, and this includes initialization, so you can do roughly 75 simulated hours per wallclock hour. It seems you are trying to run 900 hours, so I think this will not finish in an 8-hour time window. I suggest increasing the number of cores; maybe you could try doubling the WW3 resources. Anyway, let's see whether this runs without any issue on your end.
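The throughput estimate above is simple arithmetic on the numbers quoted in this comment:

```python
simulated_hours = 55.0    # simulated time completed so far
wallclock_minutes = 44.0  # wallclock used, including initialization

# Simulated hours achieved per wallclock hour
rate = simulated_hours * 60.0 / wallclock_minutes
print(rate)  # 75.0

# Wallclock hours needed for the full 900-hour run at this rate
needed = 900.0 / rate
print(needed)  # 12.0, beyond an 8-hour queue window
```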
@uturuncoglu I have downloaded the newest version of ufs-coastal at /work2/noaa/nos-surge/yunfangs/ufs-coastal and compiled it using
./compile.sh "hercules" "-DAPP=CSTLW -DPDLIB=ON" coastal intel NO NO
The run folder is /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_15740_atm_ww/coastal_ian_atm2ww3_intel_3; it stopped after 40 hours again.
I also compiled it using the regression test
./rt.sh -a coast -l rt_coastal.conf -c -k -n coastal_ike_shinnecock_atm2ww3 intel
and ran it in the folder /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_1795978_atm_ww3_new/coastal_ian_atm2ww3_intel; it stopped after 39 hours.
@yunfangsun I am not sure what is wrong in your case, but mine ran until 20221012.000000 (started from 20220915.000000) without any issue under my account. That is almost 649 hours, so we still need to increase the resources. BTW, this is my run directory if you want to compare with yours: /work/noaa/nems/tufuk/COASTAL/coastal_ian_atm2ww3_intel_1. I am still not sure whether ww3_shel.inp is correct or not. I'll try to run the same case with the wind file removed from the run directory, using nml, to be sure it is getting wind from CDEPS.
@uturuncoglu Could you please change the permission of the folder /work/noaa/nems/tufuk/COASTAL/coastal_ian_atm2ww3_intel_1, I can't get access to it.
@yunfangsun I did it.
@uturuncoglu, @yunfangsun, @saeed-moghimi-noaa Thank you so much for responding on such short notice and over the weekend. Your help is greatly appreciated. @yunfangsun Thank you for spending time resolving all the issues. Yunfang, could you please document all the steps you followed to set up the Ian application inside your RT folder and UFS-Coastal (configuration, compilation, and run)? I am doing the same thing with my simulations, but configuring them in such a way that they can run outside the RT folder and without running them as a test case within UFS-Coastal. In the end I want to combine all the steps and requirements in one document. I'll report back on the progress and issues from all the simulations with and without waves.
@pvelissariou1 Agree. Documenting each step will help us to implement the workflow.
@uturuncoglu my atm2sch2ww3 configuration is at /work2/noaa/nos-surge/yunfangs/stmp/yunfangs/FV3_RT/rt_11601_atm2sch2ww3/coastal_ian_atm2sch2ww3_intel_1. Thank you for your great help!
Okay. I'll try to run that one tonight. I resubmitted your first case and it is in /work/noaa/nems/tufuk/COASTAL/coastal_ian_atm2ww3_intel_1; it has been running for almost one day.
—ufuk
@uturuncoglu thank you
@yunfangsun The DATM+WW3 run has finished. It is in /work/noaa/nems/tufuk/COASTAL/coastal_ian_atm2ww3_intel_1. Please check the results and let me know if you need any changes. I'll run the other job soon.
@uturuncoglu permission to the result files is denied; could you please change the file permissions? Thank you.
@uturuncoglu Thank you
@yunfangsun Hercules is down now, but maybe you can reach the files from Orion. I fixed the permissions.
@uturuncoglu thank you
New configuration of stand-alone WW3 for the regression test coastal_ike_shinnecock_ww3:
WAV_attributes::
Verbosity = 0
DumpFields = false
ProfileMemory = false
merge_import = .false.
mesh_wav = mesh.shinnecock.cdf5.nc
multigrid = false
gridded_netcdfout = true
diro = "."
logfile = wav.log
standalone = true
::
@pvelissariou1 I am opening this issue for WW3 integration and testing. Since WW3 is an integral part of the ufs-weather-model, I'll try to test the existing configurations under CoastalApp-testsuite here to see which issues we might face.