schism-dev / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0

WWM: netCDF defaults cause hotstart problems for large files #86

Open hot007 opened 2 years ago

hot007 commented 2 years ago

Hi there,

CSIRO folk have been running SCHISM v5.9, but have run into problems when enabling hotstarts in the coupled model. The model was falling over at the nf_close step of writing the hotstart file. Upon investigation, @pryancsiro found that modifying line #756 in wwm_hotfile.f90 to

iret = nf90_create(FILERET, IOR(NF90_CLOBBER, NF90_NETCDF4), ncid)

in other words, forcing it to use the netCDF4 format instead of the netCDF library default (classic), enabled the hotstart file to be written. Our theory is that the file was too big, or its dimensions were in some way incompatible with the netCDF classic model, so the hotstart file couldn't be written.
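To see why the unmodified call produced a classic file: nf90_create picks the on-disk format from bits in its creation-mode argument, and NF90_CLOBBER is zero, so a mode built from it alone sets no format bit and the library default (classic) applies. The Python sketch below decodes a creation mode. It assumes the original call passed only NF90_CLOBBER (the thread doesn't show it), uses the flag values from netCDF-C's netcdf.h, and `file_format` is a hypothetical helper, not part of any netCDF API.

```python
# Creation-mode flag values from netCDF-C's netcdf.h; the Fortran
# NF90_* constants in the netcdf module carry the same values.
NF90_CLOBBER = 0x0000
NF90_64BIT_DATA = 0x0020      # CDF-5
NF90_CLASSIC_MODEL = 0x0100
NF90_64BIT_OFFSET = 0x0200
NF90_NETCDF4 = 0x1000


def file_format(cmode: int) -> str:
    """Hypothetical helper: name the on-disk format a creation mode selects.

    Simplified: it ignores flag combinations the real library treats as invalid.
    """
    if cmode & NF90_NETCDF4:
        if cmode & NF90_CLASSIC_MODEL:
            return "netCDF-4 classic model"
        return "netCDF-4"
    if cmode & NF90_64BIT_DATA:
        return "CDF-5"
    if cmode & NF90_64BIT_OFFSET:
        return "64-bit offset"
    # No format bit set: the library default, classic unless it was changed.
    return "classic"


print(file_format(NF90_CLOBBER))                 # classic
print(file_format(NF90_CLOBBER | NF90_NETCDF4))  # netCDF-4
```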

So our question is: is there a build flag that forces netCDF4 when writing netCDF files? If not, would it be possible to add a build or runtime option to select which netCDF format to use, please? (In WW3, for example, this is a namelist option, though I think netCDF4 may be the default now.)
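One way such a runtime option might look is a single namelist string mapped onto creation-mode flags before nf90_create is called. A minimal Python sketch of that mapping: the option name `NC_FORMAT`, its accepted values and the `create_mode` helper are all hypothetical, not existing SCHISM/WWM settings; the flag values are those in netCDF-C's netcdf.h.

```python
# Flag values from netCDF-C's netcdf.h (mirrored by the Fortran NF90_* constants).
NF90_CLOBBER = 0x0000
NF90_CLASSIC_MODEL = 0x0100
NF90_64BIT_OFFSET = 0x0200
NF90_NETCDF4 = 0x1000

# Hypothetical values for a hypothetical NC_FORMAT namelist option.
_FORMAT_FLAGS = {
    "classic": 0,
    "64bit_offset": NF90_64BIT_OFFSET,
    "netcdf4": NF90_NETCDF4,
    "netcdf4_classic": NF90_NETCDF4 | NF90_CLASSIC_MODEL,
}


def create_mode(nc_format: str) -> int:
    """Return the cmode to pass to nf90_create for the requested format."""
    key = nc_format.strip().lower()
    if key not in _FORMAT_FLAGS:
        raise ValueError(
            f"unknown NC_FORMAT {nc_format!r}; expected one of {sorted(_FORMAT_FLAGS)}"
        )
    # Keep overwriting any existing hotfile, as the patched call does.
    return NF90_CLOBBER | _FORMAT_FLAGS[key]


print(create_mode("netcdf4"))          # 4096
print(hex(create_mode("NetCDF4_Classic")))  # 0x1100
```

Failing loudly on an unknown value avoids silently falling back to the classic format that triggered the original crash.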

thanks

josephzhang8 commented 2 years ago

@hot007 thank you for the tips. I'm not familiar with IOR(); is it parallel netcdf I/O?

In SCHISM (e.g. schism_step) we allow either nc3 (which has a 3GB limit) or netCDF4 classic (more relaxed in file size). We have not tested nc4.

j=nf90_create(trim(adjustl(it_char)),OR(NF90_NETCDF4,NF90_CLOBBER),ncid_hot)

So your proposed change seems to be safe.
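Whether a hotfile outgrows the classic format can be gauged with a back-of-envelope estimate. This Python sketch assumes, without confirmation from this thread, that the WWM hotfile is dominated by one double-precision wave-action value per node, frequency and direction, and that LCYCLEHOT = T keeps two records; `hotfile_bytes` and the mesh size in the example are hypothetical. Classic-format files address data with 32-bit signed offsets, so sizes well past 2**31 bytes are a warning sign; the exact per-variable limits of each format are in the netCDF documentation.

```python
def hotfile_bytes(n_nodes: int, n_freq: int, n_dir: int,
                  n_records: int = 2, bytes_per_value: int = 8) -> int:
    """Hypothetical estimate: one double-precision spectrum per node per record.

    Ignores headers, coordinates and any other variables in the file.
    """
    return n_nodes * n_freq * n_dir * bytes_per_value * n_records


# Hypothetical mesh: 500,000 nodes, 36 frequencies x 36 directions.
total = hotfile_bytes(500_000, 36, 36)
per_record = hotfile_bytes(500_000, 36, 36, n_records=1)
print(f"{total / 2**30:.1f} GiB total")  # 9.7 GiB total
print(per_record > 2**31 - 1)            # True: beyond a 32-bit signed offset
```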

I'm curious because we have not got the hotstart option in WWM to fully work. Could you share your experience and tips? It seems that upon hotstart (reading the hotstart output of a previous WWM run), the wave fields still start from 0. Thanks.

pryancsiro commented 2 years ago

IOR is just the bitwise OR. See:

https://github.com/Unidata/netcdf-fortran/blob/main/examples/F90/simple_xy_par_rd.F90

Paul


hot007 commented 2 years ago

Hi @josephzhang8, as @pryancsiro said, that line isn't about parallelism; it's just a bitwise OR, which in this context has the effect of AND: the combined flags mean "clobber the existing file and use the netCDF4 format", not the classic model.
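Why a bitwise OR reads as "and" here: each creation-mode flag occupies its own bit, so OR-ing flags yields a value with all of their bits set, i.e. every option requested at once. A short Python sketch, where `|` plays the role of Fortran's IOR and the flag values are taken from netCDF-C's netcdf.h (the Fortran NF90_* constants use the same values).

```python
# Flag values from netCDF-C's netcdf.h (same as the Fortran NF90_* constants).
NF90_CLOBBER = 0x0000    # overwriting is the default: no bit set
NF90_NOCLOBBER = 0x0004
NF90_NETCDF4 = 0x1000

# Equivalent of IOR(NF90_CLOBBER, NF90_NETCDF4).
cmode = NF90_CLOBBER | NF90_NETCDF4
print(hex(cmode))                     # 0x1000
print(bool(cmode & NF90_NETCDF4))     # True: netCDF-4 format requested
print(bool(cmode & NF90_NOCLOBBER))   # False: an existing file is overwritten

# With two non-zero flags, both bits end up set: "no clobber AND netCDF-4".
strict = NF90_NOCLOBBER | NF90_NETCDF4
print(hex(strict))                    # 0x1004
```

Because NF90_CLOBBER is zero, the OR leaves NF90_NETCDF4 unchanged; the clobber behaviour comes from the absence of the NOCLOBBER bit.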

It may be too early to say that hotstarting "works" now (we'll get back to you on that), but forcing it to use netCDF4 at least stopped the code falling over! Anyway, thanks for flagging that there is a known issue with hotstarting; we'll check that the wave field in the hotstart isn't all zeros when it's invoked.

VHernaman commented 2 years ago

Hi @josephzhang8, in answer to your question about the WWM hotstart not being picked up (i.e., the wave field still starting from 0 even with hotstart turned on), I didn't see that in my runs. Below is a plot of significant wave height (Hs) for January and February using SCHISM v5.9. January (blue line) is run from a cold start. I then ran February twice: once from a cold start (red line; Feb_nohot) and once with the WWM hotstart turned on (black line; Feb_hotstart). You can see that the hotstart was picked up: the hotstarted February run (black line) doesn't start from 0.

The relevant lines I had were:

&INIT
LHOTR = T ! Use hotstart file (see &HOTFILE section)
LINID = F ! False if LHOTR=T
INITSTYLE = 2 ! 1 - Parametric Jonswap, 2 - Read from Global NETCDF files, work only if IBOUNDFORMAT=3/6

&HOTFILE
LHOTF = T ! Write hotfile
FILEHOT_OUT = 'wwm_hot_out' !'.nc' suffix will be added, so don’t include suffix here
BEGTC = '19860201.000000' !Starting time of hotfile writing.
DELTC = 86400.0 ! time between hotfile writes
UNITC = 'SEC' ! unit used above
ENDTC = '19860301.000000' ! Ending time of hotfile writing (adjust with BEGTC)
LCYCLEHOT = T ! Applies only to netcdf; If T then hotfile contains 2 last records.
              ! If F then hotfile contains N record if N outputs have been done;
              ! For binary only one record.
HOTSTYLE_OUT= 2 ! 1: binary hotfile of data as output
                ! 2: netcdf hotfile of data as output (default)
MULTIPLEOUT = 0 ! 0: hotfile in a single file (binary or netcdf); 1: hotfiles in separate files, each associated with one process
FILEHOT_IN = 'wwm_hot_in.nc' ! (Full) hot file name for input
HOTSTYLE_IN = 2 ! 1: binary hotfile of data as input;
                ! 2: netcdf hotfile of data as input (default)
IHOTPOS_IN = 1 ! Position in hotfile (only for netcdf)
MULTIPLEIN = 0 ! 0: read hotfile from one single file;
               ! 1: read hotfile from multiple files (must use same # of CPU?)
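With the &HOTFILE settings above, the hotfile write times follow from BEGTC, DELTC and ENDTC. A Python sketch of that arithmetic, assuming writes happen at BEGTC and then every DELTC seconds up to and including ENDTC (the thread doesn't say whether WWM treats ENDTC as inclusive); `hotfile_times` is a hypothetical helper, not WWM code.

```python
from datetime import datetime, timedelta


def hotfile_times(begtc: str, deltc_sec: float, endtc: str) -> list:
    """Hypothetical helper: write times from BEGTC to ENDTC (inclusive) every DELTC seconds."""
    if deltc_sec <= 0:
        raise ValueError("DELTC must be positive")
    fmt = "%Y%m%d.%H%M%S"  # matches the 'YYYYMMDD.hhmmss' strings in the namelist
    t = datetime.strptime(begtc, fmt)
    end = datetime.strptime(endtc, fmt)
    step = timedelta(seconds=deltc_sec)
    times = []
    while t <= end:
        times.append(t)
        t += step
    return times


times = hotfile_times("19860201.000000", 86400.0, "19860301.000000")
print(len(times))   # 29 daily writes (February 1986 has 28 days)
print(times[-2:])   # the two records LCYCLEHOT = T would keep, under these assumptions
```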

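[Figure: significant wave height (Hs) for January (blue, cold start) and February (red: Feb_nohot, cold start; black: Feb_hotstart, WWM hotstart)]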

josephzhang8 commented 2 years ago

@VHernaman: I'll test your approach and update the manual if it works. Thanks a bunch!

josephzhang8 commented 2 years ago

I now confirm that the hotstart in WWM works; thx, Vanessa et al.! I'm updating the notes.