SorooshMani-NOAA closed this issue 4 months ago.
In the shared files (I couldn't attach them since the compressed size was 84MB), the hgrid.gr3, vgrid.in, and manning.gr3 are exactly the same between the spinup and hotstart runs. The bctides.in is different (as expected), and the param.nml has the following diff:
=== diff hotstart/param.nml spinup/param.nml
5c5
< rnday=8.25
---
> rnday=8.0
8c8
< ihfskip=4752
---
> ihfskip=4608
14c14
< start_day=10
---
> start_day=2
18c18,20
< ihot=1
---
> dramp=8.0
> drampbc=8.0
> dramp_ss=8.0
21a24
> drampwind=8.0
26c29
< nhot_write=4752
---
> nhot_write=4608
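To compare two param.nml files by key rather than by line position, a small helper is enough. This is a minimal Python sketch; parse_param_nml and compare_params are hypothetical helpers (not part of SCHISM or pyschism), and the parser only handles simple one-value-per-line key=value entries.

```python
import re


def parse_param_nml(text):
    """Collect key=value pairs from param.nml text (hypothetical helper).

    Sketch only: ignores namelist group names, strips '!' comments,
    and does not handle array or multi-line values.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0].strip()  # drop Fortran comments
        m = re.match(r"^([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.+?)\s*,?$", line)
        if m:
            params[m.group(1).lower()] = m.group(2)
    return params


def compare_params(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys that differ; None means absent."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

A None on one side means the key is absent from that file, so SCHISM falls back to its default there; in the diff above, that is the case for the dramp* keys in the hotstart param.nml.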
I noticed this issue when I was originally running an ensemble for Florence and checking discrepancies across different sets of runs with different spinup and start dates (see the plots below for the original run with best-track forcing):
As you can see, in both best-track cases the hotstarts show this initial intensification of the water level amplitude. Note that this is the original track result (not related to track perturbation, etc.), and both the spinup and the hotstart have best-track forcing.
Tagging @WPringle and @FariborzDaneshvar-NOAA
I'm wondering if there are any specific considerations for hotstart, e.g. overlap between the spinup end and the hotstart start, etc.
Note that for the results above I used ihot=1 and copied the combined hotstart.nc (using combine_hotstart7) to the input directory (alongside the param.nml file). After the issue above, I also tried setting ihot=2 and copying over flux.out from the spinup run, but then after a couple of timesteps I get 0: ABORT: nc_writeout3D: put time error.
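Staging the hotstarted run directory can be scripted so each case gets exactly the files described above. A minimal Python sketch; stage_hotstart is a hypothetical helper, the layout (run directory with an outputs/ subfolder) is assumed, and the combined file path is whatever combine_hotstart7 produced in your spinup.

```python
import shutil
from pathlib import Path


def stage_hotstart(spinup_dir, hotstart_dir, combined_hotstart, ihot):
    """Copy the files a hotstarted run needs (hypothetical helper; layout assumed).

    combined_hotstart: path to the file combine_hotstart7 produced from the
    spinup outputs (the caller supplies the actual name).
    Returns the files present in hotstart_dir afterwards.
    """
    spinup_dir, hotstart_dir = Path(spinup_dir), Path(hotstart_dir)
    (hotstart_dir / "outputs").mkdir(parents=True, exist_ok=True)
    # The hotstarted run reads its initial state from hotstart.nc in the run dir.
    shutil.copy2(combined_hotstart, hotstart_dir / "hotstart.nc")
    if ihot == 2:
        # Per the manual/this thread, ihot=2 additionally expects the spinup's flux.out.
        shutil.copy2(spinup_dir / "outputs" / "flux.out",
                     hotstart_dir / "outputs" / "flux.out")
    return sorted(p.relative_to(hotstart_dir).as_posix()
                  for p in hotstart_dir.rglob("*") if p.is_file())
```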
First of all, does it matter whether I use ihot=1 or ihot=2? In case I need to go with 2, what could be the reason behind the error I get?
Update
I'd like to note that for the ihot=2 crash case, the mirror.out file shows that 24 steps complete without any issues; the crash actually happens at the first output step. I also tried copying the output .nc files in outputs/ from the spinup run to the hotstarted outputs/ dir, but it didn't help.
See the manual for the difference between ihot=1 and ihot=2:
https://schism-dev.github.io/schism/master/input-output/param.html#ihot0-int
Thank you; yes, I've already read that part a couple of times, and based on what I understand, the only extra thing ihot=2 needs compared to ihot=1 is copying flux.out to my new output dir. But that doesn't seem to be enough. It also says "On the other hand, you don't need to have the global outputs in outputs/", so I'm assuming I shouldn't copy my out2d_1.nc, zcoords, and u and v .nc files. Still, just to be sure, I even copied these and I still get the crash.
In any case, I don't really care about running ihot=2 if this can be resolved with ihot=1! All I care about is fixing the weird amplitude intensification at the initial steps of the successful hotstart run (with ihot=1).
I've seen this happen when the values in the hotstart file don't match the equilibrium of the model at the zeroth time step. Your hotstart output time might not exactly match your model's first time step. If you look at this plot:
There is a gap between the last hotstart value and the first model value. In a good hotstart, the last value is exactly the zeroth timestep of the next run, so I wouldn't expect any gaps in your plot. I am only speculating here, because I haven't looked at your inputs closely, but I can tell you from personal experience under what conditions I have seen this happen before. It normally happens because the model "jumps" to reach hydrostatic equilibrium, since the initial values don't actually match the forcing inputs for that step. Double check that your hotstart input values are exactly at the zeroth time of the next run.
I see. I also suspected something along those lines, but then I thought the gap is just due to how the output is written: the last output of the spinup is the initial step of the hotstarted run, but the output of the second run doesn't include the initial condition. That's why I asked whether there needs to be more than one timestep of overlap between the spinup and the hotstarted run. These are the relevant parameters in the param.nml files:
Spinup:
rnday=8.0
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=2
start_hour=18.0
Hotstarted:
rnday=8.25
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=10
start_hour=18.0
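These settings can be checked with a few lines of arithmetic: with dt=150 s, rnday=8.0 is 8*86400/150 = 4608 steps (matching ihfskip/nhot_write in the spinup), and 2018-09-02 18:00 plus 8 days is 2018-09-10 18:00, the hotstarted run's start. A minimal Python sketch; run_timing is a hypothetical helper.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400


def run_timing(start, rnday, dt):
    """Return (end datetime, number of time steps) for a run of rnday days."""
    nsteps = rnday * SECONDS_PER_DAY / dt
    if abs(nsteps - round(nsteps)) > 1e-9:
        raise ValueError("rnday is not a whole number of time steps")
    return start + timedelta(days=rnday), int(round(nsteps))


spinup_start = datetime(2018, 9, 2, 18)
spinup_end, spinup_steps = run_timing(spinup_start, rnday=8.0, dt=150.0)
hot_start = datetime(2018, 9, 10, 18)
hot_end, hot_steps = run_timing(hot_start, rnday=8.25, dt=150.0)

# For ihot=1 the hotstarted run must start when the spinup wrote hotstart.
print(spinup_end == hot_start, spinup_steps, hot_steps)  # → True 4608 4752
```

So the dates and step counts are consistent; a mismatch here would be one way to get the kind of gap discussed below.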
Thanks for sharing @SorooshMani-NOAA. While I won't be reading/analyzing your setup in detail, all I can say is that the gap is not supposed to be there and it's likely the reason why you are seeing the jump. Good luck!
Also, regarding overlaps: no, there are no overlaps. hotstart_data === timestep_0 in the model run.
Thank you, I'll look into the gap thing then!
Also look into the ramp-up parameters. It's awkward to use ihot=1 to restart a run that is still in the ramp-up phase. ihot=2 would be easier.
In your ihot=1 case, you are re-ramping with ramp* from the new time origin.
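To see why re-ramping matters, here is a minimal Python sketch of the ramp multiplier. It assumes the hyperbolic-tangent form tanh(2t/(86400*dramp)) that SCHISM applies when dramp>0 (an assumption about the source; check your version) and the 1-day default for dramp mentioned later in this thread. With ihot=1 the model time t restarts at 0, so the forcing is scaled down again right after the hotstart; with ihot=2 t continues from day 8.

```python
import math


def ramp_factor(time_s, dramp_days):
    """Ramp multiplier, assuming SCHISM's tanh form; dramp<=0 disables ramping."""
    if dramp_days <= 0:
        return 1.0
    return math.tanh(2.0 * time_s / (86400.0 * dramp_days))


hotstart_time_s = 8 * 86400.0  # hotstart written at day 8 of the spinup
dramp = 1.0                    # default dramp (1 day) when the key is omitted

# ihot=1: model time restarts at 0, so forcing is ramped up again from zero.
print(ramp_factor(0.0, dramp))  # → 0.0
# ihot=2: time continues from t = 8 days, where the ramp is essentially 1.
print(ramp_factor(hotstart_time_s, dramp))
```

The other ramp keys (drampbc, dramp_ss, drampwind) presumably scale their own forcings in a similar way.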
@josephzhang8 do you suggest that I use a short ramp for the hotstart, or is it better to make the spinup's total time longer than the ramp time? E.g., can I have an 8-day spinup with a 6-day ramp and then a hotstart with no ramp? Or is it better to have, e.g., an 8-day spinup with an 8-day ramp plus a 1-day ramp in the hotstart?
I'm still running into the crash issue for ihot=2, so if I just resolve the ramp and there's no other reason to use ihot=2, I'll stick with ihot=1 for now, thanks!
Also, about the gap: I still think what I said earlier makes sense, and it is reasonable for the gap @jreniel pointed out to be there. I'm plotting two separate out2d_1.nc elevations, one from the spinup run and the other from the hotstarted run. The last output of my spinup run is the initial condition of the hotstarted run, but the first output of the hotstarted run is at step 24 (the IC is not in the output). So I still have hotstart_data === timestep_0; the hotstarted output just doesn't contain timestep_0, it starts at timestep_1.
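The two output time axes can be worked out directly: with nspool=24 and dt=150 s, global outputs are written every 24*150 = 3600 s. A minimal Python sketch, assuming (as described above) that the hotstarted run does not write a record at its own step 0; output_times_hours is a hypothetical helper.

```python
def output_times_hours(nspool, dt, nsteps, include_initial=False):
    """Times (hours from run start) of global outputs written every nspool steps."""
    first = 0 if include_initial else nspool
    return [k * dt / 3600.0 for k in range(first, nsteps + 1, nspool)]


spinup = output_times_hours(nspool=24, dt=150.0, nsteps=4608)
hotstart = output_times_hours(nspool=24, dt=150.0, nsteps=4752)
print(spinup[-1], hotstart[0])  # → 192.0 1.0
```

The spinup's last record is at hour 192 (day 8), while the hotstarted run's first record is 1 hour after its start, so plotting both on one axis leaves a one-hour gap even if the hotstart state is continuous.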
I changed the spinup ramp to 6 days (out of 8) and then added a 2-day ramp to the ihot=1 hotstart run, but I still get similar results:
Do you have any suggestions about what could cause the nc write problem when using ihot=2? I want to try ihot=2 as well to see if I get anything better than this.
Adding email communication:
from Joseph:
I used your spinup/ (but changed rnday to 20 days so I can hotstart from Day 8) and then set up ihot=2, with v5.11.1. The elev @ Springmaid as shown below looks fine. For the hotstart run, I simply combined the hotstart outputs at step 4608 (t=8 days) in spinup/ and linked it to the hotstart run. Then in param.nml, I simply added ihot=2, copied outputs/flux.out, and ran it. If u want to use ihot=1, you need to modify the nodal factors and arguments in bctides.in. Note that your start_hour=18, so be careful when generating the factors etc. I tried to generate the new bctides.in for u, but I cannot reconcile the factors for some minor constituents like Mm with your version. This might create some minor discontinuity in elev due to sudden jumps in factors/arguments.
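With ihot=1 the hotstarted run's time origin moves to the new start date, so each constituent's phase argument in bctides.in has to be evaluated at that date. For a rough sense of scale, here is a minimal Python sketch of how far each argument advances over the 8-day spinup, using standard constituent speeds in degrees per hour. It deliberately ignores the slow change of the nodal factors and corrections, so it is only an illustration, not a substitute for a proper tidal-factor tool such as pyschism's bctides generation.

```python
# Standard constituent speeds (degrees per hour).
SPEED_DEG_PER_HOUR = {
    "M2": 28.9841042,
    "S2": 30.0,
    "K1": 15.0410686,
    "O1": 13.9430356,
    "Mm": 0.5443747,
}


def argument_shift_deg(constituent, hours):
    """Advance of a constituent's phase argument over `hours` (degrees, mod 360).

    Linear approximation: ignores the slow drift of nodal corrections (f, u).
    """
    return (SPEED_DEG_PER_HOUR[constituent] * hours) % 360.0


for name in SPEED_DEG_PER_HOUR:
    print(name, round(argument_shift_deg(name, 8 * 24), 3))
```

Apart from S2 (exactly 16 full cycles in 192 hours), the arguments shift by a sizable fraction of a cycle, so reusing the spinup's arguments unchanged would put the boundary forcing out of phase with the hotstarted state.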
So my takeaways are:
1. For ihot=2, my spinup should continue beyond the hotstart initial time (even though we write the hotstart at that specific time).
2. bctides.in is the problem for the case where I use ihot=1. But I'm using pyschism to generate the tide BC for both runs, so why do I have a problem for the second run?

I used my own tool to generate bctides.in, especially for those minor constituents (Mm etc.), and redid the spinup and ihot=1 @ t=8 day runs. Below is the comparison of elev @ Springmaid between the 2 runs; the hotstarted results are nearly identical to the spinup (after shifting time by 8 days).
I've uploaded my runs at: https://ccrm.vims.edu/yinglong/TMP/hotstart_issue_July2024.tgz
Thanks @josephzhang8, I'll try to replicate your results.
By the way, something I forgot to mention before is that I'm not actually using station output, but the output of the node closest to the station from the out2d_1.nc file.
Using post-proc to extract time series should not be a problem.
I was successful in running the ihot=2 case with my tides. My result is a bit different from the obs (out of phase and smaller amplitude), but it works and there's no jump:
Something I didn't know and now understand is that ihot=2 requires the hotstarted run's start date/time to be the same as the spinup's (right?), and that's what the documentation means by continuation of time.
For ihot=1, it's a bit different because time/step restarts, so it's as if the hotstarted run just takes the hotstart values as initial values; here the start date/time of the hotstarted run should just match the date/time of the hotstart output file from the spinup. I still have to figure out why I get that jump with ihot=1.
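The start-time rules above can be summarized in a small Python sketch. hotstart_run_settings is hypothetical, and the ihot=2 branch assumes rnday keeps counting from the spinup's original start (consistent with Joseph's test above, where the same param.nml with rnday=20 was reused with ihot=2).

```python
from datetime import datetime, timedelta


def hotstart_run_settings(spinup_start, hotstart_days, extra_days, ihot):
    """Return (start datetime, rnday) for a run hotstarted from a spinup.

    Hypothetical helper encoding the rules discussed in this thread:
      ihot=1: time restarts at 0, so start_* is the hotstart write time and
              rnday covers only the new run.
      ihot=2: time continues, so start_* stays the spinup's start and rnday is
              assumed to count from that original origin.
    """
    hotstart_time = spinup_start + timedelta(days=hotstart_days)
    if ihot == 1:
        return hotstart_time, extra_days
    if ihot == 2:
        return spinup_start, hotstart_days + extra_days
    raise ValueError("ihot must be 1 or 2 for a hotstarted run")


spinup_start = datetime(2018, 9, 2, 18)
print(hotstart_run_settings(spinup_start, 8.0, 8.25, ihot=1))
print(hotstart_run_settings(spinup_start, 8.0, 8.25, ihot=2))
```

A reply further down suggests that mixing these up (using the hotstart write time as the start date with ihot=2) is what triggered the nc_writeout3D: put time abort.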
@josephzhang8 after further testing, I believe I found what specifically caused the issue for me. I think this might be some sort of SCHISM bug. I first tried the setup you shared with me, but with my own bctides.nc, and established that the issue is not the minor tides:
After some more testing, I started again from the same setup I shared with you at the top and added the following to param.nml in the hotstart run:
> dramp=0.0
> drampbc=0.0
> dramp_ss=0.0
> drampwind=0.0
and this resolved the issue (comparison plots: without *ramp*=0 specified vs. with *ramp*=0 specified).
So, for some reason, if we do not specify the ramp variables at all, the hotstart gives us these weird oscillations.
Thx @SorooshMani-NOAA. All optional parameters have default values in case the user does not specify them. For dramp, it is 1 (day) (see sample_inputs/), so if you do not include it in your param.nml, this value is used. In your case, what you want is dramp=0 to go with ihot=1.
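Since omitted ramp keys silently take their defaults, it can be safer to set them explicitly when preparing an ihot=1 run. A minimal Python sketch that edits param.nml as text; set_namelist_values is a hypothetical helper, it assumes one key per line and a group closed by a lone '/', and it assumes the ramp keys belong to the &OPT group as in the sample param.nml. A dedicated namelist library would be more robust.

```python
import re

RAMP_KEYS = ("dramp", "drampbc", "dramp_ss", "drampwind")


def set_namelist_values(text, group, values):
    """Set key = value inside a namelist group, inserting keys that are missing.

    Minimal text-based sketch: assumes one key per line and that the group
    ends with a line containing only '/'.
    """
    lines = text.splitlines()
    start = next(i for i, ln in enumerate(lines)
                 if ln.strip().lower() == "&" + group.lower())
    end = next(i for i in range(start + 1, len(lines)) if lines[i].strip() == "/")
    pending = dict(values)
    for i in range(start + 1, end):
        m = re.match(r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*=", lines[i])
        if m and m.group(1).lower() in pending:
            key = m.group(1).lower()
            lines[i] = f"  {key} = {pending.pop(key)}"
    # Keys that were not present are added just before the closing '/'.
    lines[end:end] = [f"  {k} = {v}" for k, v in pending.items()]
    return "\n".join(lines) + "\n"


# Usage: disable all ramps for an ihot=1 hotstart.
# patched = set_namelist_values(text, "OPT", {k: 0.0 for k in RAMP_KEYS})
```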
This is not a bug; default values are 'best guesses' provided for users' convenience, but as you dig deeper into the model setup, they are not a panacea. As much as I understand the appeal of using a super short param.nml, it can only get you so far. That's why I always start from the sample param.nml with all parameters shown, even though only a few are important for a specific application.
@josephzhang8 we also struggled a bit with this. But, at least in our case, part of the confusion stemmed from the nrampwind parameter, which seems to be ignored/inactive (maybe it should be removed?).
In any case, it might be useful to add a note in the docs that ramping should probably be disabled when using ihot=1. With ihot=2 it doesn't really matter, because the model time is monotonically increasing, so the ramp-up settings don't come into effect.
I see, thanks for the information. I'll go ahead and close the ticket since this is not a bug and the original issue is resolved.
@pmav99: I'll add the note on ihot=1 in the manual. nrampwind was removed a while ago so you'd get a fatal message.
> nrampwind was removed a while ago so you'd get a fatal message.
@brey was testing this with an older SCHISM version (5.10 maybe?). Anyway, it is still in the sample param.nml as a comment, but it is there: https://github.com/schism-dev/schism/blob/5c054a09fd410924b8debdade7ed8a1a00082b43/sample_inputs/param.nml#L561
Yes, I left these as comments only.
I have the same issue with ihot=2. What exactly solved the problem that caused the crash with ABORT: nc_writeout3D: put time?
@janko-om as I remember, the issue is the start date/time of the hotstart vs. the spinup. With ihot=1, your hotstart start date is the same as your spinup "end"/hotstart write time. But with ihot=2, the start dates should be the same.
Hi, I have a setup (attached compressed files) with a spinup run and then a hotstart run that starts immediately after the spinup. At the initial times of the hotstart run, I get some unexpected results (see the water level plots vs. obs below). This is a tide-only case:
I'm guessing something is wrong with my setup, but I'm not sure what. Can you please help? Thanks!
https://drive.google.com/file/d/1X5OjfQTRtEX6X6q2JuWx3D7b4NRYudIN/view?usp=drive_link