Initial timesteps results for a hotstart run

SorooshMani-NOAA commented 2 weeks ago

Hi, I have a setup (compressed files linked below) with a spinup run followed by a hotstart run that starts immediately after the spinup ends. In the first timesteps of the hotstart run I get some unexpected results (see water level plots vs. observations below). This is a tide-only case:

I'm guessing something is wrong with my setup, but I'm not sure what. Can you please help? Thanks!

https://drive.google.com/file/d/1X5OjfQTRtEX6X6q2JuWx3D7b4NRYudIN/view?usp=drive_link

SorooshMani-NOAA commented 2 weeks ago

In the shared files (I couldn't attach them because the compressed size was 84 MB), hgrid.gr3, vgrid.in, and manning.gr3 are exactly the same between the spinup and hotstart runs. bctides.in is different (as expected), and param.nml has the following diff:

=== diff hotstart/param.nml spinup/param.nml
5c5
<   rnday=8.25
---
>   rnday=8.0
8c8
<   ihfskip=4752
---
>   ihfskip=4608
14c14
<   start_day=10
---
>   start_day=2
18c18,20
<   ihot=1
---
>   dramp=8.0
>   drampbc=8.0
>   dramp_ss=8.0
21a24
>   drampwind=8.0
26c29
<   nhot_write=4752
---
>   nhot_write=4608

SorooshMani-NOAA commented 2 weeks ago

I first noticed this issue while running an ensemble for Florence and checking discrepancies between sets of runs with different spinup and start dates (see plots below for the original run with best-track forcing):

As you can see, in both best-track cases the hotstart runs show this initial intensification of the water level amplitude. Note that these are the original-track results (not related to track perturbation, etc.), and both the spinup and the hotstart use best-track forcing.

Tagging @WPringle and @FariborzDaneshvar-NOAA

SorooshMani-NOAA commented 2 weeks ago

I'm wondering if there are any specific considerations for hotstart, e.g. overlap between the spinup end and the hotstart start, etc.

SorooshMani-NOAA commented 2 weeks ago

Note that for the results above I used ihot=1, and copied the combined hotstart.nc (using combine_hotstart7) to the input directory (alongside param.nml file). After the issue above I also tried setting ihot=2 and copying over flux.out from the spinup run, but then after a couple of timesteps I get 0: ABORT: nc_writeout3D: put time error.

First of all, does it matter whether I use ihot=1 or ihot=2? In case I need to go with 2, what could be the reason behind the error I get?

Update: I'd like to note that for the ihot=2 crash case, the mirror.out file shows that 24 steps complete without any issues; the failure actually happens at the first output step. I also tried copying the output .nc files from the spinup run's outputs directory to the hotstarted run's outputs directory, but it didn't help.
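For reference, the file staging described in this comment can be sketched in Python as follows. This is only an illustration of the steps as described in the thread: `stage_hotstart` and its arguments are hypothetical names, and the only file locations assumed are the ones mentioned here (hotstart.nc next to param.nml, and flux.out in outputs/ for ihot=2).

```python
import shutil
from pathlib import Path


def stage_hotstart(combined_hotstart, run_dir, ihot=1, spinup_outputs=None):
    """Hypothetical helper: copy restart inputs into a new run directory.

    combined_hotstart: path to the combined hotstart file from the spinup run.
    run_dir: directory of the hotstarted run (where param.nml lives).
    spinup_outputs: outputs directory of the spinup run (needed for ihot=2).
    """
    run_dir = Path(run_dir)
    outputs = run_dir / "outputs"
    outputs.mkdir(parents=True, exist_ok=True)

    # The combined hotstart file goes alongside param.nml as hotstart.nc.
    target = run_dir / "hotstart.nc"
    shutil.copy2(combined_hotstart, target)

    # For ihot=2, flux.out from the spinup run is also copied into outputs/.
    if ihot == 2:
        if spinup_outputs is None:
            raise ValueError("ihot=2 requires the spinup outputs directory")
        shutil.copy2(Path(spinup_outputs) / "flux.out", outputs / "flux.out")

    return target
```

The sketch does not address the nc_writeout3D error itself; it only makes the two setups reproducible so they can be compared.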

josephzhang8 commented 2 weeks ago

See the manual for the difference between ihot=1 and ihot=2:

https://schism-dev.github.io/schism/master/input-output/param.html#ihot0-int

SorooshMani-NOAA commented 2 weeks ago

Thank you; yes, I've already read that part a couple of times, and based on what I understand, the only extra step for ihot=2 compared to ihot=1 is copying flux.out to my new output dir. But that doesn't seem to be enough. It also says:

On the other hand, you don't need to have the global outputs in outputs/

so I'm assuming I shouldn't need to copy my out2d_1.nc, zcoords, u, and v .nc files. Just to be sure, I copied these as well, and I still get the crash.

In any case I don't really care about running ihot=2 if this can be resolved with the ihot=1! All I care about is fixing the weird amplitude intensification at the initial steps of the successful hotstart run (with ihot=1)

jreniel commented 2 weeks ago

I've seen this happen when the values of the hotstart file don't match the equilibrium of the model at the zeroth time step. Your hotstart output time might actually not match exactly your model's first time step. If you look at this plot:

There is a gap between the last hotstart value and the first model value. A good hotstart has as its last value exactly the zeroth timestep of the next run, so I wouldn't expect any gaps in your plot. I am only speculating here, because I haven't looked at your inputs closely, but I can tell you from personal experience under what conditions I have seen this happen before. It normally happens because the model "jumps" to reach hydrostatic equilibrium when the initial values don't actually match the forcing inputs for that step. Double-check that your hotstart input values are exactly at the zeroth time of the next run.

SorooshMani-NOAA commented 2 weeks ago

I see. I also suspected something along those lines, but then I thought the gap is just a result of how the output is written: the last output of the "spinup" is the initial step of the "hotstarted" run, but the output of the second run doesn't include the initial condition. That's why I asked whether there needs to be more than one timestep of overlap between the spinup and hotstarted runs. These are the relevant parameters in the param.nml files:

Spinup:

rnday=8.0
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=2
start_hour=18.0

Hotstarted:

rnday=8.25
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=10
start_hour=18.0
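As a quick sanity check of the values above, the following Python sketch computes the end of the spinup window and converts the nhot_write/ihfskip step counts from the diff into days. It uses only numbers quoted in this thread; the helper names are hypothetical.

```python
from datetime import datetime, timedelta

DT = 150.0  # seconds; dt in both param.nml files


def run_window(start, rnday):
    """Return (start, end) of a run that lasts rnday days."""
    return start, start + timedelta(days=rnday)


def steps_to_days(nsteps, dt=DT):
    """Convert a number of timesteps to days."""
    return nsteps * dt / 86400.0


spinup_start, spinup_end = run_window(datetime(2018, 9, 2, 18), 8.0)
hot_start, hot_end = run_window(datetime(2018, 9, 10, 18), 8.25)

# The spinup ends exactly where the hotstarted run begins.
print(spinup_end == hot_start)   # True

# nhot_write=4608 in the spinup corresponds to the end of the 8-day run.
print(steps_to_days(4608))       # 8.0

# ihfskip=4752 in the hotstarted run corresponds to its 8.25-day length.
print(steps_to_days(4752))       # 8.25
```

So, on paper, the hotstart file is written at the same instant the hotstarted run begins, with no overlap and no gap in model time.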

jreniel commented 1 week ago

Thanks for sharing @SorooshMani-NOAA. While I won't be reading or analyzing your setup in detail, all I can say is that the gap is not supposed to be there, and it's likely the reason why you are seeing the jump. Good luck!

Also, regarding overlaps: no, there are no overlaps. hotstart_data === timestep_0 in the model run.

SorooshMani-NOAA commented 1 week ago

Thank you, I'll look into the gap thing then!

josephzhang8 commented 1 week ago

Also look into the ramp-up parameters. It's awkward to use ihot=1 to restart a run that is still in the ramp-up phase. ihot=2 would be easier.

In your ihot=1 case, you are re-ramping with ramp* from the new time origin.
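To illustrate the re-ramping point, here is a minimal Python sketch. It assumes a tanh-shaped ramp of the form tanh(2 t / dramp) with t measured in days from the run's own time origin; that functional form is an assumption made for illustration and should be checked against the SCHISM source or manual. `ramp_factor` is a hypothetical name.

```python
import math


def ramp_factor(t_days, dramp_days):
    """Assumed tanh ramp: scaling applied to the forcing at time t_days.

    Returns 1.0 (no ramp) when dramp_days <= 0. The tanh form is an
    assumption for illustration, not taken from the thread.
    """
    if dramp_days <= 0:
        return 1.0
    return math.tanh(2.0 * t_days / dramp_days)


# End of an 8-day spinup with dramp=8: forcing is still below full strength.
end_of_spinup = ramp_factor(8.0, 8.0)        # tanh(2), about 0.964

# ihot=1 restart: time starts again from 0 in the new run.
restart_no_ramp = ramp_factor(0.0, 0.0)      # 1.0, forcing jumps to full strength
restart_with_ramp = ramp_factor(0.0, 2.0)    # 0.0, forcing is ramped up again

print(end_of_spinup, restart_no_ramp, restart_with_ramp)
```

Under this assumption, neither choice for the ihot=1 restart reproduces the forcing scale at the end of the spinup, so the state in hotstart.nc and the forcing at the first steps of the new run are not consistent unless the spinup ramp has effectively finished.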

SorooshMani-NOAA commented 1 week ago

@josephzhang8 do you suggest that I use a short ramp for the hotstart, or is it better to make the total spinup time longer than the ramp time? E.g. can I have an 8-day spinup with a 6-day ramp and then a hotstart with no ramp? Or is it better to have, e.g., an 8-day spinup with an 8-day ramp plus a 1-day ramp in the hotstart?

I'm still running into the crash issue for ihot=2, so if I can just resolve the ramp and there's no other reason to use ihot=2, I'll stick with ihot=1 for now, thanks!

Also, about the gap: I still think what I said earlier makes sense. It is reasonable for the gap @jreniel pointed out to be there. I'm plotting elevations from two separate out2d_1.nc files, one from the spinup run and the other from the hotstarted run. The last output of my spinup run is the initial condition of the hotstarted run, but the first output of the hotstarted run is at step 24 (no IC in the output). So I still have hotstart_data === timestep_0; however, the hotstarted output doesn't have timestep_0, it has timestep_1.
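The output timing argument above can be sketched in Python. This assumes, as described in this comment, that each run writes its first record at step nspool and none at step 0; `output_times` is a hypothetical helper, and the numbers are the param.nml values quoted earlier in the thread.

```python
from datetime import datetime, timedelta


def output_times(start, rnday, dt=150.0, nspool=24):
    """Timestamps of output records, assuming the first record is written
    at step nspool and no record is written at step 0."""
    nsteps = int(round(rnday * 86400 / dt))
    return [start + timedelta(seconds=k * dt)
            for k in range(nspool, nsteps + 1, nspool)]


spinup = output_times(datetime(2018, 9, 2, 18), 8.0)
hot = output_times(datetime(2018, 9, 10, 18), 8.25)

print(spinup[-1])            # 2018-09-10 18:00:00, last spinup record
print(hot[0])                # 2018-09-10 19:00:00, first hotstarted record
print(hot[0] - spinup[-1])   # 1:00:00
```

With these settings the two series are separated by exactly one output interval (one hour), which is the same spacing as between any two consecutive records within a single run.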

SorooshMani-NOAA commented 1 week ago

I changed the spinup ramp to 6 days (out of 8) and then added a 2 day ramp to ihot=1 hotstart run, but I still get similar results:

Do you have any suggestions about what could cause the netCDF write error when using ihot=2? I want to try ihot=2 as well to see if I get anything better than this.
