schism-dev / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0

Initial timesteps results for a hotstart run #140

Closed SorooshMani-NOAA closed 4 months ago

SorooshMani-NOAA commented 5 months ago

Hi, I have a setup (attached compressed files) where I have a spinup run and then a hotstart run which starts immediately after the spinup. At the initial times of the hotstart run I get some unexpected results (see water level plots vs obs below). This is a tide only case:

image

I'm guessing something is wrong with my setup, but I'm not sure what. Can you please help? Thanks!

https://drive.google.com/file/d/1X5OjfQTRtEX6X6q2JuWx3D7b4NRYudIN/view?usp=drive_link

SorooshMani-NOAA commented 5 months ago

In the shared files (I couldn't attach since the compressed size was 84MB) between the spinup and hotstart run the hgrid.gr3, vgrid.in, and manning.gr3 are exactly the same. The bctides.in is different (as expected) and the param.nml has the following diff:

=== diff hotstart/param.nml spinup/param.nml
5c5
<   rnday=8.25
---
>   rnday=8.0
8c8
<   ihfskip=4752
---
>   ihfskip=4608
14c14
<   start_day=10
---
>   start_day=2
18c18,20
<   ihot=1
---
>   dramp=8.0
>   drampbc=8.0
>   dramp_ss=8.0
21a24
>   drampwind=8.0
26c29
<   nhot_write=4752
---
>   nhot_write=4608

SorooshMani-NOAA commented 5 months ago

I noticed this issue while running an ensemble for Florence and checking discrepancies between sets of runs with different spinup and start dates (see the plots below for the original run with best-track forcing):

image

As you can see, in both of the best-track run cases the hotstarts initially show this intensification of water level amplitude. Note that these are the original track results (not related to track perturbation, etc.), and both the spinup and hotstart use best-track forcing.

Tagging @WPringle and @FariborzDaneshvar-NOAA

SorooshMani-NOAA commented 5 months ago

I'm wondering if there are any specific considerations for hotstart, e.g. overlap between spinup end and hotstart start, etc.

SorooshMani-NOAA commented 5 months ago

Note that for the results above I used ihot=1, and copied the combined hotstart.nc (using combine_hotstart7) to the input directory (alongside param.nml file). After the issue above I also tried setting ihot=2 and copying over flux.out from the spinup run, but then after a couple of timesteps I get 0: ABORT: nc_writeout3D: put time error.

First of all, does it matter whether I use ihot=1 or ihot=2? In case I need to go with 2, what could be the reason behind the error I get?

Update: I'd like to note that for the ihot=2 crash case, mirror.out shows that 24 steps complete without any issues; the crash actually happens at the first output step. I also tried copying the output .nc files from the spinup run's outputs directory to the hotstarted run's outputs directory, but it didn't help.
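For readers following along, the file placement described in the comment above can be sketched as below. This covers only where the files go, as described in this thread (combined hotstart file next to param.nml as hotstart.nc, plus outputs/flux.out for ihot=2); it does not touch param.nml settings such as start date or ramps. The function name and arguments are hypothetical.

```python
import shutil
from pathlib import Path


def stage_hotstart(combined_hotstart: Path, spinup_dir: Path, run_dir: Path, ihot: int) -> None:
    """Copy hotstart inputs into run_dir, following the steps described in this thread."""
    if ihot not in (1, 2):
        raise ValueError("ihot must be 1 or 2 for a hotstart run")
    run_dir.mkdir(parents=True, exist_ok=True)
    # The combined hotstart file goes alongside param.nml as hotstart.nc.
    shutil.copy2(combined_hotstart, run_dir / "hotstart.nc")
    if ihot == 2:
        # ihot=2 continues the previous run, so flux.out from the spinup
        # is copied into the new outputs directory.
        outputs = run_dir / "outputs"
        outputs.mkdir(exist_ok=True)
        shutil.copy2(spinup_dir / "outputs" / "flux.out", outputs / "flux.out")
```

The combined file would be whatever combine_hotstart7 produced for the chosen step; its path is passed in explicitly so no file name is assumed here.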

josephzhang8 commented 5 months ago

See manual for the difference btw ihot=1,2:

https://schism-dev.github.io/schism/master/input-output/param.html#ihot0-int

SorooshMani-NOAA commented 5 months ago

Thank you; yes, I've already read that part a couple of times and based on what I understand I only need to copy flux.out to my new output dir for ihot=2 compared to =1. But that doesn't seem to be enough. It also says

On the other hand, you don't need to have the global outputs in outputs/

so I'm assuming I shouldn't copy my out2d_1.nc and zcoords and u and v .nc files. Although just to be sure I even copied these and still I get the crash.

In any case I don't really care about running ihot=2 if this can be resolved with the ihot=1! All I care about is fixing the weird amplitude intensification at the initial steps of the successful hotstart run (with ihot=1)

jreniel commented 5 months ago

I've seen this happen when the values of the hotstart file don't match the equilibrium of the model at the zeroth time step. Your hotstart output time might actually not match exactly your model's first time step. If you look at this plot:

image

There is a gap between the last hotstart value and the first model value. A good hotstart has as its last value exactly the zeroth timestep of the next run, so I wouldn't expect any gaps in your plot. I am only speculating here, because I haven't looked at your inputs closely, but I can tell you from personal experience under what conditions I have seen this happen before. It normally happens because the model "jumps" to reach hydrostatic equilibrium when the initial values don't actually match the forcing inputs for that step. Double check that your hotstart input values are exactly at the zeroth time of the next run.

SorooshMani-NOAA commented 5 months ago

I see. I also suspected something along those lines, but then I thought the gap is just because of how the output is written: the last output of the "spinup" is the initial step of the "hotstarted" run, but the output of the second run doesn't include the initial condition. That's why I asked if there needs to be more than one timestep of overlap between the spinup and hotstarted runs. These are the relevant parameters in the param.nml files:

Spinup:

rnday=8.0
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=2
start_hour=18.0

Hotstarted:

rnday=8.25
dt=150.0
nspool=24
start_year=2018
start_month=9
start_day=10
start_hour=18.0

jreniel commented 5 months ago

Thanks for sharing @SorooshMani-NOAA, while I won't be reading/analyzing in detail your setup, all I can say is that the gap is not supposed to be there and it's likely the reason why you are seeing the jump. Good luck!

Also, wrt overlaps: no, there are no overlaps. hotstart_data === timestep_0 in the model run.

SorooshMani-NOAA commented 5 months ago

Thank you, I'll look into the gap thing then!

josephzhang8 commented 5 months ago

Also look into ramp-up parameters. It's awkward to use ihot=1 to restart a run that is still in ramp up phase. ihot=2 would be easier.

In your ihot=1 case, you are re-ramping with ramp* from the new time origin.
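To make the re-ramping point concrete, here is a minimal sketch of the ramp factor seen at restart. It assumes a tanh-shaped ramp, tanh(2 t / dramp), with t in days since the model's time origin and no ramp when dramp <= 0; the exact form used by a given SCHISM version should be checked in the source. The function name, the 1-day ramp, and the 8-day hand-off are illustrative.

```python
import math


def ramp_factor(t_days: float, dramp_days: float) -> float:
    """Assumed ramp multiplier at t_days after the time origin (1.0 means no ramping)."""
    if dramp_days <= 0.0:
        return 1.0  # assumption: a non-positive ramp period disables the ramp
    return math.tanh(2.0 * t_days / dramp_days)


hotstart_time = 8.0  # days of spinup completed before the restart
dramp = 1.0          # illustrative ramp period for the restarted run, in days

for elapsed in (0.0, 0.25, 0.5, 1.0):
    # ihot=1: the clock restarts, so the ramp sees only the elapsed time.
    factor_ihot1 = ramp_factor(elapsed, dramp)
    # ihot=2: the clock continues from the spinup, so the ramp is already saturated.
    factor_ihot2 = ramp_factor(hotstart_time + elapsed, dramp)
    print(f"{elapsed:4.2f} d after restart: ihot=1 -> {factor_ihot1:.3f}, ihot=2 -> {factor_ihot2:.3f}")
```

Under these assumptions, an ihot=1 restart with a nonzero ramp scales the forcing back toward zero while the hotstart state already carries a fully developed tide, which is one plausible source of a transient at the start of the run.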

SorooshMani-NOAA commented 5 months ago

@josephzhang8 do you suggest that I have a short ramp for the hotstart, or is it better to have the spinup total time be longer than the ramp time? E.g. can I have an 8-day spinup with a 6-day ramp and then a hotstart with no ramp? Or is it better to have e.g. an 8-day spinup with an 8-day ramp plus a 1-day ramp in the hotstart?

I'm still running into the crash issue for ihot=2, so if I can just resolve the ramp and there's no other reason to use ihot=2, I'll stick with ihot=1 for now, thanks!

Also about the gap, I still think what I said earlier makes sense. It is reasonable for the gap @jreniel pointed out to be there. I'm plotting elevations from two separate out2d_1.nc files, one from the spinup run and the other from the hotstarted run. The last output of my spinup run is the initial condition of the hotstarted run, but the first output of the hotstarted run is at step 24 (no IC in the output). So I still have hotstart_data === timestep_0; the hotstarted output just doesn't contain timestep_0, and its first record is one output interval later.
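The timing argument above can be checked with a little arithmetic. The sketch below uses the dt, nspool, rnday, and start times quoted earlier in the thread, and assumes (as described in this comment) that global outputs are written every nspool steps starting at step nspool, so step 0 is never written. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta

DT = 150.0   # time step in seconds
NSPOOL = 24  # global output every NSPOOL steps


def n_steps(rnday: float, dt: float = DT) -> int:
    """Number of time steps in a run of rnday days."""
    steps = rnday * 86400.0 / dt
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("rnday is not a whole number of time steps")
    return round(steps)


def output_times(start: datetime, steps: int, dt: float = DT, nspool: int = NSPOOL):
    """Output times, assuming the first record is written at step nspool (no step-0 record)."""
    return [start + timedelta(seconds=step * dt) for step in range(nspool, steps + 1, nspool)]


spinup_start = datetime(2018, 9, 2, 18)
hotstart_start = datetime(2018, 9, 10, 18)

spinup_steps = n_steps(8.0)
hotstart_steps = n_steps(8.25)
spinup_out = output_times(spinup_start, spinup_steps)
hotstart_out = output_times(hotstart_start, hotstart_steps)

print("spinup steps:", spinup_steps, "hotstart steps:", hotstart_steps)
print("last spinup output:", spinup_out[-1])
print("first hotstart output:", hotstart_out[0])
print("gap:", hotstart_out[0] - spinup_out[-1])
```

The step counts agree with the ihfskip and nhot_write values in the param.nml diff, and the gap between the two series equals one output interval (nspool * dt = 3600 s), the same spacing as any two consecutive records. On this reading, the gap in the plot by itself does not indicate a time mismatch.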

SorooshMani-NOAA commented 5 months ago

I changed the spinup ramp to 6 days (out of 8) and then added a 2 day ramp to ihot=1 hotstart run, but I still get similar results:

image

Do you have any suggestions about what could cause the nc write problem when using ihot=2? I want to try ihot=2 as well to see if I get anything better than this.

SorooshMani-NOAA commented 4 months ago

Adding email comm:

from Joseph:

I used your spinup/ (but changed rnday to 20 days so I can hotstart from Day 8) and then set up ihot=2, with v5.11.1. The elev @ Springmaid as shown below looks fine.

For hotstart run, I simply combined hotstart outputs at step 4608 (t=8 days) in spinup/ and linked it to the hotstart run. Then in param.nml, I simply added ihot=2, copied outputs/flux.out, and ran.

If u want to use ihot=1, you need to modify the nodal factor and arguments in bctides.in. Note that your start_hour=18 so be careful when generating the factor etc. I tried to generate the new bctides.in for u, but I cannot reconcile the factors for some minor constituents like Mm with your version. This might create some minor discontinuity in elev due to sudden jumps in factors/arguments.

image

So my takeaways are:

  1. To run ihot=2, my spinup should continue beyond the point of hotstart initial time (although we write hotstart at the specific time)
  2. My bctides.in is the problem for the case where I use ihot=1. But I'm using pyschism to generate the tide BC for both runs, so why do I have a problem only for the second run?

josephzhang8 commented 4 months ago

I used my own tool to generate bctides.in, especially for those minor constituents (Mm etc), and redid the spinup and ihot=1@ t=8 day runs. Below is the comparison of elev @ Springmaid between 2 runs; the hotstarted results are nearly identical to spinup (after shifting time by 8 days).

image

I've uploaded my runs at: https://ccrm.vims.edu/yinglong/TMP/hotstart_issue_July2024.tgz

SorooshMani-NOAA commented 4 months ago

Thanks @josephzhang8, I'll try to replicate your results.

By the way something I forgot to mention before is that I'm not actually using station output, but the output of the closest node neighboring the station from out2d_1.nc file.
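For reference, picking the node closest to a station can be sketched as below. The coordinates are made up, and the distance is a plain Euclidean distance in the grid's coordinate units, which is only a rough measure on lon/lat grids; this is not necessarily how the plots in this thread were produced.

```python
import math


def nearest_node(node_x, node_y, station_x, station_y):
    """Return (index, distance) of the node closest to the station."""
    best_index, best_dist = -1, math.inf
    for i, (x, y) in enumerate(zip(node_x, node_y)):
        dist = math.hypot(x - station_x, y - station_y)
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist


# Hypothetical node coordinates and station location (lon, lat).
node_x = [-78.95, -78.92, -78.90]
node_y = [33.60, 33.65, 33.70]
idx, dist = nearest_node(node_x, node_y, station_x=-78.918, station_y=33.655)
print("nearest node index:", idx, "distance:", dist)
```

The returned index would then be used to pull the elevation time series for that node from the output file.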

josephzhang8 commented 4 months ago

Using post-proc to extract time series should not be a problem.

SorooshMani-NOAA commented 4 months ago

I was successful in running ihot=2 case with my tides, my result is a bit different from the obs (out of phase and smaller amp), but it works and there's no jump:

image

Something I didn't know and now I understand is that ihot=2 needs that the hotstart start date/time be the same as spinup's (right?) and that's what documentation means about the continuation of time.

For ihot=1, it's a bit different as time/step restarts, so it's as if the hotstart run just gets the hotstart values as initial values; here the start time/date of hotstart run should just match the date/time of the hotstart output file from spinup. I still have to figure out why I have that jump in ihot=1

SorooshMani-NOAA commented 4 months ago

@josephzhang8 after further testing I believe I found what specifically caused the issue for me. I guess this might be some sort of SCHISM bug. I first tried the setup you shared with me, but with my own bctides.in, and established that the issue is not the minor tides:

image

After some more testing I started again from the same setup I shared with you at the top and added the following to param.nml in the hotstart:

>   dramp=0.0
>   drampbc=0.0
>   dramp_ss=0.0
>   drampwind=0.0
and this resolved the issue:

Without *ramp*=0 specified: image
With *ramp*=0 specified: image

So for some reason if we do not specify the ramp variables at all the hotstart gives us these weird oscillations.

josephzhang8 commented 4 months ago

Thx @SorooshMani-NOAA. All optional parameters have default values in case the user does not specify them. For dramp, it is 1 (day) (see sample_inputs/), so if you do not include it in your param.nml, this value would be used. In your case what you want is dramp=0 to go with ihot=1.

This is not a bug; default values are 'best guesses' provided for users' convenience, but as you dig deeper into the model setup, they are no panacea. As much as I understand the appeal of using a super short param.nml, it can only get you so far. That's why I always start from the sample param.nml with all parameters shown, even though only a few are important for a specific application.
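One way to catch silently applied defaults is to check which ramp parameters a param.nml sets explicitly. The sketch below uses a simple regular expression rather than a full Fortran namelist parser, so it only handles plain `name = value` lines with optional `!` comments. The parameter list is just the four names that appear in this thread, and the sample text is a trimmed-down stand-in for the hotstart param.nml.

```python
import re

RAMP_PARAMS = ("dramp", "drampbc", "dramp_ss", "drampwind")


def explicit_params(namelist_text: str) -> dict:
    """Return {name: value string} for simple 'name = value' lines."""
    found = {}
    for line in namelist_text.splitlines():
        code = line.split("!", 1)[0]  # drop any trailing comment
        match = re.match(r"\s*(\w+)\s*=\s*(\S+)", code)
        if match:
            found[match.group(1).lower()] = match.group(2).rstrip(",")
    return found


def missing_ramp_params(namelist_text: str) -> list:
    """List the ramp parameters that are not set explicitly."""
    params = explicit_params(namelist_text)
    return [name for name in RAMP_PARAMS if name not in params]


hotstart_nml = """
&CORE
  rnday=8.25
  dt=150.0
/
&OPT
  ihot=1
  start_day=10
/
"""

print("not set explicitly:", missing_ramp_params(hotstart_nml))
```

Any name reported here falls back to the model's default, which for dramp is 1 day according to the comment above.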

pmav99 commented 4 months ago

@josephzhang8 we also struggled a bit with this. But, at least in our case, part of the confusion stemmed from the nrampwind parameter which seems to be ignored/inactive (maybe it should be removed?)

In any case, it might be useful to add a note in the docs that ramping should probably be disabled when using ihot=1. With ihot=2 it doesn't really matter, because the model time is monotonically increasing and the ramp-up settings therefore don't come into effect.

SorooshMani-NOAA commented 4 months ago

I see, thanks for the information ... I'll go ahead and close the ticket since this is not a bug and the original issue is resolved

josephzhang8 commented 4 months ago

@pmav99: I'll add the note on ihot=1 in the manual. nrampwind was removed a while ago so you'd get a fatal message.

pmav99 commented 4 months ago

nrampwind was removed a while ago so you'd get a fatal message.

@brey was testing this with an older schism version, 5.10 maybe? Anyway, it is still in the sample param.nml. It is only a comment, but it is there: https://github.com/schism-dev/schism/blob/5c054a09fd410924b8debdade7ed8a1a00082b43/sample_inputs/param.nml#L561

josephzhang8 commented 4 months ago

Yes, I left these as comments only.

janko-om commented 1 month ago

I was successful in running ihot=2 case with my tides, my result is a bit different from the obs (out of phase and smaller amp), but it works and there's no jump:

image

Something I didn't know and now I understand is that ihot=2 needs that the hotstart start date/time be the same as spinup's (right?) and that's what documentation means about the continuation of time.

For ihot=1, it's a bit different as time/step restarts, so it's as if the hotstart run just gets the hotstart values as initial values; here the start time/date of hotstart run should just match the date/time of the hotstart output file from spinup. I still have to figure out why I have that jump in ihot=1

I have the same issue with ihot=2, what exactly solved the problem that caused the crash with ABORT: nc_writeout3D: put time?

SorooshMani-NOAA commented 1 month ago

@janko-om as I remember, the issue was the start date/time of the hotstart vs. the spinup. With ihot=1 your hotstart start date is the same as your spinup "end"/hotstart write time. But with ihot=2 the start dates should be the same.
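Put as code, the rule described here looks roughly like this. It is a sketch of the convention as summarized in this thread, not of SCHISM internals; the function name is hypothetical, and the numbers are the ones from the original setup.

```python
from datetime import datetime, timedelta


def hotstart_run_start(spinup_start: datetime, hot_step: int, dt: float, ihot: int) -> datetime:
    """Start date/time to put in the hotstarted run's param.nml."""
    if ihot == 1:
        # Time restarts: begin at the moment the hotstart file was written.
        return spinup_start + timedelta(seconds=hot_step * dt)
    if ihot == 2:
        # Time continues: keep the spinup's own start date/time.
        return spinup_start
    raise ValueError("ihot must be 1 or 2 for a hotstart run")


spinup_start = datetime(2018, 9, 2, 18)
for ihot in (1, 2):
    start = hotstart_run_start(spinup_start, hot_step=4608, dt=150.0, ihot=ihot)
    print(f"ihot={ihot}: start at {start}")
```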