noaa-ocs-modeling / PaHM

Parametric Hurricane Modeling System
Creative Commons Zero v1.0 Universal

Handle Missing Radius of Last Closed Isobar in SCHISM-PaHM #29

Closed FariborzDaneshvar-NOAA closed 7 months ago

FariborzDaneshvar-NOAA commented 1 year ago

Use /lustre/scripts/schism.sbatch to run SCHISM for the non-perturbed (original) faked BEST track that was generated by the ondemand-storm-workflow

Directory: /lustre/hurricanes/florence_2018_Fariborz_OFCL_10_v2/setup/ensemble.dir/runs/original/

FariborzDaneshvar-NOAA commented 1 year ago

@SorooshMani-NOAA It failed with this message after the model parameters in the slurm-*.out file:

---------- MODEL PARAMETERS ----------

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 36 PID 11838 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 37 PID 11839 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 38 PID 11840 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 39 PID 11841 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 40 PID 11842 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 41 PID 11843 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 42 PID 11844 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 43 PID 11845 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 44 PID 11846 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 45 PID 11847 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 46 PID 11848 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 47 PID 11850 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 48 PID 11851 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 49 PID 11853 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 50 PID 11854 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 51 PID 11856 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 52 PID 11858 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 53 PID 11859 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 54 PID 11860 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 55 PID 11861 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 56 PID 11862 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 57 PID 11863 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 58 PID 11864 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 59 PID 11865 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 60 PID 11866 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 61 PID 11867 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 62 PID 11868 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 63 PID 11869 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 64 PID 11870 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 65 PID 11871 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 66 PID 11872 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 67 PID 11873 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 68 PID 11874 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 69 PID 11875 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 70 PID 11876 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 71 PID 11877 RUNNING AT sorooshmani-nhccolab2-00004-1-0002
=   KILLED BY SIGNAL: 9 (Killed)
===================================================================================

SorooshMani-NOAA commented 1 year ago

Let me try to run it on my side and see what I can find out.

SorooshMani-NOAA commented 1 year ago

I tried running SCHISM using the same binaries as @FariborzDaneshvar-NOAA and the same setup (symlinked all files), and it ran successfully. Maybe the failure had something to do with a cloud instance failure on the platform. In any case, I have the output and I still get zero winds:

>>> import xarray as xr
>>> ds = xr.open_dataset('outputs/out2d_1.nc')
>>> ds.windSpeed
ds.windSpeedX  ds.windSpeedY
>>> ds.windSpeedX.max()
<xarray.DataArray 'windSpeedX' ()>
array(0., dtype=float32)
>>> ds.windSpeedY.max()
<xarray.DataArray 'windSpeedY' ()>
array(0., dtype=float32)

@pvelissariou1 we will provide the Sandy track (generated using the same method) so that you can test. In the meantime do you have any suggestion for us to debug? I used to compile and run SCHISM on PW without issues using this script and the binaries I ran in this case are generated using the same old script (no container, etc.)

Do you suggest that I debug it? If so, what flags should I use to debug and where to put breakpoints? We're compiling using the following modules on PW:

and using the commit ddf15649 from SCHISM:


* 784d20a0 Update overview.md
* 55ffa74b Update visualization.md
* 9e67694f Update visualization.md
* ddf15649 Fixed bugs in WWM/vegetation. 
* 3fdeb8fb fixed an issue with PaHM output; tested
* 299309b2 Add PaHM model type from param.nml
* 90a67ff3 Changed nws<0 to nws=-1 (more precise)

pvelissariou1 commented 1 year ago

@FariborzDaneshvar-NOAA and @SorooshMani-NOAA I agree, the error messages might be related to cloud instance issues, as Soroosh replied. The modules cmake, intel/2021.3.0, impi/2021.3.0, hdf5/1.10.6, and netcdf/4.7.0 should be fine. I suggest using the latest 2022 versions of intel and impi, or even the 2023 versions. The full log files would be more useful for debugging.

pvelissariou1 commented 1 year ago

Let me get the same SCHISM commit and check it out. @SorooshMani-NOAA debugging a SCHISM run that completes without errors will be a trial-and-error process. Let me check the particular commit you are using and I'll let you know what I find.

pvelissariou1 commented 1 year ago

@FariborzDaneshvar-NOAA @SorooshMani-NOAA I guess you use nws=-1 in the namelist file and you define USE_PAHM=ON when compiling SCHISM.

SorooshMani-NOAA commented 1 year ago

Yes, we compiled with USE_PAHM=TRUE and in param.nml we have nws=-1

pvelissariou1 commented 1 year ago

I checked the PaHM-related code in SCHISM in detail in (a) CoastalApp/SCHISM/schism (commit bb616ded), (b) schism-dev/schism (commit ddf15649), and (c) schism-dev/schism master (commit 9517c51e), and found no major differences in the PaHM implementation. So, in principle, the "fake" track files should work as they do with PaHM and with PaHM+ADCIRC in CoastalApp. I am setting up explicit tests in CoastalApp-testsuite using (a) SCHISM standalone (PaHM activated) and (b) PaHM+SCHISM coupled to check things out.

FariborzDaneshvar-NOAA commented 1 year ago

Thanks for the update!

pvelissariou1 commented 1 year ago

@FariborzDaneshvar-NOAA , @SorooshMani-NOAA , @saeed-moghimi-noaa

Background information:

I ran SCHISM (standalone) using commit ddf1564 from https://github.com/schism-dev/schism for both the BEST and OFCL tracks for Hurricane Sandy and the Shinnecock Inlet case. When compiling SCHISM I used the following flags: -DOLDIO=ON -DUSE_WW3=ON -DUSE_PAHM=ON -DPREC_EVAP=OFF. For the simulation, I set nws=-1 in the param.nml file. I used the tracks for Sandy as supplied by Fariborz:

Sandy_hurricane-track_BEST.dat
Sandy_hurricane-track_OFCL.dat
Sandy_original_BEST.22
Sandy_original_OFCL.22

The simulation results for the shinnecock.sch case are on Hera in the folders matching:

/scratch2/STI/coastal/noscrub/shared/Takis/check_OFCLs/CoastalApp-testsuite/test_schism/ike_shinnecock.sch/run/outputs-CoastalApp-schism-ddf15649_Sandy_*model10

The data are written to the schout_000001_1.nc file contained in each folder.

The BEST simulation results contain non-zero wind speeds, as expected. The OFCL simulation results contain only zero wind speeds, as we have already discussed.

IMPORTANT: Keep in mind that if the storm path (the eye in particular) is not near or inside the computational domain, PaHM produces zero winds (as expected). Zero winds are also produced if there are no data in the track file.
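Before suspecting the model, it can help to rule out the first cause above. The following is a minimal sketch, assuming the track has already been parsed into (lon, lat) pairs and the mesh extent is known as a bounding box. The helper name `track_touches_domain`, the `margin_deg` parameter, and the coordinates are made up for illustration and are not part of PaHM or SCHISM.

```python
from typing import Iterable, Tuple


def track_touches_domain(
    track: Iterable[Tuple[float, float]],
    bbox: Tuple[float, float, float, float],
    margin_deg: float = 0.0,
) -> bool:
    """Return True if any track point lies inside the bounding box.

    track      : (lon, lat) positions of the storm eye (hypothetical input)
    bbox       : (lon_min, lat_min, lon_max, lat_max) of the mesh
    margin_deg : extra margin so that a storm passing "near" the domain counts
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    return any(
        lon_min - margin_deg <= lon <= lon_max + margin_deg
        and lat_min - margin_deg <= lat <= lat_max + margin_deg
        for lon, lat in track
    )


# Illustrative values only, not taken from the Sandy or Florence tracks.
bbox = (-73.0, 40.0, -72.0, 41.0)
passing_track = [(-75.0, 30.0), (-72.5, 40.5)]
distant_track = [(-80.0, 25.0), (-79.0, 26.0)]

print(track_touches_domain(passing_track, bbox))  # True
print(track_touches_domain(distant_track, bbox))  # False
print(track_touches_domain([], bbox))             # False: empty track file
```

An empty track returns False, which matches the second cause mentioned above (no data in the track file).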

Issue resolution:

In SCHISM, the GAHM model has been modified slightly to use the radius of the last closed isobar (RRP) to reduce the number of calculations in the domain (similar to the Holland model) by eliminating the nodal points outside the RRP (I disagree with this approach, but that is for a future discussion with the SCHISM developers).

In our case, all the RRPs in the OFCL track files are set to zero, hence the problem with the OFCL files. My suggestion for a temporary workaround is to replace RRP with max(radius1, radius2, radius3, radius4) of the 34-kt isotach. We might also modify SCHISM to use RRP (if found), the max R34 computed as above, or a default value.
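A minimal sketch of the temporary workaround, assuming each track record has already been parsed into a dictionary. The field names (`isotach_kt`, `radii_nm`, `rrp_nm`) and the helper `fill_missing_rrp` are hypothetical and are not the PaHM, SCHISM, or stormevents API. A zero RRP is treated as missing, as in the OFCL files discussed here, and the sample values are illustrative only.

```python
from typing import Dict, List


def fill_missing_rrp(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with a zero RRP replaced by max R34.

    Assumed (hypothetical) record layout:
      isotach_kt : wind speed of the isotach described by this row (34, 50, 64)
      radii_nm   : the four quadrant radii of that isotach
      rrp_nm     : radius of the last closed isobar; 0 means "missing"
    """
    patched = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not modified
        if rec["rrp_nm"] <= 0 and rec["isotach_kt"] == 34:
            r34_max = max(rec["radii_nm"])
            if r34_max > 0:  # leave RRP at 0 if R34 is missing as well
                rec["rrp_nm"] = r34_max
        patched.append(rec)
    return patched


records = [
    {"isotach_kt": 34, "radii_nm": [120, 90, 60, 100], "rrp_nm": 0},
    {"isotach_kt": 34, "radii_nm": [150, 130, 80, 110], "rrp_nm": 300},
    {"isotach_kt": 34, "radii_nm": [0, 0, 0, 0], "rrp_nm": 0},
]
patched = fill_missing_rrp(records)
print([r["rrp_nm"] for r in patched])  # [120, 300, 0]
```

Records that already carry an RRP are left unchanged, and a record with neither RRP nor R34 stays at zero so that a later step can decide on a default.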

SorooshMani-NOAA commented 1 year ago

Hi @pvelissariou1 thanks for testing. I can modify my script and later the stormevents code to update the RRP field for now.

pvelissariou1 commented 1 year ago

@SorooshMani-NOAA , @FariborzDaneshvar-NOAA At this point it will be better to comment out the code blocks in SCHISM for RRP and have a working version for your purposes. Let's talk about this in the meeting today.

SorooshMani-NOAA commented 1 year ago

As we discussed, in the short term we will use a fork of SCHISM with updated PaHM code that ignores RRP, and later we'll decide how to address this during normalization in stormevents as well: https://github.com/oceanmodeling/StormEvents/issues/84#issuecomment-1677204313

I'll also rename this ticket to reflect the main issue we're discussing here

FariborzDaneshvar-NOAA commented 1 year ago

@SorooshMani-NOAA thanks for updating the image. It looks like it worked, and the runs for the OFCL tracks are now simulating the storm as well. Here are the maximum horizontal wind speed plots for both the OFCL and BEST tracks of Florence 2018.

[image: maximum horizontal wind speed, OFCL and BEST tracks, Florence 2018]

Here is also the maximum horizontal wind speed for the OFCL track of Sandy 2012.

[image: maximum horizontal wind speed, OFCL track, Sandy 2012]

Directories of the new runs on the NHC_COLAB_2 cluster are:

With this fix, the linked issue posted on the ondemand-storm-workflow repository will be resolved!

pvelissariou1 commented 1 year ago

@SorooshMani-NOAA , @FariborzDaneshvar-NOAA , @saeed-moghimi-noaa I came up with a solution that seems to work pretty well. This solution will be implemented in PaHM and in SCHISM/PaHM and most likely I'll push it to ADCIRC as well. @SorooshMani-NOAA , Soroosh you might want to implement this solution from your side as well. See the image below:

[image: outer_radius1]

SorooshMani-NOAA commented 1 year ago

@pvelissariou1, in a separate ticket related to NHC collaboration I brought up:

In PaHM Takis uses the RRP field (radius of the last closed isobar) to set all the wind field values to zero outside the contour. This logic breaks down for forecasts where no such data is available for some or all entries. Is it OK to use the 34-knot wind radius to set the wind field to zero in these cases instead?

as you've asked me to. @WPringle suggested:

@SorooshMani-NOAA It may not be necessary to set to zero anywhere. Just keep it as is as it is reducing exponentially.

I wanted to follow up with you to see what you think.

pvelissariou1 commented 1 year ago

@FariborzDaneshvar-NOAA @SorooshMani-NOAA Let's keep the setup we have right now in SCHISM/PaHM for RRP (GAHM model), where the particular piece of code is commented out and RRP is not used. Very soon I will implement the solution we have for RRP. The SCHISM developers added the RRP code in GAHM to reduce the computational load by excluding the nodal locations where the winds are actually zero or very close to zero. As @WPringle pointed out, the fields eventually decay to zero at locations outside the RRP, although they have no physical meaning there. The SCHISM developers would still like to keep the RRP code for the reason described above.

FariborzDaneshvar-NOAA commented 8 months ago

@pvelissariou1 @SorooshMani-NOAA should we close this ticket?

pvelissariou1 commented 8 months ago

@FariborzDaneshvar-NOAA @SorooshMani-NOAA Please, let's close it at the end of next week.

FariborzDaneshvar-NOAA commented 8 months ago

@pvelissariou1 can you please update this ticket and let me know if I can close it? thanks

pvelissariou1 commented 8 months ago

Once the coupled simulations with PaHM are complete and I have evaluated the PaHM results, I'll push the updates upstream to PaHM and to SCHISM. There is nothing else to add at this moment. This ticket needs to be updated when the changes to SCHISM/PaHM have been accepted, so let's keep it open for 2-3 weeks.

pvelissariou1 commented 8 months ago

@FariborzDaneshvar-NOAA , @SorooshMani-NOAA I have updated PaHM to include the RRP resolution for both the Holland and GAHM models. On some rare occasions all R34 and RRP radii are missing from the track file, and in these cases PaHM reverts to using all the nodal points when performing its interpolations. Later today (01/08/2024) I will push the PaHM updates to SCHISM and let you know. Before I submit a PR, please consider testing the changes to SCHISM by cloning the "cmmb" branch.
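The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the PaHM source: the function names `cutoff_radius` and `active_nodes` are hypothetical, a value of 0 is taken to mean "missing", and distances are planar in arbitrary consistent units rather than the spherical distances a real model would use.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cutoff_radius(rrp: float, r34: Sequence[float]) -> Optional[float]:
    """Pick the outer radius: RRP if present, else max R34, else None."""
    if rrp > 0:
        return rrp
    r34_max = max(r34, default=0.0)
    if r34_max > 0:
        return r34_max
    return None  # both missing: the caller should use every node


def active_nodes(
    nodes: Sequence[Tuple[float, float]],
    eye: Tuple[float, float],
    rrp: float,
    r34: Sequence[float],
) -> List[int]:
    """Return indices of the nodes that take part in the wind calculation."""
    radius = cutoff_radius(rrp, r34)
    if radius is None:
        return list(range(len(nodes)))  # revert to all nodal points
    return [
        i
        for i, (x, y) in enumerate(nodes)
        if math.hypot(x - eye[0], y - eye[1]) <= radius
    ]


# Illustrative values only.
nodes = [(0.0, 0.0), (50.0, 0.0), (0.0, 200.0)]
eye = (0.0, 0.0)

print(active_nodes(nodes, eye, rrp=250.0, r34=[100, 80, 60, 90]))  # [0, 1, 2]
print(active_nodes(nodes, eye, rrp=0.0, r34=[100, 80, 60, 90]))    # [0, 1]
print(active_nodes(nodes, eye, rrp=0.0, r34=[0, 0, 0, 0]))         # [0, 1, 2]
```

The last call shows the rare case where both radii are missing: no node is excluded, so the wind field is never forced to zero for lack of a radius.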

pvelissariou1 commented 8 months ago

... that is: git clone https://github.com/schism-dev/schism.git -b cmmb. The cmmb branch has been merged with "master" so it should be the latest SCHISM commit.

janahaddad commented 7 months ago

@pvelissariou1 @FariborzDaneshvar-NOAA seems like we can close this?

pvelissariou1 commented 7 months ago

Yes, it is done on my side, and Fariborz and Soroosh have tested the PaHM updates. We will reopen it if needed.