oceanmodeling / ondemand-storm-workflow


Failed run when setting up files for the spinup run #24

Closed. FariborzDaneshvar-NOAA closed this issue 1 year ago.

FariborzDaneshvar-NOAA commented 1 year ago

A test run for Dorian 2019 (with the OFCL track) got stuck in the "Setting up the model ..." step, and the runs did not launch (DependencyNeverSatisfied). Below is the content of the slurm.out file for the failed step; note that the GAHM asymmetric data structure has more than 4 isotachs in cycle 59.

slurmstepd: error: TMPDIR [/lustre/.tmp] is not writeable
slurmstepd: error: Setting TMPDIR to /tmp
+ pushd /lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d/setup/ensemble.dir/spinup
/lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d/setup/ensemble.dir/spinup ~/ondemand-storm-workflow/singularity/scripts
+ mkdir -p outputs
+ mpirun -np 36 singularity exec --bind /lustre /lustre/singularity_images//solve.sif pschism_PAHM_TVD-VL 4

---------- MODEL PARAMETERS ----------
   title                = 
   bestTrackFileName(1) = hurricane-track.dat
   meshFileType         = 
   meshFileName         = 
   meshFileForm         = 

   gravity              = 9.81000 m/s^2
   rhoWater             = 1000.00000 kg/m^3
   rhoAir               = 1.14780 kg/m^3
   backgroundAtmPress   = 1013.25000 mbar
   windReduction        = 0.90

   refDateTime          = 
   refYear              = 2019
   refMonth             = 08
   refDay               = 22
   refHour              = 12
   refMin               = 00
   refSec               = 00
   refDateSpecified     = T

   begDateTime          = 
   begYear              = 2019
   begMonth             = 08
   begDay               = 22
   begHour              = 12
   begMin               = 00
   begSec               = 00
   begDateSpecified     = T

   endDateTime          = 5000-01-01 00:00:00
   endYear              = 5000
   endMonth             = 01
   endDay               = 01
   endHour              = 00
   endMin               = 00
   endSec               = 00
   endDateSpecified     = T

   unitTime             = S
   outDT                = -999999.00000 s
   mdOutDT              = -999999.00000 s
   begSimTime           = 0.00000 s
   mdBegSimTime         = 0.00000 s
   begSimSpecified      = T
   endSimTime           = 94051108800.00000 s
   mdEndSimTime         = 94051108800.00000 s
   endSimSpecified      = T
   nOutDT               = -999999

   outFileName          = 
   ncShuffle            = 0
   ncDeflate            = 0
   ncDLevel             = 0
   ncVarNam_Pres        = P
   ncVarNam_WndX        = uwnd
   ncVarNam_WndY        = vwnd

   modelType            =         10
---------- MODEL PARAMETERS ----------

InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called :: ProcessAsymmetricVortexData: 6 isotachs were nonzero.
InitLogging not called ::                                                   : The GAHM asymmetric data structure has more than 4 iSotachs in cycle 59.
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 0 on
node sorooshmani-nhccolab2-00005-1-0001 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.
--------------------------------------------------------------------------

Run directory on the NHC_COLAB_2 cluster: /lustre/hurricanes/dorian_2019_b90c3ac1-d946-47d1-878c-322e8a63a34d
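Since the failure comes from the GAHM check on the number of isotachs per cycle, a quick pre-flight check on the track file could flag this before the ensemble is launched. Below is a minimal sketch, assuming a comma-separated ATCF-style track where field 2 is the YYYYMMDDHH cycle time and each row holds one isotach. The helper name `cycles_with_too_many_isotachs` is hypothetical and not part of the workflow, and counting rows per cycle is only an approximation of how PaHM counts isotachs.

```python
from collections import Counter

# Limit reported by the GAHM error in the log above (assumed fixed at 4).
MAX_ISOTACHS = 4


def cycles_with_too_many_isotachs(lines, max_isotachs=MAX_ISOTACHS):
    """Return {cycle_time: row_count} for cycles with more than max_isotachs rows.

    Hypothetical helper. Assumes comma-separated ATCF-style rows where
    field 2 is the YYYYMMDDHH cycle time and each row is one isotach.
    """
    counts = Counter()
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    # Keep only the cycles that exceed the limit.
    return {cycle: n for cycle, n in counts.items() if n > max_isotachs}
```

Assuming hurricane-track.dat (the `bestTrackFileName` in the log) is in ATCF format, it could be checked with `with open("hurricane-track.dat") as f: bad = cycles_with_too_many_isotachs(f)` and the setup step aborted early if `bad` is non-empty.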

FariborzDaneshvar-NOAA commented 1 year ago

Here are the first few lines of the hurricane track files:

SorooshMani-NOAA commented 1 year ago

The issue is that in the spinup case, where we are not supposed to have any track, the workflow somehow picks up the best track and uses it. The duplication issue actually comes from the best track file downloaded directly from the ATCF webpage. I need to find out why the best track is used and remove it.
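For the duplication in the downloaded best track, one possible cleanup is to keep a single row per (cycle time, isotach) pair. This is a minimal sketch, assuming comma-separated ATCF b-deck rows where field 2 is the YYYYMMDDHH time and field 11 is the wind-radii threshold (RAD, in kt), and assuming the duplicates share both of those values. The helper `deduplicate_track_rows` is hypothetical, and keeping the first copy is an arbitrary choice; which duplicate is correct would need to be checked against the source.

```python
def deduplicate_track_rows(lines):
    """Drop repeated rows, keeping the first row per (cycle time, isotach).

    Hypothetical helper. Assumes comma-separated ATCF b-deck rows where
    field 2 is the YYYYMMDDHH cycle time and field 11 is the wind-radii
    threshold (RAD, in kt); rows without field 11 are keyed by time only.
    """
    seen = set()
    unique = []
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            unique.append(line)  # pass blank or malformed lines through unchanged
            continue
        rad = fields[11] if len(fields) > 11 else ""
        key = (fields[2], rad)
        if key in seen:
            continue  # a later duplicate of an already kept row
        seen.add(key)
        unique.append(line)
    return unique
```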

FariborzDaneshvar-NOAA commented 1 year ago

@SorooshMani-NOAA Thanks for looking into it.

SorooshMani-NOAA commented 1 year ago

@FariborzDaneshvar-NOAA this issue should be fixed for OFCL. I'm still working on removing the duplication in the best track. If you rerun the workflow for OFCL (past forecast), it should work fine. I tested a 7-member ensemble for Dorian 2019 and it went through (the spinup run was successful).

SorooshMani-NOAA commented 1 year ago

@saeed-moghimi-noaa When trying Dorian 2019 with the workflow, I noticed an issue. To decide when to start perturbing the track, the workflow needs a rough estimate of the landfall time, so I intersect the track with a shapefile of the US. In the case of the Dorian best track, the track doesn't seem to intersect the US shape at all! So I was wondering if you have any suggestions for improving the logic.

One option is to simply perturb before landfall on any country. The reason I didn't do that is that a storm in the Gulf might make landfall on another country first and then again on the US coast, and we'd like to perturb before the US landfall. Please let me know what you think.
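The trade-off above can be illustrated with a minimal sketch that finds the first track point falling inside a set of land polygons. It is a pure-Python stand-in for the shapefile intersection, assuming tracks and polygons are plain (lon, lat) tuples in planar coordinates; the names `point_in_polygon` and `first_landfall_index` are hypothetical. Note that it only tests track points, not the segments between them, so a short crossing between two fixes can be missed.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar coordinates


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Even-odd ray casting test for a simple closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Toggle when a horizontal ray to the right of pt crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def first_landfall_index(
    track: Sequence[Point], regions: Sequence[Sequence[Point]]
) -> Optional[int]:
    """Index of the first track point inside any region, or None if none is."""
    for i, pt in enumerate(track):
        if any(point_in_polygon(pt, poly) for poly in regions):
            return i
    return None  # e.g. a track that never touches the given regions
```

Passing only the US polygon gives the US landfall the workflow wants, while passing every country's polygon gives the first landfall anywhere. A track that stays offshore, as described for the Dorian best track, returns `None`, which is where a fallback is needed.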

SorooshMani-NOAA commented 1 year ago

@FariborzDaneshvar-NOAA I added a fix for this for now. Both the best track and the official track should work for all storms (including Ian and Dorian). Please let me know if you notice any issues.

saeed-moghimi-noaa commented 1 year ago

Hi @SorooshMani-NOAA Please discuss this with our friends from NHC. Perhaps they have a specific way of handling this. Thanks.

SorooshMani-NOAA commented 1 year ago

I asked this question in the NHC meeting, and they said that they use a subjective approach. In cases where there is no actual landfall (Marco 2020, Dorian 2019, ...), they take the point of closest approach as the landfall and then calculate the perturbation location from it. For the 25 storms used to test the skill assessment (https://github.com/saeed-moghimi-noaa/Next-generation-psurge-tasks/issues/14), there's a fixed table that is used for lead times.

We can take this table and put it in our workflow when we run these storms.

FariborzDaneshvar-NOAA commented 1 year ago

@SorooshMani-NOAA Thanks for the fix. New runs of Ian and Dorian with the OFCL track completed successfully. There was only a memory issue during the post-processing step for Dorian, which I documented here: Memory issue for combining results #111

SorooshMani-NOAA commented 1 year ago

Since the spinup issues are resolved, I'm closing this ticket.