Check flex to normal queue for frontera

oceanmodeling / ufs-weather-model

This repo is forked from ufs-weather-model, and contains the model code and external links needed to build the UFS coastal model executable and model components, including the ROMS, FVCOM, ADCIRC and SCHISM plus WaveWatch III model components.

https://github.com/oceanmodeling/ufs-coastal-app

Check flex to normal queue for frontera #64

Closed janahaddad closed 1 month ago

janahaddad commented 2 months ago

Great, thanks for checking. @uturuncoglu We might need to switch from the flex to the normal queue on Frontera (or to something that works for the users; need to check on that). Also, for "hercules" we need to switch from the "windfall" to the "batch" qos.

Originally posted by @pvelissariou1 in https://github.com/oceanmodeling/ufs-coastal/discussions/46#discussioncomment-9014573

pvelissariou1 commented 2 months ago

@janahaddad, @ufuk let's have this discussion in Monday's meeting.

janahaddad commented 2 months ago

in parallel with #59

uturuncoglu commented 2 months ago

@pvelissariou1 @janahaddad After the sync, Hercules now uses the batch queue, so there is no need to update rt.sh. I'll test the Frontera queue.

uturuncoglu commented 2 months ago

It seems that the normal queue requires at least 3 nodes. There is a new queue called small. It can be used for RT runs, but I am not sure how different it is from flex. Anyway, let's keep flex at this point, and if you see that small is better, we could switch to that one.

uturuncoglu commented 2 months ago

Maybe development could be an option.

pvelissariou1 commented 2 months ago

@uturuncoglu The development queue walltime is only 30 min. Running the Atlantic RTs might require more time (especially with WW3 coupling). I'll check and report back.
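A minimal sketch of the queue constraints mentioned so far, assuming only the two limits stated in this thread (normal needs at least 3 nodes, development allows at most 30 minutes). The names `QueueLimits`, `FRONTERA_QUEUES`, `fits`, and `candidate_queues` are hypothetical and are not part of rt.sh; limits for small and flex are not given here, so they are left unset and treated as "no known constraint".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueueLimits:
    """Limits for one queue; None means the limit is not stated in this thread."""
    min_nodes: Optional[int] = None
    max_minutes: Optional[int] = None


# Hypothetical table: only the two limits quoted in the discussion are filled in.
FRONTERA_QUEUES = {
    "development": QueueLimits(max_minutes=30),
    "normal": QueueLimits(min_nodes=3),
    "small": QueueLimits(),  # limits not stated in this thread
    "flex": QueueLimits(),   # limits not stated in this thread
}


def fits(queue: str, nodes: int, minutes: int) -> bool:
    """Return True if the job violates none of the known limits of the queue."""
    limits = FRONTERA_QUEUES[queue]
    if limits.min_nodes is not None and nodes < limits.min_nodes:
        return False
    if limits.max_minutes is not None and minutes > limits.max_minutes:
        return False
    return True


def candidate_queues(nodes: int, minutes: int) -> List[str]:
    """List the queues whose known limits the job satisfies, in table order."""
    return [name for name in FRONTERA_QUEUES if fits(name, nodes, minutes)]


if __name__ == "__main__":
    # A short single-node RT versus a longer multi-node coupled case.
    print(candidate_queues(nodes=1, minutes=20))
    print(candidate_queues(nodes=4, minutes=60))
```

Because unknown limits are treated as unconstrained, the result is only as complete as the table; the actual queue limits should be taken from the Frontera documentation before relying on it.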

uturuncoglu commented 2 months ago

@pvelissariou1 As far as I know, the RT system is just for small jobs that run quickly to test the capability. If we want to run realistic high-resolution cases, that could be part of testing at the application level. BTW, the development queue has fast turnaround.

pvelissariou1 commented 2 months ago

@uturuncoglu I understand this. From our side, we need to run/test the large cases, maybe not on Frontera but on other HPC platforms. Let's switch to development (as the default) on Frontera; if users want, they can change to some other queue. Since you are comfortable doing this from the ufs-weather-model side, you might want to submit a PR for this?

uturuncoglu commented 2 months ago

@pvelissariou1 Yes, that is totally understandable. We could have another level of testing on the application side to test those large-scale realistic cases. I am not sure how this is handled with HAFS. I think they have some level of testing with the workflow: https://hafs.readthedocs.io/en/latest/RegressionTest.html. @janahaddad We might contact the HAFS team to learn about their experience with testing beyond model-level RTs.

uturuncoglu commented 2 months ago

@pvelissariou1 @janahaddad JFYI, I switched to the development queue on Frontera and fixed the issues.
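The defaults settled on in this thread (development on Frontera, batch QOS on Hercules), together with the idea that a user can override the queue, could be sketched as follows. This is an illustration only: `MACHINE_DEFAULTS` and `sbatch_lines` are hypothetical names, and the way rt.sh and its job-card templates actually set these values may differ. The only external facts assumed are the standard Slurm `--partition` and `--qos` options.

```python
from typing import Dict, List, Optional

# Hypothetical per-machine defaults reflecting the outcome of this thread.
MACHINE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "frontera": {"partition": "development"},
    "hercules": {"qos": "batch"},
}


def sbatch_lines(machine: str, overrides: Optional[Dict[str, str]] = None) -> List[str]:
    """Build '#SBATCH' directive lines from the defaults plus user overrides."""
    settings = dict(MACHINE_DEFAULTS[machine])  # copy so the defaults stay untouched
    settings.update(overrides or {})            # user-supplied values take precedence
    return [f"#SBATCH --{key}={value}" for key, value in sorted(settings.items())]


if __name__ == "__main__":
    print("\n".join(sbatch_lines("frontera")))
    # A user who prefers another queue overrides the default.
    print("\n".join(sbatch_lines("frontera", {"partition": "small"})))
    print("\n".join(sbatch_lines("hercules")))
```

Keeping the defaults in one table and applying overrides on top means the default queue can change without touching the code paths that users rely on to pick a different one.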

pvelissariou1 commented 2 months ago

@uturuncoglu Thanks