nasa / cape

Computational Aerosciences Productivity & Execution

pycart continually restarting case #30

Open khalilsb opened 9 months ago

khalilsb commented 9 months ago

When running Cart3D with several adaptation cycles, if the final iteration number is not reached exactly, pycart re-runs aero.csh with a restart, which starts the case from adapt00 again. This seems to corrupt the data and leaves pycart stuck in a continual restart loop.

I have set "Resubmit" and "Continue" to false in the RunControl section:

"RunControl": {
    "MPI":false,
    "PhaseSequence": [0],
    "PhaseIters": [1600],
    "sbatch": true,
    "PBS":false,
    "Resubmit": false,
    "Continue": false,
    "nProc":96,
    "Adaptive": true,
    "Verbose": true,
    "intersect": {
        "run": true,
        "triged": false
    },

Is there a way to force pycart to not submit restarts?

nasa-ddalle commented 8 months ago

I think I might need some more details to figure out more precisely what's going on. Which version of CAPE are you running? There might be some changes that affect this.

Are you having aero.csh exit some of the cycles early if a certain residual target is met? In that case I think you'll need to decrease PhaseIters. I understand there are some undesirable consequences of doing that, but we'll need to come up with a new design to capture these cases if early exit is the main cause. It shouldn't lead to cases being prematurely declared DONE by pycart, because pycart is also looking for a file called run.01.NNNN.

Let me know if I'm reading this correctly, b/c I think I have an idea how we could upgrade CAPE to support cases like this.

khalilsb commented 8 months ago

This is on CAPE v1.0.0. Yes, aero.csh is exiting if some residual or cell count is met, so there is a run.00.NNN, but CAPE still tries to resubmit if PhaseIters is not met.

Sounds like the solution is just to decrease PhaseIters? But I would have thought CAPE would pick up on the run.00.NNN.
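
For anyone checking whether a case folder is in this state, here is a minimal sketch of the comparison discussed above. It is a toy model of the behavior reported in this thread, not CAPE's actual logic. The function names `last_marked_iteration` and `would_restart` are hypothetical, and the marker naming `run.PP.NNNN` is assumed from the file names mentioned in the comments.

```python
import re
from pathlib import Path

# Marker names as described in the thread: run.<phase>.<iteration>.
# The exact digit widths are an assumption, so any number of digits is accepted.
RUN_MARKER = re.compile(r"^run\.(\d+)\.(\d+)$")


def last_marked_iteration(case_dir):
    """Return (phase, iteration) for the highest-iteration marker, or None."""
    best = None
    for path in Path(case_dir).iterdir():
        match = RUN_MARKER.match(path.name)
        if match is None:
            continue  # not a run.PP.NNNN marker file
        phase, iteration = int(match.group(1)), int(match.group(2))
        if best is None or iteration > best[1]:
            best = (phase, iteration)
    return best


def would_restart(case_dir, phase_iters):
    """Toy model only: restart while the last marker is below the final PhaseIters."""
    marker = last_marked_iteration(case_dir)
    completed = 0 if marker is None else marker[1]
    return completed < phase_iters[-1]
```

Calling `would_restart(case_dir, [1600])` on a folder whose last marker is below 1600 returns `True` in this model, which matches the loop described above. Passing a `PhaseIters` value at or below the iteration where aero.csh exits early returns `False`, which is the workaround suggested in the thread.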