jhardenberg / PLASIM

General Circulation Models Planet Simulator (PlaSim) with some improvements

PLASIM fails with T31 resolution and parallel mode #10

Open ValerioLembo opened 2 years ago

ValerioLembo commented 2 years ago

Plasim is unstable after a few years when running in parallel mode (possibly even in sequential mode).

The atmospheric diagnostics seem OK, but LSG does not seem to print anything, as can be seen in the *.DIAG files contained in the Dropbox folder below.

"Classic" instability arises already in the first simulated year. The run goes on, but nothing meaningful is written (see Dropbox folder).

Link to PLASIM folder containing namelist, bash scripts and diagnostics

jhardenberg commented 2 years ago

Hi, this looks serious and I will have a look. Did you try sequential mode to confirm whether it is a general T31 issue?

ValerioLembo commented 2 years ago

@jhardenberg I have just finished a 10yr run in sequential mode. So far, it does not seem to have any issue...

ValerioLembo commented 1 year ago

Hi Jost,

I was wondering if you managed to have a look at this issue with T31 resolution. Apparently, the new compiler at G100 (CINECA) does not seem to improve the situation.

I am trying to bring up again that idea of the multiresolution step forcing experiments and T31 is key in bridging the gap between the reasonably stable T21 and T42 resolutions.

I played around with the MPSTEP but it is not of much help. I would like to play around with the ocean diffusion, but I did not find any info about how to do that...

jhardenberg commented 1 year ago

Where do those " WARNING: abs(zeta) > du(1) detected at " messages in the log come from?

jhardenberg commented 1 year ago

So basically SSH becomes larger than the first ocean layer ... hmmm

jhardenberg commented 1 year ago

Can we somehow really exclude that this problem also appears in serial? So is it something that only occurs in parallel? Is it a small problem accumulating over time, or does it explode 'suddenly'?
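
One rough way to tell "accumulating" from "sudden" is to see where the warnings sit in the DIAG log. The following is a minimal Python sketch, not part of PlaSim. It assumes only that each warning line contains the text quoted above. The function names are hypothetical, and nothing is assumed about what follows "detected at" on each line.

```python
from typing import Iterable, List, Optional, TypedDict

# Substring taken from the warning quoted in this thread; the rest of the
# line (whatever follows "detected at") is deliberately not parsed.
WARNING_TEXT = "abs(zeta) > du(1) detected"


class WarningSummary(TypedDict):
    count: int
    first: Optional[int]
    last: Optional[int]
    total_lines: int


def warning_line_numbers(lines: Iterable[str], needle: str = WARNING_TEXT) -> List[int]:
    """Return the 1-based line numbers of log lines containing the warning text."""
    return [i for i, line in enumerate(lines, start=1) if needle in line]


def summarize(lines: Iterable[str]) -> WarningSummary:
    """Report how many warnings occur and where the first and last ones sit."""
    all_lines = list(lines)
    hits = warning_line_numbers(all_lines)
    return {
        "count": len(hits),
        "first": hits[0] if hits else None,
        "last": hits[-1] if hits else None,
        "total_lines": len(all_lines),
    }
```

For example, `with open("most_test_t31_p8_DIAG.0001.txt", errors="replace") as fh: print(summarize(fh))` prints the summary for one log. Warnings spread through much of the log would point to a slowly growing problem, while warnings only in the last few lines would point to a sudden blow-up.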

ValerioLembo commented 1 year ago

Hi Jost,

from what I can see, it does not seem to accumulate some sort of problem and then explode. It looks as if it suddenly crashes because of some numerical error.

I am not sure we can rule out that the sequential run might crash at some point. I ran it for 10 years and it was OK, but it took a while to complete, which is why I did not proceed further.

Here is a new DIAG file that shows the model crashing a few months after the start of the run.

The run was compiled with NOCEAN=1, NLSG=1 and NICE=1 on CINECA's G100 machine with 8 processors using gfortran.

most_test_t31_p8_DIAG.0001.txt