larson-group / clubb_release

Cloud Layers Unified By Binormals (CLUBB)
https://carson.math.uwm.edu/larson-group/clubb_site/

Convergence script with linear diffusion setup #14

Closed zhangshixuan1987 closed 1 year ago

zhangshixuan1987 commented 1 year ago

@vlarson @bmg929:

Hi Vince and Brian, this commit contains the updated scripts to run the convergence test simulations following the same setups as in the convergence paper (https://doi.org/10.22541/essoar.167632252.26895646/v1). I created three scripts to run the three key setups that we investigated:

  1. run_scripts/convergence_run/run_cnvg_test_multi_cases_default.csh: runs the convergence test simulations using all default setups from the CLUBB master branch.
  2. run_scripts/convergence_run/run_cnvg_test_multi_cases_baseline.csh: runs them with all default setups except for revised initial and boundary conditions for the four cases (BOMEX, DYCOMS_RF02_ND, RICO, and Wangara).
  3. run_scripts/convergence_run/run_cnvg_test_multi_cases_revall.csh: runs them with all of the revised setups investigated in the convergence paper.

The run_scripts/convergence_run/convergence_config.py script is also revised to configure CLUBB with all of the revised setups, including the linear diffusion setup that we tested in the convergence paper.
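
For example, the default-configuration test can be launched from the run_scripts/convergence_run directory with:

csh run_cnvg_test_multi_cases_default.csh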

Here I provide some of the test results from my side (default setup versus the revised setup that includes all changes we made in the convergence paper) as a reference for you and @vlarson:

  1. The convergence of four cases with default and revised configurations: convergence_default_vs_revised.pdf

  2. Surface fluxes of the RICO case with default and revised configurations: surface_flux_defalut_vs_revised.pdf

  3. Responses of the BOMEX and RICO case to the changes in the limiters for Brunt–Väisälä frequency (BVF) and Richardson number: bvf_limiter_default_vs_revised.pdf

Note: the current CLUBB code seems to become expensive when the grid is refined by a factor of 2^7 (I could not finish that simulation within the 48-hour wall time on Compy). Therefore, I used the simulations with a refinement of 2^6 as the reference to draw the convergence plots.

Overall, my test results suggest that the results from the new code are quite consistent with those in our convergence paper (which used an older code branch).

bmg929 commented 1 year ago

How long does it typically take to run the convergence tests?

I committed the script updates to the CLUBB repository and then altered a few lines locally so I could try running them on a local UWM machine. The script seemed to take a very long time to run (and, in fact, the run didn't finish because I lost my connection). Is this normal? The following is the output to the screen:

griffinb@carson:~/clubb_merge_conv_code/run_scripts/convergence_run$ csh run_cnvg_test_multi_cases_baseline.csh
convergence simulation start
Mon Apr 17 12:08:34 PM CDT 2023

Running simulaitons for bomex : tstart = 0, tend = 21600
[1] 1280426
Mon Apr 17 12:08:34 PM CDT 2023

real    0m8.118s
user    0m8.350s
sys     0m2.329s

real    0m27.296s
user    0m27.502s
sys     0m2.390s

real    1m49.610s
user    1m46.294s
sys     0m5.856s
Running simulaitons for rico : tstart = 0, tend = 21600
[2] 1280849
Mon Apr 17 12:13:34 PM CDT 2023

real    0m14.991s
user    0m15.217s
sys     0m1.966s

real    0m56.892s
user    0m54.991s
sys     0m4.124s

real    7m53.545s
user    7m4.226s
sys     0m51.838s

real    4m5.942s
user    3m38.272s
sys     0m29.786s
Running simulaitons for dycoms2_rf02_nd : tstart = 0, tend = 21600
[3] 1281200
Mon Apr 17 12:18:34 PM CDT 2023

real    0m6.396s
user    0m6.587s
sys     0m1.701s

real    0m19.924s
user    0m20.095s
sys     0m1.711s

real    1m16.032s
user    1m12.027s
sys     0m5.876s

Running simulaitons for wangara : tstart = 82800, tend = 104400
[4] 1281572
Mon Apr 17 12:23:34 PM CDT 2023

real    0m7.585s
user    0m7.648s
sys     0m1.382s

real    0m24.822s
user    0m24.884s
sys     0m1.433s

real    5m12.411s
user    4m38.426s
sys     0m35.756s

real    1m37.691s
user    1m32.891s
sys     0m6.255s

real    7m0.377s
user    6m6.458s
sys     0m55.271s

real    17m8.703s
user    14m49.639s
sys     2m21.107s

real    22m1.144s
user    19m1.054s
sys     3m1.682s

real    34m23.473s
user    29m47.359s
sys     4m38.414s

real    28m18.481s
user    24m29.588s
sys     3m50.308s

real    69m50.940s
user    60m26.703s
sys     9m26.090s

real    87m19.101s
user    75m42.186s
sys     11m38.275s

real    110m46.923s
user    96m21.185s
sys     14m26.806s

real    138m8.185s
user    120m22.975s
sys     17m47.138s
client_loop: send disconnect: Connection reset by peer
zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, the convergence test simulations refine the vertical resolution. The highest refinement is 2^7, i.e., a grid spacing 128 times smaller than the default. For the RICO case, the model top is 10 km, so the highest-resolution run has 7297 vertical levels (the smallest grid spacing is about 0.2 m). This indeed takes some time to finish.
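
As a rough sketch of how the level count grows (assuming each refinement level uniformly halves the grid spacing; the default level count below is only inferred from the 7297 levels at 2^7, and the actual grid is stretched, which is why the smallest spacing is finer than the average):

import numpy as np

# Sketch: vertical level counts under 2^k refinement, assuming each
# refinement level uniformly halves the grid spacing.
nz_default = 58                          # inferred: (7297 - 1) / 2**7 + 1
for k in range(8):
    nz = (nz_default - 1) * 2**k + 1     # grid intervals double each step
    print(f"refinement 2^{k}: {nz} levels")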

For the convergence test simulations I conducted:

  1. For the old version of the code used for the CLUBB convergence paper: the convergence test simulation at the highest resolution (the default grid refined by a factor of 2^7) takes about 20-24 hours for the RICO case, since this case uses a model top of 10 km and therefore has many more vertical levels than the other cases. The other cases finish within an 8-hour wall time. For all of these simulations, I set the output frequency to 600 s.

  2. For the new version of the code from the master branch: the same high-resolution simulation for the RICO case did not finish within a 48-hour walltime. I checked my simulation setup and found that I had set the output frequency to 60 s, which I think is one reason for the increased computational cost. For your simulations, a 600 s output frequency should be sufficient for the convergence test.

For both of my tests above, I used only one node. I suspect parallel jobs could also help reduce the computational cost, but I have not tried that on my side.

vlarson commented 1 year ago

Instead of running RICO, it may be quicker to run just the BOMEX case as a first test.

bmg929 commented 1 year ago

I am now attempting to run the convergence tests on Anvil. I copied the linux_x86_64_ifort_compy.bash script to make a version for Anvil and changed a couple of paths and settings. The code compiles almost all the way through, but fails at the very end:

ld: cannot find -lmkl_intel_lp64
ld: cannot find -lmkl_sequential
ld: cannot find -lmkl_core
make[1]: *** [/home/ac.griffin/clubb_convergence_test/compile/../bin/clubb_standalone] Error 1
make[1]: Leaving directory `/gpfs/fs1/home/ac.griffin/clubb_convergence_test/obj'
make: *** [clubb_standalone] Error 2

Within the compiler script, there is the line:

CPPFLAGS="-I$MKLPATH/../../include -I$NETCDF/include"

where MKLPATH is the only variable in the script that is not defined locally within the script. It is not defined in my environment either, so I assume this is the origin of my error.

Would you happen to know how I might go about providing the right setting for MKLPATH or where I might begin to look? Thank you.

bmg929 commented 1 year ago

Would you happen to know how I might go about providing the right setting for MKLPATH or where I might begin to look? Thank you.

Upon further inspection of env, there is an MKLROOT environment variable. I altered the script to use MKLROOT instead of MKLPATH. However, the compilation still fails with the same error message.

Edit:

I needed to change the include path from:

CPPFLAGS="-I$MKLPATH/../../include -I$NETCDF/include"

to

CPPFLAGS="-I$MKLROOT/include -I$NETCDF/include"

However, once again, it still fails with the same error message.

zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, I happen to have an Anvil account, and I just attempted to compile CLUBB there successfully. I have attached the environment file for your reference (I added .txt to the file name because GitHub does not allow uploading .bash files):

linux_x86_64_ifort_anvil.bash.txt

bmg929 commented 1 year ago

The path to the libraries that it is looking for when it fails appears to be: /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin or more simply $MKLROOT/lib/intel64_lin.

However, I don't know where in the script (or elsewhere) to enter this information so that the linker can find the libraries it is looking for.
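
One possibility, as a sketch (the variable name is a guess; this assumes the config script collects linker flags in an LDFLAGS-style variable that is passed to the linker):

# Hypothetical addition to compile/config/linux_x86_64_ifort_anvil.bash:
# point ld at the MKL libraries before the -lmkl_* flags are resolved.
LDFLAGS="-L$MKLROOT/lib/intel64_lin $LDFLAGS"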

bmg929 commented 1 year ago

@bmg929: Hi Brian, I happen to have an Anvil account, and I just attempted to compile CLUBB there successfully. I have attached the environment file for your reference (I added .txt to the file name because GitHub does not allow uploading .bash files):

linux_x86_64_ifort_anvil.bash.txt

Thank you Shixuan!

bmg929 commented 1 year ago

When I try to compile with the new script, I get the following error messages:

[ac.griffin@blueslogin2 compile]$ ./compile.bash -c config/linux_x86_64_ifort_anvil.bash
Lmod has detected the following error:  These module(s) exist but
cannot be loaded as requested: "python"
   Try: "module spider python" to see how to load the module(s).

Lmod has detected the following error:  Cannot load module
"netcdf-fortran/4.5.3" without these module(s) loaded:
   intel-parallel-studio/cluster.2020.2-xz35pbn anaconda3/2020.07
gcc/9.2.0-pkmzczt

While processing the following module(s):
    Module fullname       Module Filename
    ---------------       ---------------
    netcdf-fortran/4.5.3  /soft/bebop/modulefiles/netcdf-fortran/4.5.3.lua

Lmod has detected the following error:  Cannot load module
"netcdf-c/4.7.4" without these module(s) loaded:
   intel-parallel-studio/cluster.2020.2-xz35pbn anaconda3/2020.07
gcc/9.2.0-pkmzczt

While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    netcdf-c/4.7.4   /soft/bebop/modulefiles/netcdf-c/4.7.4.lua

Lmod has detected the following error:  The following module(s) are
unknown: "intel-mkl/2019.5.281"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "intel-mkl/2019.5.281"

Also make sure that all modulefiles written in TCL start with the string #%Module
bmg929 commented 1 year ago

When I make the following changes to the script, I can get it to compile on Anvil:

[ac.griffin@blueslogin2 bin]$ git diff
diff --git a/compile/config/linux_x86_64_ifort_anvil.bash b/compile/config/linux_x86_64_ifort_anvil.bash
index 74114a3..039b450 100644
--- a/compile/config/linux_x86_64_ifort_anvil.bash
+++ b/compile/config/linux_x86_64_ifort_anvil.bash
@@ -6,6 +6,9 @@
 module purge
 module load python
 module load intel
+module load intel-parallel-studio/cluster.2020.2-xz35pbn
+module load anaconda3/2020.07
+module load gcc/9.2.0-pkmzczt
 module load netcdf-fortran/4.5.3
 module load netcdf-c/4.7.4
 module load intel-mkl/2019.5.281
zhangshixuan1987 commented 1 year ago

When I make the following changes to the script, I can get it to compile on Anvil:

[ac.griffin@blueslogin2 bin]$ git diff
diff --git a/compile/config/linux_x86_64_ifort_anvil.bash b/compile/config/linux_x86_64_ifort_anvil.bash
index 74114a3..039b450 100644
--- a/compile/config/linux_x86_64_ifort_anvil.bash
+++ b/compile/config/linux_x86_64_ifort_anvil.bash
@@ -6,6 +6,9 @@
 module purge
 module load python
 module load intel
+module load intel-parallel-studio/cluster.2020.2-xz35pbn
+module load anaconda3/2020.07
+module load gcc/9.2.0-pkmzczt
 module load netcdf-fortran/4.5.3
 module load netcdf-c/4.7.4
 module load intel-mkl/2019.5.281

@bmg929: Great that you found a way on your side. The script I shared works on my end; setting up the compiling environment always puzzles me.

bmg929 commented 1 year ago

There appears to be an issue when I run run_cnvg_test_multi_cases_baseline.csh, regarding its call to convergence_config.py in these lines:

#!/bin/bash
date
echo
EOB
    set k = 1
    while ( $k <= $nrefs )
      set jobid  = `printf "%02d" $k`
      set config = "-dt $time_steps[$k] -ref $refine_levels[$k] -ti ${tstart} -tf ${tend} -dto ${dt_output}"
      set strs0  = 'time python3 '"${topdir}"'/run_scripts/convergence_run/convergence_config.py $1 -output-name $2'
      if( $k < $nrefs) then
        set strs1  = '-skip-check ${@:3} > ${1}_${2}_${SLURM_JOBID}-'"${jobid}"'.log 2>&1 &'
      else
        set strs1  = '-skip-check ${@:3} > ${1}_${2}_${SLURM_JOBID}-'"${jobid}"'.log 2>&1 '
      endif
      echo ""  >> ${run_script}
      echo "${strs0}  ${config} ${config_flags}  ${strs1}" >> ${run_script}
      echo "sleep 20" >> ${run_script}
      @ k++
    end

There is no output from running this script other than in the *.log files, which all contain the following error message:

Traceback (most recent call last):
  File "/home/ac.griffin/clubb_convergence_test/run_scripts/convergence_run/convergence_config.py", line 13, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'
zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, I think the errors are due to the Python libraries. The script has three steps:

  1. compile the CLUBB model,
  2. run the convergence test, and
  3. plot the sample convergence figures.

In the third step, we use a Python script to process the data and generate the figures. Numpy is needed to calculate error metrics such as root-mean-square errors. It seems that your run complains because Numpy is not installed in your Python environment.
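
For context, the Numpy usage is simple; the error metric is essentially the following (a sketch, not the actual code from convergence_config.py):

import numpy as np

def rmse(field, reference):
    # Root-mean-square error of a refined run against the reference run.
    return np.sqrt(np.mean((np.asarray(field) - np.asarray(reference))**2))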

A simple solution is to comment out "import numpy as np" and also the plotting section in run_cnvg_test_multi_cases_baseline.csh. Otherwise, we need to provide a Python environment that includes Numpy.

Since you are using CLUBB on Anvil, I can work there and find a solution. But let me first ask: is this what you want?

Thank you!

Shixuan

bmg929 commented 1 year ago

Thanks Shixuan! I will try to troubleshoot it and see if I can get it to work on Anvil. If I can't, then I might need a little help. My goal is to run it on Anvil (which I have access to) so that we can intermittently check future versions of CLUBB to see if they still converge.

bmg929 commented 1 year ago

In the third step, we use a Python script to process the data and generate the figures. Numpy is needed to calculate error metrics such as root-mean-square errors. It seems that your run complains because Numpy is not installed in your Python environment.

A simple solution is to comment out "import numpy as np" and also the plotting section in run_cnvg_test_multi_cases_baseline.csh. Otherwise, we need to provide a Python environment that includes Numpy.

The problem first occurs in the second step, the running of the convergence test. Running the tests generates a bash script for each case, and these bash scripts contain a command that calls convergence_config.py. That script, in turn, calls the function modify_ic_profile, which is found in convergence_function.py. I commented out import numpy as np in both Python files. However, as it turns out, both scripts reference np on multiple lines of code, so it is not as simple as removing the numpy import.

However, UWM has a battery of Python scripts written by Zhun Guo, all of which contain import numpy as np. These scripts have been run successfully on Anvil, so there must be a way to load a Python environment that includes numpy there. All I should have to do is follow the same steps I use before running Zhun Guo's scripts.

Furthermore, some of our postprocessing scripts for running E3SM diagnostics make use of e3sm_diags_env.yml, which includes a dependency on numpy. We usually load it as follows:

source /lcrc/soft/climate/e3sm-unified/base/etc/profile.d/conda.sh
conda activate e3sm_diags_env
python run_e3sm_diags.py

After testing, this appears to be working so far. I issue the command source /lcrc/soft/climate/e3sm-unified/base/etc/profile.d/conda.sh, followed by the command conda activate e3sm_diags_env. Then ...

(e3sm_diags_env) [ac.griffin@blueslogin3 convergence_run]$ csh run_cnvg_test_multi_cases_baseline.csh

When I do it this way, the cnvg_baseline directory fills up with files and all the *.log files show no errors. So far, so good!

zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, thank you for providing the details and the solution. I forgot that I use the Python script to construct the initial condition profile on the fly for the DYCOMS_RF02 case and to modify the sounding profiles for the other cases (in step 2, as you noted above). The purpose is to provide a fixed, smoothed initial condition profile so that every refinement simulation is initialized from the same profile. We also need to insert some fake layers (with missing values) to ensure that the cubic spline interpolation reproduces the sounding profile as closely as possible (when the sounding is too coarse, the cubic spline interpolation can generate small features that are purely artifacts of the interpolation). Numpy is called to do this. I think using the existing e3sm_diags_env.yml is a better idea, as it avoids issues from changes in the environment.
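
For illustration, the interpolation step looks roughly like this (a sketch using SciPy's CubicSpline with made-up sounding numbers; the real code in convergence_function.py differs in detail):

import numpy as np
from scipy.interpolate import CubicSpline

# Sketch: map a coarse sounding onto the refined model grid with a cubic
# spline. In the real script, extra "fake" layers are inserted into the
# sounding first so the spline cannot invent small-scale features.
z_snd   = np.array([0.0, 520.0, 1480.0, 2000.0, 3000.0])    # heights [m]
thl_snd = np.array([298.7, 298.7, 302.4, 308.2, 311.9])     # theta_l [K]
spline  = CubicSpline(z_snd, thl_snd)
z_model = np.linspace(0.0, 3000.0, 1025)                    # refined grid
thl_ic  = spline(z_model)           # smoothed, fixed initial profile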

bmg929 commented 1 year ago

There are further issues in step 3, which is the postprocessing step:

Traceback (most recent call last):
  File "bomex_fig.py", line 3, in <module>
    from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘bomex_fig.py’
Traceback (most recent call last):
  File "rico_fig.py", line 3, in <module>
    from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘rico_fig.py’
Traceback (most recent call last):
  File "dycoms2_rf02_nd_fig.py", line 3, in <module>
    from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘dycoms2_rf02_nd_fig.py’
Traceback (most recent call last):
  File "wangara_fig.py", line 3, in <module>
    from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘wangara_fig.py’
Mon Apr 24 10:54:32 CDT 2023

I can get around the netCDF4 issue by using a different conda environment. In order to use Zhun Guo's E3SM-CLUBB budget diagnostic package, he had us create conda environments that are loaded via conda activate <USERNAME>. Instructions are found here: https://github.com/larson-group/E3SM/wiki/Zhun's-guide-to-running-the-single-column-and-global-versions-of-the-E3SM-model#32-plotting-results-how-to-use-clubbs-budget-diagnostic-package This environment includes Numpy and netCDF4.

However, even after using that instead of e3sm_diags_env, there is still another error:

Traceback (most recent call last):
  File "bomex_fig.py", line 10, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘bomex_fig.py’
Traceback (most recent call last):
  File "rico_fig.py", line 10, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘rico_fig.py’
Traceback (most recent call last):
  File "dycoms2_rf02_nd_fig.py", line 10, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘dycoms2_rf02_nd_fig.py’
Traceback (most recent call last):
  File "wangara_fig.py", line 10, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘wangara_fig.py’
Mon Apr 24 11:30:31 CDT 2023

Postprocessing requires something called "seaborn".

bmg929 commented 1 year ago

Perhaps I could further alter the custom environments (as described in the link in the previous comment) to contain all the missing pieces of the Python environment. Alternatively, I could copy e3sm_diags_env.yml, rename it, and add lines for the missing pieces.
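
For example, the first option might look like this (a sketch; <USERNAME> is the placeholder from the instructions linked above):

conda install -n <USERNAME> numpy netcdf4 seaborn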

bmg929 commented 1 year ago

Good news on this front -- all the required packages (numpy, netcdf4, and seaborn) are made available simply by loading the following:

 source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh

I can then run using:

(e3sm_unified_1.8.0_nompi) [ac.griffin@blueslogin2 convergence_run]$ csh run_cnvg_test_multi_cases_baseline.csh

and avoid all the Python errors about missing packages.

bmg929 commented 1 year ago

We now come to the next issue -- no output is being produced.

zhangshixuan1987 commented 1 year ago

@bmg929: Could you tell me which machine you are using? It seems to me that "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh" is intended for the Argonne Chrysalis machine rather than the Anvil machine you mentioned in previous comments. I have accounts on both machines; if you tell me which one you are using, I can do a quick test on my side, see whether I encounter the same issues, and report what I find.

bmg929 commented 1 year ago

@bmg929: Could you tell me which machine you are using? It seems to me that "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh" is intended for the Argonne Chrysalis machine rather than the Anvil machine you mentioned in previous comments. I have accounts on both machines; if you tell me which one you are using, I can do a quick test on my side, see whether I encounter the same issues, and report what I find.

I am using Anvil; however, I know that the same home directory and file system are used for both machines, and the activation of the E3SM unified environment appears to be the same for both (LCRC), judging by these instructions: https://e3sm-project.github.io/e3sm_diags/_build/html/main/install.html#activate-e3sm-unified-environment

bmg929 commented 1 year ago

While CLUBB compiles successfully (and the clubb_standalone executable appears in the bin directory), a simple test of run_scm.bash shows that CLUBB isn't running. Here is the error message:

(e3sm_unified_1.8.0_nompi) [ac.griffin@blueslogin2 run_scripts]$ ./run_scm.bash bomex
Running bomex
../bin/clubb_standalone: error while loading shared libraries: libnetcdff.so.7: cannot open shared object file: No such file or directory
bmg929 commented 1 year ago

While CLUBB compiles successfully (and the clubb_standalone executable appears in the bin directory), a simple test of run_scm.bash shows that CLUBB isn't running. Here is the error message:

(e3sm_unified_1.8.0_nompi) [ac.griffin@blueslogin2 run_scripts]$ ./run_scm.bash bomex
Running bomex
../bin/clubb_standalone: error while loading shared libraries: libnetcdff.so.7: cannot open shared object file: No such file or directory

The compiler script config/linux_x86_64_ifort_anvil.bash sets the following line:

# == NetCDF Location ==
NETCDF=$NETCDF_ROOT

However, $NETCDF_ROOT is not defined anywhere, neither in the script nor among my environment variables. Printing the value of $NETCDF_ROOT simply yields a blank line.
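
One way to recover a usable value is to query the loaded netcdf-fortran module directly (a sketch, assuming the module puts nf-config on the PATH):

# Use the netcdf-fortran install prefix as NETCDF.
NETCDF=$(nf-config --prefix)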

bmg929 commented 1 year ago

In the e3sm_unified_1.8.0_nompi environment that I've loaded and am running within, the path to libnetcdff.so.7 appears to be the following: /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.8.0_nompi/lib.

Edit:

Never mind; when I set NETCDF to this path, the code won't compile, because that NetCDF was built with a different compiler:

/home/ac.griffin/clubb_convergence_test/compile/../src/CLUBB_core/output_netcdf.F90(42): error #7013: This module file was not generated by any release of this compiler.   [NETCDF]
    use netcdf, only: &

Phooey

These are the modules that are loaded in the compiler script:

module load netcdf-fortran/4.5.3
module load netcdf-c/4.7.4

so I'm going to have to find the path that goes along with these.
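
Lmod can report where those modules point, which should reveal the matching library paths:

module show netcdf-fortran/4.5.3
module show netcdf-c/4.7.4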

zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, I think we had a misunderstanding that made my suggestions above less useful. I thought the Anvil machine you mentioned was the Anvil at Purdue University (https://www.rcac.purdue.edu/knowledge/anvil/access). Since I have an account on that "Anvil", I tested and ran CLUBB there and provided that compiling environment script to you. However, it turns out that is not the case.

It turns out that you are referring to the Argonne Anvil/Chrysalis machine. I have tested and figured out the compiling environment setup as well as the changes needed in the run scripts.

Compiling scripts:

Revised run scripts:

Note: remember to remove the .txt to convert the files back to shell scripts. Also, I only ran a quick test and killed the jobs once I saw the model output being generated. Please let me know if other issues arise.

bmg929 commented 1 year ago

@bmg929: Hi Brian, I think we had a misunderstanding that made my suggestions above less useful. I thought the Anvil machine you mentioned was the Anvil at Purdue University (https://www.rcac.purdue.edu/knowledge/anvil/access). Since I have an account on that "Anvil", I tested and ran CLUBB there and provided that compiling environment script to you. However, it turns out that is not the case.

It turns out that you are referring to the Argonne Anvil/Chrysalis machine. I have tested and figured out the compiling environment setup as well as the changes needed in the run scripts.

Compiling scripts:

Revised run scripts:

Note: remember to remove the .txt to convert the files back to shell scripts. Also, I only ran a quick test and killed the jobs once I saw the model output being generated. Please let me know if other issues arise.

Thank you very much for this, Shixuan! The code compiled successfully, and run_cnvg_test_multi_cases_baseline.csh is currently running and producing output on Anvil!

zhangshixuan1987 commented 1 year ago

Great news! Please let me know if there are any questions after you finish the simulation and obtain the results.

bmg929 commented 1 year ago

Update: It looks like it ran, but I noticed an error in the postprocessing section. Upon further inspection, it was a "Disk Quota Exceeded" error. I am currently rerunning.

bmg929 commented 1 year ago

I ran the "baseline" convergence tests to a successful completion.

convergence_bomex_l2_cnvg_baseline_thlm
convergence_dycoms2_rf02_nd_l2_cnvg_baseline_thlm
convergence_rico_l2_cnvg_baseline_thlm
convergence_wangara_l2_cnvg_baseline_thlm

The above are the thlm convergence plots from each of the 4 cases in the "baseline" run.

There are plenty of other fields to look at as well; I am not sure which fields are the most relevant.

zhangshixuan1987 commented 1 year ago

@bmg929: Hi Brian, thank you for the update. I should point out that the figures I uploaded here compare the "Default" and "Revised" configurations. The default runs use the all-default CLUBB setup, while the revised configuration refers to runs with all of the changes related to the convergence paper.

The baseline configuration you show above is an in-between configuration, i.e., default plus the revised initial and boundary conditions only. Therefore, the convergence looks different from the figures I showed. I think it would be useful for you to run simulations with the "Revised" configuration and see whether you can still obtain the first-order convergence shown in my figures.

As for your question: for the convergence paper, we selected "thlm" and "wp3" as two key variables and checked their convergence first, then "wp2", "wpthlp", "um", and "upwp". If the convergence of these variables looks reasonable, most of the other variables show reasonably good convergence as well; however, this is empirical.

In addition, when diagnosing convergence issues, we check all of the convergence figures and select the variable with the earliest divergence (i.e., where the convergence rate drops below 1) as the starting point for understanding the degraded convergence.
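
For reference, the convergence rate we look at is the observed order between successive refinements; a minimal sketch, assuming errs holds the L2 errors against the reference run ordered from coarse to fine:

import numpy as np

def observed_order(errs):
    # Observed convergence order between successive 2x refinements.
    # First-order convergence gives values near 1; the "earliest
    # divergence" is where the order first drops below 1.
    errs = np.asarray(errs)
    return np.log2(errs[:-1] / errs[1:])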

bmg929 commented 1 year ago

Edit: I am now showing both the "default" and "revall" runs so that you can see the difference side by side:

thlm:

BOMEX default: convergence_bomex_l2_cnvg_default_thlm
BOMEX revall: convergence_bomex_l2_cnvg_revall_thlm
RF02 default: convergence_dycoms2_rf02_nd_l2_cnvg_default_wp3
RF02 revall: convergence_dycoms2_rf02_nd_l2_cnvg_revall_thlm
RICO default: convergence_rico_l2_cnvg_default_thlm
RICO revall: convergence_rico_l2_cnvg_revall_thlm
Wangara default: convergence_wangara_l2_cnvg_default_thlm
Wangara revall: convergence_wangara_l2_cnvg_revall_thlm

wp3:

BOMEX default: convergence_bomex_l2_cnvg_default_wp3
BOMEX revall: convergence_bomex_l2_cnvg_revall_wp3
RF02 default: convergence_dycoms2_rf02_nd_l2_cnvg_default_wp3
RF02 revall: convergence_dycoms2_rf02_nd_l2_cnvg_revall_wp3
RICO default: convergence_rico_l2_cnvg_default_wp3
RICO revall: convergence_rico_l2_cnvg_revall_wp3
Wangara default: convergence_wangara_l2_cnvg_default_wp3
Wangara revall: convergence_wangara_l2_cnvg_revall_wp3

zhangshixuan1987 commented 1 year ago

@bmg929: The results above seem consistent with what I got from my test simulations. The only difference is wp3 in the Wangara case at hour 4, but I think this may not be an issue, given that the master branch differs from the code used for the CLUBB convergence paper. Overall, I think the results here are still consistent with what we got in the paper.

@vlarson: Do you think the results here are good enough, and consistent with the results in our convergence paper?

bmg929 commented 1 year ago

I created the following document to add more fields to the analysis: CLUBB_convergence_20230504.pdf

In all comparisons, the "default" is on the left and the "revised" is on the right.

Fields thlm, wp3, wp2, wpthlp, um, and upwp are all included.

Pages 1-2 are BOMEX, pages 3-4 are DYCOMS-II RF02 ND, pages 5-6 are RICO, and pages 7-8 are Wangara.

vlarson commented 1 year ago

@bmg929: The results above seem consistent with what I got from my test simulations. The only difference is wp3 in the Wangara case at hour 4, but I think this may not be an issue, given that the master branch differs from the code used for the CLUBB convergence paper. Overall, I think the results here are still consistent with what we got in the paper.

@vlarson: Do you think the results here are good enough, and consistent with the results in our convergence paper?

To me, the results look convergent. But I'll forward Brian's plots to Chris Vogl to see what he thinks.

vlarson commented 1 year ago

@bmg929: Hi Brian, thank you for the update. I should point out that the figures I uploaded here compare the "Default" and "Revised" configurations. The default runs use the all-default CLUBB setup, while the revised configuration refers to runs with all of the changes related to the convergence paper.

The baseline configuration you show above is an in-between configuration, i.e., default plus the revised initial and boundary conditions only. Therefore, the convergence looks different from the figures I showed. I think it would be useful for you to run simulations with the "Revised" configuration and see whether you can still obtain the first-order convergence shown in my figures.

@bmg929, do we have a simple script in the larson-group/clubb repo that can run the "Revised" configuration with the push of a button? If so, what is it? If not, can you please create one and commit it?