How long does it typically take to run the convergence tests?
I committed the updates to the scripts to the CLUBB repository and then altered a few lines locally so I could try to run them on a local UWM machine. The script seemed to take a very long time to run (in fact, the run didn't finish because I lost the connection). Is this normal? The following is the output to the screen:
griffinb@carson:~/clubb_merge_conv_code/run_scripts/convergence_run$ csh run_cnvg_test_multi_cases_baseline.csh
convergence simulation start
Mon Apr 17 12:08:34 PM CDT 2023
Running simulaitons for bomex : tstart = 0, tend = 21600
[1] 1280426
Mon Apr 17 12:08:34 PM CDT 2023
real 0m8.118s
user 0m8.350s
sys 0m2.329s
real 0m27.296s
user 0m27.502s
sys 0m2.390s
real 1m49.610s
user 1m46.294s
sys 0m5.856s
Running simulaitons for rico : tstart = 0, tend = 21600
[2] 1280849
Mon Apr 17 12:13:34 PM CDT 2023
real 0m14.991s
user 0m15.217s
sys 0m1.966s
real 0m56.892s
user 0m54.991s
sys 0m4.124s
real 7m53.545s
user 7m4.226s
sys 0m51.838s
real 4m5.942s
user 3m38.272s
sys 0m29.786s
Running simulaitons for dycoms2_rf02_nd : tstart = 0, tend = 21600
[3] 1281200
Mon Apr 17 12:18:34 PM CDT 2023
real 0m6.396s
user 0m6.587s
sys 0m1.701s
real 0m19.924s
user 0m20.095s
sys 0m1.711s
real 1m16.032s
user 1m12.027s
sys 0m5.876s
Running simulaitons for wangara : tstart = 82800, tend = 104400
[4] 1281572
Mon Apr 17 12:23:34 PM CDT 2023
real 0m7.585s
user 0m7.648s
sys 0m1.382s
real 0m24.822s
user 0m24.884s
sys 0m1.433s
real 5m12.411s
user 4m38.426s
sys 0m35.756s
real 1m37.691s
user 1m32.891s
sys 0m6.255s
real 7m0.377s
user 6m6.458s
sys 0m55.271s
real 17m8.703s
user 14m49.639s
sys 2m21.107s
real 22m1.144s
user 19m1.054s
sys 3m1.682s
real 34m23.473s
user 29m47.359s
sys 4m38.414s
real 28m18.481s
user 24m29.588s
sys 3m50.308s
real 69m50.940s
user 60m26.703s
sys 9m26.090s
real 87m19.101s
user 75m42.186s
sys 11m38.275s
real 110m46.923s
user 96m21.185s
sys 14m26.806s
real 138m8.185s
user 120m22.975s
sys 17m47.138s
client_loop: send disconnect: Connection reset by peer
@bmg929 : Hi Brian, the convergence test simulations refine the vertical resolution. The highest refinement is 2^7, i.e., grid spacings 128 times smaller than the default resolution. For the RICO case the model top is 10 km, so the highest-resolution run has 7297 vertical levels (the smallest grid spacing is about 0.2 m). This will indeed take some time to finish.
For the convergence test simulations I conducted:
With the old version of the code used for the CLUBB convergence paper, the highest-resolution simulation (the default grid refined by a factor of 2^7) takes about 20-24 hours for the RICO case, because that case uses a model top of 10 km and therefore has many more vertical levels than the other cases. The other cases finish within 8 hours of wall time. For all of these simulations, I set the output frequency to 600 s.
With the new version of the code from the master branch, the same high-resolution RICO simulation did not finish within 48 hours of wall time. I checked my simulation setup and found that I had set the output frequency to 60 s, which I think is one reason for the increased computational cost. For your simulations, an output frequency of 600 s should be enough for the convergence test.
For my two tests above, I used only one node per simulation. Parallel jobs could also help reduce the computational cost, but I have not tried that on my side.
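For a rough sense of how the level count grows with refinement, here is a back-of-the-envelope sketch (the default level count of 57 is simply inferred from 7297 / 2^7 and is illustrative only; the printed spacing is a uniform-grid average, whereas the actual stretched grid reaches roughly 0.2 m near the surface):

# Rough estimate of RICO level counts per refinement factor (illustrative only).
# nlev_default=57 is inferred from 7297 / 2^7; dz is the uniform-grid average
# over the 10 km model top, not the minimum spacing of the real stretched grid.
z_top=10000        # model top [m]
nlev_default=57    # inferred default level count (assumption)
for r in $(seq 0 7); do
  nlev=$(( nlev_default * (1 << r) ))
  dz=$(awk -v t="$z_top" -v n="$nlev" 'BEGIN{printf "%.2f", t/n}')
  echo "refinement 2^${r}: ~${nlev} levels, average dz ~ ${dz} m"
done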
Instead of running RICO, maybe for a first test it is quicker to run just the BOMEX case.
I am now attempting to run the convergence tests on Anvil. I have copied the linux_x86_64_ifort_compy.bash script to make a version for Anvil and changed a couple of paths and settings. The code compiles almost all the way through, but fails at the very end:
ld: cannot find -lmkl_intel_lp64
ld: cannot find -lmkl_sequential
ld: cannot find -lmkl_core
make[1]: *** [/home/ac.griffin/clubb_convergence_test/compile/../bin/clubb_standalone] Error 1
make[1]: Leaving directory `/gpfs/fs1/home/ac.griffin/clubb_convergence_test/obj'
make: *** [clubb_standalone] Error 2
Within the compiler script, there is the line:
CPPFLAGS="-I$MKLPATH/../../include -I$NETCDF/include"
where MKLPATH is the only variable in the script that is not defined locally within the script. It is not defined within my environment either, so I am assuming that this is the origin of my error.
Would you happen to know how I might go about providing the right setting for MKLPATH, or where I might begin to look? Thank you.
Upon further inspection of env, there is a MKLROOT environment variable. I altered the script to use MKLROOT instead of MKLPATH. However, the compilation still fails with the same error message.
Edit:
I needed to change the include path from:
CPPFLAGS="-I$MKLPATH/../../include -I$NETCDF/include"
to
CPPFLAGS="-I$MKLROOT/include -I$NETCDF/include"
However, once again, it still fails with the same error message.
@bmg929: Hi Brian, I happen to have an Anvil account, and I just attempted to compile CLUBB there successfully. I have attached the environment file for your reference (I added .txt to the file name because GitHub does not allow uploading .bash files):
The path to the libraries that it is looking for when it fails appears to be: /blues/gpfs/home/software/spack-0.10.1/opt/spack/linux-centos7-x86_64/intel-17.0.4/intel-mkl-2017.3.196-v7uuj6zmthzln35n2hb7i5u5ybncv5ev/compilers_and_libraries_2017.4.196/linux/mkl/lib/intel64_lin
or, more simply, $MKLROOT/lib/intel64_lin.
However, I don't know where to enter this information in the script (or otherwise) to ensure it can find the libraries that it is looking for.
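One common way to address an "ld: cannot find -lmkl_*" error is to point the linker at the MKL library directory. The following is only a sketch, under the assumption that the config script exposes an LDFLAGS-style variable (the actual variable name in linux_x86_64_ifort_anvil.bash may differ):

# Hypothetical addition to the compiler config script: add the MKL library
# directory corresponding to $MKLROOT to the link line (LDFLAGS name assumed).
LDFLAGS="-L$MKLROOT/lib/intel64_lin $LDFLAGS"
# Alternatively, set the linker search path in the environment before compiling:
export LIBRARY_PATH="$MKLROOT/lib/intel64_lin:$LIBRARY_PATH"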
Thank you Shixuan!
When I try to compile with the new script, I get the following error messages:
[ac.griffin@blueslogin2 compile]$ ./compile.bash -c config/linux_x86_64_ifort_anvil.bash
Lmod has detected the following error: These module(s) exist but
cannot be loaded as requested: "python"
Try: "module spider python" to see how to load the module(s).
Lmod has detected the following error: Cannot load module
"netcdf-fortran/4.5.3" without these module(s) loaded:
intel-parallel-studio/cluster.2020.2-xz35pbn anaconda3/2020.07
gcc/9.2.0-pkmzczt
While processing the following module(s):
Module fullname Module Filename
--------------- ---------------
netcdf-fortran/4.5.3 /soft/bebop/modulefiles/netcdf-fortran/4.5.3.lua
Lmod has detected the following error: Cannot load module
"netcdf-c/4.7.4" without these module(s) loaded:
intel-parallel-studio/cluster.2020.2-xz35pbn anaconda3/2020.07
gcc/9.2.0-pkmzczt
While processing the following module(s):
Module fullname Module Filename
--------------- ---------------
netcdf-c/4.7.4 /soft/bebop/modulefiles/netcdf-c/4.7.4.lua
Lmod has detected the following error: The following module(s) are
unknown: "intel-mkl/2019.5.281"
Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore-cache load "intel-mkl/2019.5.281"
Also make sure that all modulefiles written in TCL start with the string #%Module
When I make the following changes to the script, I can get it to compile on anvil for me:
[ac.griffin@blueslogin2 bin]$ git diff
diff --git a/compile/config/linux_x86_64_ifort_anvil.bash b/compile/config/linux_x86_64_ifort_anvil.bash
index 74114a3..039b450 100644
--- a/compile/config/linux_x86_64_ifort_anvil.bash
+++ b/compile/config/linux_x86_64_ifort_anvil.bash
@@ -6,6 +6,9 @@
module purge
module load python
module load intel
+module load intel-parallel-studio/cluster.2020.2-xz35pbn
+module load anaconda3/2020.07
+module load gcc/9.2.0-pkmzczt
module load netcdf-fortran/4.5.3
module load netcdf-c/4.7.4
module load intel-mkl/2019.5.281
@bmg929 : Great that you found a way on your side. The script I shared works on my side. Setting up the compile environment always puzzles me.
There appears to be an issue when I run run_cnvg_test_multi_cases_baseline.csh with regard to its call to convergence_config.py in these lines:
#!/bin/bash
date
echo
EOB
set k = 1
while ( $k <= $nrefs )
set jobid = `printf "%02d" $k`
set config = "-dt $time_steps[$k] -ref $refine_levels[$k] -ti ${tstart} -tf ${tend} -dto ${dt_output}"
set strs0 = 'time python3 '"${topdir}"'/run_scripts/convergence_run/convergence_config.py $1 -output-name $2'
if( $k < $nrefs) then
set strs1 = '-skip-check ${@:3} > ${1}_${2}_${SLURM_JOBID}-'"${jobid}"'.log 2>&1 &'
else
set strs1 = '-skip-check ${@:3} > ${1}_${2}_${SLURM_JOBID}-'"${jobid}"'.log 2>&1 '
endif
echo "" >> ${run_script}
echo "${strs0} ${config} ${config_flags} ${strs1}" >> ${run_script}
echo "sleep 20" >> ${run_script}
@ k++
end
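For reference, the loop above appends one line per refinement level to the generated bash run script. Reconstructed from the string assembly (the angle-bracket placeholders stand for the per-level time step, refinement level, and the other settings filled in at generation time; this is not verbatim output), each generated line looks roughly like:

time python3 <topdir>/run_scripts/convergence_run/convergence_config.py $1 -output-name $2 -dt <dt_k> -ref <ref_k> -ti <tstart> -tf <tend> -dto <dt_output> <config_flags> -skip-check ${@:3} > ${1}_${2}_${SLURM_JOBID}-<k>.log 2>&1 &
sleep 20

The final refinement level is written without the trailing &, so the generated script waits for it to finish. The *.log files mentioned below are the redirected output of these python3 calls, which is why the traceback only appears there.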
There is no output from running this script other than in the *.log files, which all contain the following error message:
Traceback (most recent call last):
File "/home/ac.griffin/clubb_convergence_test/run_scripts/convergence_run/convergence_config.py", line 13, in <module>
import numpy as np
ModuleNotFoundError: No module named 'numpy'
@bmg929: Hi Brian, I think the errors are due to the Python libraries. In the script, there are three steps:
In the third step, we use a Python script to process the data and generate the figures. Numpy is needed to calculate error metrics such as root-mean-square errors. It seems to me that your run complains because Numpy is not installed in your Python environment.
A simple solution is to comment out "import numpy as np" and also the plotting section in run_cnvg_test_multi_cases_baseline.csh. Otherwise, we need to provide a Python environment that includes Numpy.
I can work on Anvil and find a solution, since you are using CLUBB there. However, I would like to ask whether this is what you want.
Thank you!
Shixuan
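If providing a Python environment is the preferred route, one way to do it is a small dedicated conda environment. This is only a sketch, assuming conda is available on the machine; the environment name and package list are illustrative (numpy for the config scripts, netCDF4 and seaborn for the plotting step discussed further down):

# Hypothetical minimal conda environment for the convergence scripts.
conda create -n clubb_conv python=3 numpy netcdf4 seaborn
conda activate clubb_conv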
Thanks Shixuan! I will try to troubleshoot it and see if I can get it to work on Anvil. If I can't, then I might need a little help. My goal is to run it on Anvil (which I have access to) so that we can intermittently check future versions of CLUBB to see if they still converge.
The problem first occurs in the second step, which is the running of the convergence tests. Running the tests generates a bash script for each case, and these bash scripts contain a command that calls convergence_config.py. That script, in turn, calls the function modify_ic_profile, which is found in convergence_function.py. I commented out import numpy as np in both python files. However, as it turns out, both python scripts reference np on multiple lines of code, so it is not as simple as removing the numpy import.
However, UWM has a battery of python scripts written by Zhun Guo, all of which contain import numpy as np. These scripts have been run successfully on Anvil. Therefore, there must currently be a way to load a python environment that includes numpy on Anvil. All I should have to do is follow the same steps as I do before I run Zhun Guo's scripts.
Furthermore, some of our postprocessing scripts for running E3SM diagnostics make use of e3sm_diags_env.yml, which includes a dependency on numpy. We usually load it as follows:
source /lcrc/soft/climate/e3sm-unified/base/etc/profile.d/conda.sh
conda activate e3sm_diags_env
python run_e3sm_diags.py
After testing, this appears to be working so far. I issue the command source /lcrc/soft/climate/e3sm-unified/base/etc/profile.d/conda.sh, followed by the command conda activate e3sm_diags_env. Then ...
(e3sm_diags_env) [ac.griffin@blueslogin3 convergence_run]$ csh run_cnvg_test_multi_cases_baseline.csh
When I do it this way, the cnvg_baseline directory fills up with files and all the *.log files show no errors. So far, so good!
@bmg929 : Hi Brian, thank you for providing the details and the solution. I forgot that I used the Python script to construct the initial condition profile on the fly for the DYCOMS-II RF02 case and to modify the sounding profiles for the other cases (in step 2, as you mentioned above). The purpose is to provide a fixed, smoothed initial condition profile so that all refinement simulations are initialized from the same profile. Also, we need to construct some fake layers (with missing values) to ensure that the cubic spline interpolation reproduces the sounding profile as closely as possible (when the sounding is too coarse, the cubic spline interpolation can generate small features that are purely artifacts of the interpolation). Numpy is used for this task. I think that using the existing e3sm_diags_env.yml is a better idea, as it avoids issues caused by changes in the environment.
There are further issues in step 3, which is the postprocessing step:
Traceback (most recent call last):
File "bomex_fig.py", line 3, in <module>
from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘bomex_fig.py’
Traceback (most recent call last):
File "rico_fig.py", line 3, in <module>
from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘rico_fig.py’
Traceback (most recent call last):
File "dycoms2_rf02_nd_fig.py", line 3, in <module>
from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘dycoms2_rf02_nd_fig.py’
Traceback (most recent call last):
File "wangara_fig.py", line 3, in <module>
from netCDF4 import Dataset
ModuleNotFoundError: No module named 'netCDF4'
removed ‘wangara_fig.py’
Mon Apr 24 10:54:32 CDT 2023
I can get around the netCDF4 issue by using a different conda environment. In order to use Zhun Guo's E3SM-CLUBB budget diagnostic package, he had us create conda environments that are loaded by conda activate <USERNAME>. Instructions are found here: https://github.com/larson-group/E3SM/wiki/Zhun's-guide-to-running-the-single-column-and-global-versions-of-the-E3SM-model#32-plotting-results-how-to-use-clubbs-budget-diagnostic-package
This environment includes Numpy and netCDF4.
However, even after using that instead of e3sm_diags_env, there is still another error:
Traceback (most recent call last):
File "bomex_fig.py", line 10, in <module>
import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘bomex_fig.py’
Traceback (most recent call last):
File "rico_fig.py", line 10, in <module>
import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘rico_fig.py’
Traceback (most recent call last):
File "dycoms2_rf02_nd_fig.py", line 10, in <module>
import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘dycoms2_rf02_nd_fig.py’
Traceback (most recent call last):
File "wangara_fig.py", line 10, in <module>
import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
removed ‘wangara_fig.py’
Mon Apr 24 11:30:31 CDT 2023
Postprocessing requires something called "seaborn".
Perhaps I could try to further alter the custom e3sm_diags_env.yml, rename it, and then try to add some more lines to include the missing pieces.
Good news on this front -- all the required packages (numpy, netcdf4, and seaborn) are made available simply by loading the following:
source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh
I can then run using:
(e3sm_unified_1.8.0_nompi) [ac.griffin@blueslogin2 convergence_run]$ csh run_cnvg_test_multi_cases_baseline.csh
and avoid all the python errors from some package not being found.
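A quick way to confirm that an activated environment actually provides everything the scripts import is a one-line sanity check (not part of the CLUBB scripts, just a convenience):

# Sanity check for the three packages the convergence and plotting scripts import.
python -c "import numpy, netCDF4, seaborn; print('all packages found')"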
We now come to the next issue -- no output is being produced.
@bmg929: Could you tell me which machine you are using? It seems to me that "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh" is meant for the Argonne Chrysalis machine rather than the Anvil machine you mentioned in your previous comments. I have accounts on both machines, and if you can tell me which one you are using, I can do a quick test on my side, see whether I encounter the same issues, and report back with what I find.
I am using Anvil; however, I know that the same home directory and file system is used for both machines, and the activation of the E3SM unified environment appears to be the same for both machines (LCRC), judging by these instructions: https://e3sm-project.github.io/e3sm_diags/_build/html/main/install.html#activate-e3sm-unified-environment
While CLUBB successfully compiles (and the clubb_standalone executable appears within the bin directory), a simple test of run_scm.bash shows that CLUBB isn't running. Here is the error message:
(e3sm_unified_1.8.0_nompi) [ac.griffin@blueslogin2 run_scripts]$ ./run_scm.bash bomex
Running bomex
../bin/clubb_standalone: error while loading shared libraries: libnetcdff.so.7: cannot open shared object file: No such file or directory
The compiler script config/linux_x86_64_ifort_anvil.bash sets the following line:
# == NetCDF Location ==
NETCDF=$NETCDF_ROOT
However, $NETCDF_ROOT is not defined anywhere, neither in the script nor in my environment variables; printing the value of $NETCDF_ROOT simply yields a blank line.
In the e3sm_unified_1.8.0_nompi environment that I've loaded and am running within, the path to libnetcdff.so.7 appears to be /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.8.0_nompi/lib.
Edit:
Never mind; when I set NETCDF to this path, the code won't compile because that NetCDF build was produced with a different compiler:
/home/ac.griffin/clubb_convergence_test/compile/../src/CLUBB_core/output_netcdf.F90(42): error #7013: This module file was not generated by any release of this compiler. [NETCDF]
use netcdf, only: &
Phooey
These are the modules that are loaded in the compiler script:
module load netcdf-fortran/4.5.3
module load netcdf-c/4.7.4
so I'm going to have to find the path that goes along with these.
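If it helps, two ways that usually reveal where a loaded netCDF module lives (a sketch; the exact output depends on how the site built the modules) are the module system itself and the nc-config/nf-config helper tools that ship with netCDF:

# Show the paths and environment variables that the loaded modules set:
module show netcdf-fortran/4.5.3
module show netcdf-c/4.7.4
# Or, with the modules loaded, ask the netCDF helper tools directly:
nf-config --prefix   # install prefix of netCDF-Fortran
nf-config --fc       # Fortran compiler it was built with (relevant to the #7013 error)
nc-config --prefix   # install prefix of netCDF-C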
@bmg929 : Hi Brian, I think we had a misunderstanding that made my suggestions above less useful. I thought the Anvil machine you mentioned referred to the Anvil at Purdue University (https://www.rcac.purdue.edu/knowledge/anvil/access). Since I have an account on that "Anvil", I tested and ran CLUBB there and provided that compile-environment script to you. However, it turned out that this is not the case.
It turns out that you are referring to the Argonne Anvil/Chrysalis machine. I have tested and figured out the compile environment setup as well as the changes needed in the run scripts: Compiling scripts:
Revised run scripts:
- Default setup: run_cnvg_test_multi_cases_default.csh.txt
- Baseline setup: run_cnvg_test_multi_cases_baseline.csh.txt
- All changes with improved convergence: run_cnvg_test_multi_cases_revall.csh.txt
Note: remember to remove the .txt to convert the files back to shell scripts. Also, I only ran a quick test and killed the jobs once I saw that model output was generated. Please let me know if other issues arise.
Thank you very much for this, Shixuan! The code compiled successfully and run_cnvg_test_multi_cases_baseline.csh is currently running and producing output on Anvil!
Great news! Please let me know if there are any questions after you finish the simulation and obtain the results.
Update: It looks like it ran, but I noticed an error in the postprocessing section. However, upon further inspection, it was a "Disk Quota Exceeded" error message. I am currently rerunning.
I ran the "baseline" convergence tests to a successful completion.
The above are the thlm convergence plots from each of the 4 cases in the "baseline" run.
There are plenty of other fields to look at as well ... not sure what fields are the most relevant to look at.
@bmg929 : Hi Brian, thank you for the update. I should point out that the figures I uploaded here are a comparison of the "Default" and "Revised" configurations. The default is the simulation with the all-default CLUBB setup, while the revised configuration refers to the simulation with all changes related to the convergence paper.
The baseline configuration that you show above is an in-between configuration, i.e., Default + revised initial and boundary conditions only. Therefore, the convergence will look different from the figures I showed. I think it would be useful for you to run simulations with the "Revised" configuration and see whether you can still obtain the first-order convergence shown in my figures.
As for your question: during our convergence paper work, we selected "thlm" and "wp3" as two key variables and checked the convergence of those two variables first, then "wp2", "wpthlp", "um", and "upwp". If the convergence of these variables looks reasonable, most of the other variables usually show reasonably good convergence as well. However, this is empirical.
In addition, when we are trying to diagnose convergence issues, we check all of the convergence figures and pick the variable with the earliest divergence (i.e., the one whose convergence rate falls below 1 first) as the starting point for understanding the degraded convergence.
Edit: I am now showing both the "default" and "revall" runs so that you can see the difference side by side:
thlm: [figures: BOMEX default, BOMEX revall, RF02 default, RF02 revall, RICO default, RICO revall, Wangara default, Wangara revall]
wp3: [figures: BOMEX default, BOMEX revall, RF02 default, RF02 revall, RICO default, RICO revall, Wangara default, Wangara revall]
@bmg929 : The results above seem to be consistent with what I got from my test simulations. The only difference is wp3 in the Wangara case at hour 4, but I think this may not be an issue, given that the master branch differs from the code used for the CLUBB convergence paper. Overall, I think the results here are still consistent with what we obtained in the CLUBB convergence paper.
@vlarson: Do you think that the results here are good enough, or consistent with the results in our convergence paper?
I created the following document to add more fields to the analysis: CLUBB_convergence_20230504.pdf
In all comparisons, the "default" is on the left and the "revised" is on the right.
Fields thlm, wp3, wp2, wpthlp, um, and upwp are all included.
Pages 1-2 are BOMEX, pages 3-4 are DYCOMS-II RF02 ND, pages 5-6 are RICO, and pages 7-8 are Wangara.
To me, the results look convergent. But I'll forward Brian's plots to Chris Vogl in order to see what he thinks.
@bmg929, do we have a simple script in the larson-group/clubb repo that can run the "Revised" configuration with the push of a button? If so, what is it? If not, can you please create one and commit it?
@vlarson @bmg929:
Hi Vince and Brian, this commit contains the updated scripts to run the convergence test simulations following the same setups as in the convergence paper (https://doi.org/10.22541/essoar.167632252.26895646/v1). Here I created three scripts to run the three key setups that we investigated:
run_scripts/convergence_run/convergence_config.py is also revised to configure CLUBB with all of the revised setups, including the linear-diffusion setup that we tested in the convergence paper.
Here I provide some of my test results (default setup versus the revised setup that includes all changes we made in the convergence paper) as a reference for you and @vlarson:
The convergence of four cases with default and revised configurations: convergence_default_vs_revised.pdf
Surface fluxes of the RICO case with default and revised configurations: surface_flux_defalut_vs_revised.pdf
Responses of the BOMEX and RICO cases to the changes in the limiters for Brunt–Väisälä frequency (BVF) and Richardson number: bvf_limiter_default_vs_revised.pdf
Note: the current CLUBB code seems to have become expensive for the simulations refined by 2^7 (I could not finish that simulation within 48 hours of wall time on Compy). Therefore, I used the simulations with a refinement of 2^6 as the reference for the convergence plots.
Overall, I think my test results suggest that the results from the new code are quite consistent with the results in our convergence paper (which uses an older code branch).