Closed uturuncoglu closed 3 years ago
Interesting ...
When I ran the regression tests yesterday for the ufs-weather-model release/public-v1 branch, everything worked on Cheyenne. See https://github.com/ufs-community/ufs-weather-model/pull/186.
Are you using the correct versions of the compiler? This is the modulefile for cheyenne.intel: https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/cheyenne.intel/fv3
module load ncarenv/1.3
module load intel/19.1.1
module load mpt/2.19
module load ncarcompilers/0.5.0
##
## use pre-compiled NetCDF, ESMF and NCEP libraries for above compiler / MPI combination
##
module use -a /glade/p/ral/jntp/GMTB/tools/ufs-stack-20200728/intel-19.1.1/mpt-2.19/modules
module load netcdf/4.7.4
module load esmf/8.0.0
module load bacio/2.4.0
module load crtm/2.3.0
module load g2/3.4.0
module load g2tmpl/1.9.0
module load ip/3.3.0
module load nceppost/dceca26
module load nemsio/2.5.1
module load sp/2.3.0
module load w3emc/2.7.0
module load w3nco/2.4.0
module load gfsio/1.4.0
module load sfcio/1.4.0
module load sigio/2.3.0
##
## SIONlib library
##
module use -a /glade/p/ral/jntp/GMTB/tools/modulefiles/intel-19.1.1/mpt-2.19
module load SIONlib/1.7.4
##
## load cmake
##
module load cmake/3.16.4
setenv CMAKE_C_COMPILER mpicc
setenv CMAKE_CXX_COMPILER mpicxx
setenv CMAKE_Fortran_COMPILER mpif90
setenv CMAKE_Platform cheyenne.intel
@climbfuji CIME is using intel/19.0.5 and mpt/2.19. We did not change anything in particular in the build system, but it would be good to check the build options again. If you don't mind, and if you have them, please let me know the run and build directories on Cheyenne so that I can use them to compare the configurations.
@uturuncoglu Dom is on leave for 10 days. My understanding is that you need to use this module file for cheyenne/intel: https://github.com/ufs-community/ufs-weather-model/blob/release/public-v1/modulefiles/cheyenne.intel/fv3. Note that this is from the release/public-v1 branch. In Dom's last comment he pointed you to the file from the develop branch (this may or may not work - I don't know).
In the release/public-v1 branch version of the cheyenne/intel file, you will see
module use -a /glade/p/ral/jntp/GMTB/tools/modulefiles/intel-19.0.5/mpt-2.19
module load NCEPlibs/1.1.0
It is important to use NCEPlibs/1.1.0 for the release (it has the upgraded chgres_cube that ingests netCDF), along with the rest of the environment described in that file.
See if that helps. If not, I'll find some help in DTC to compile/run the release branch of the ufs-weather-model on Cheyenne for you. Let me know.
@ligiabernardet That is fine. I'll try to run the model outside of CIME and find the differences. I'll keep you posted on the progress and let you know if I need any help.
@ligiabernardet How can I clean an existing build?
@llpcarson Can you help answer this question?
@climbfuji @ligiabernardet It seems that the error is gone when I update the model to the latest version. I am still testing it and I'll double-check.
Hi Ufuk, you can use the following command to compile from the tests directory: ./compile_cmake.sh $PWD/.. cheyenne.intel 'CCPP=Y' 1 YES YES
The two "YES" options request a clean-up before and after the build.
@uturuncoglu
Okay. Thanks @panll.
Hi all, it seems that I was wrong: I am getting the same error with the updated code. If I create a new case and do a fresh build, it fails with
Building atm with output to /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96v2/bld/atm.bldlog.200819-142748
/glade/scratch/turuncu/ufs-mrweather-app-workflow.c96v2/bld/atm/obj/FV3/ccpp/physics/physics/radiation_aerosols.f(4840): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: )
but if I run ./case.build again, it compiles without any problem. I also tried to find any possible difference but I could not find one that could cause the issue. Any suggestions? @climbfuji @ligiabernardet
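One way to inspect the offending statement is to pull the file name and line number out of the compiler message and print the source lines around it. This is only a minimal sketch: it assumes the Intel-style "file(line): error #NNNN" message format shown in the log above, and the function name show_error_context is hypothetical, not part of CIME or the model.

```shell
# Hypothetical helper (not part of CIME): print the source lines around each
# Intel-style compile error found in a build log.
# Assumes messages of the form "path/file.f(4840): error #5082: ..."
show_error_context() {
    local log="$1" radius="${2:-2}"
    # Pull out every "path(line): error #NNNN" fragment from the log
    grep -oE '[^ ]+\([0-9]+\): error #[0-9]+' "$log" | while IFS= read -r hit; do
        local file="${hit%%(*}"                      # text before the first "("
        local line="${hit#*(}"; line="${line%%)*}"   # digits between "(" and ")"
        local start=$(( line > radius ? line - radius : 1 ))
        local end=$(( line + radius ))
        echo "== ${file}:${line}"
        if [ -r "$file" ]; then
            sed -n "${start},${end}p" "$file"
        else
            echo "(source file not readable)"
        fi
    done
}
```

For the log quoted above, this could be called as show_error_context /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96v2/bld/atm.bldlog.200819-142748 3 to see three lines on either side of line 4840.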
Hi Ufuk, Unfortunately I do not know what could be causing this problem. My understanding is that the model is building fine as a standalone (outside of CIME). Have you built the model as standalone successfully? If so, then you could look at any differences. If you need DTC to build it on Cheyenne for you, let me know.
Yes, I could build standalone model outside of CIME without any problem. @jedwards4b is also looking for the possible differences now.
@uturuncoglu and @jedwards4b I was able to build the App and did not encounter any problems wrt compiling /ccpp/physics/physics/radiation_aerosols.f. I could not reproduce this problem.
@ligiabernardet I had the problem on Cheyenne, but it happens randomly. If it fails, I run case.build again and it builds fine. I am not sure about the cause at this point. Let's keep testing the app and see what happens.
@ligiabernardet I ran the full test suite on Cheyenne using the following command,
qcmd -- "export UFS_DRIVER=nems; CIME_MODEL=ufs ./create_test --xml-testlist ../../src/model/FV3/cime/cime_config/testlist.xml --xml-machine cheyenne --workflow ufs-mrweather_wo_post -j 4 --walltime 03:00:00"
but all builds are failing with the same error. Could you test it on your side? I am not sure about the error, but it seems model related because, as far as I know, we did not change the build options. @jedwards4b what do you think? Also, it seems strange that running ./case.build again fixes the issue.
@uturuncoglu sometimes when a build suddenly fixes itself after a second run, it's because it was built in parallel first and the dependencies are not correct. Is the CIME build parallel while the standalone build is serial?
@rsdunlapiv I am not sure, because in the first attempt the autogenerated radiation_aerosols.f is not a well-formed Fortran file and it causes an error during the compile step. There could be a bug in the CCPP autogeneration step, but I am not sure at this point. I'll test a serial build; I think CIME uses 4 cores to build the model. @ligiabernardet How does the UFS model build work? Does it build in parallel or in serial? Also, I am not sure how the CCPP preprocessing step works.
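To check whether the file in the build directory really differs between the failing and the succeeding attempt, one option is to keep a copy of it right after the first failure and diff it after the retry. The sketch below assumes nothing about CIME itself: build_with_snapshot is a hypothetical wrapper that takes the suspect file and the build command as arguments.

```shell
# Hypothetical wrapper (not part of CIME): run a build command; if it fails,
# save a copy of a suspect file before retrying once, so that the failing and
# the succeeding versions can be compared afterwards.
build_with_snapshot() {
    local suspect="$1"; shift
    # First attempt: nothing more to do if it succeeds
    if "$@"; then
        return 0
    fi
    # First attempt failed: keep the file as the compiler saw it
    if [ -f "$suspect" ]; then
        cp -p "$suspect" "${suspect}.failed"
    fi
    # Second attempt; afterwards run: diff "${suspect}.failed" "$suspect"
    "$@"
}
```

In the case directory this might be invoked as build_with_snapshot <bld>/atm/obj/FV3/ccpp/physics/physics/radiation_aerosols.f ./case.build, where <bld> stands for the build directory reported in the build log. An empty diff would suggest the file content is not the cause.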
@uturuncoglu The ccpp_prebuild.py step just generates the physics caps for the requested suite(s). It is described here: https://ccpp-techdoc.readthedocs.io/en/v4.0/CCPPPreBuild.html
Can @llpcarson shed any light on whether the UFS Weather Model builds in serial or parallel mode?
@ligiabernardet I don't think that this has anything to do with build order - you can clearly see the misformatted lines in the file.
ufs-weather-model uses parallel make, so I don't think this is the issue (although it could still be an unusual timing thing)
radiation_aerosols.f is not auto-generated code, though, as Ligia notes, CCPP builds the caps, not this one.
Laurie
@jedwards4b There is Doxygen markup in the file for generating documentation. This has never been a problem on the platforms/compilers tested. This did not change from the v1.0.0 release. And even on Cheyenne, you said it builds when you attempt twice. Do you think the Fortran is illegal - but works on a second attempt? Hmmm - baffled.
@climbfuji Do you have any idea what can cause this problem that Jim and Ufuk reported? They cannot compile the code, but at a second attempt it works. Linlin and I were not able to reproduce the problem using the updated CIME on Hera or Cheyenne.
Not sure. Maybe some default libraries loaded in the user environment? Always a bad idea. We usually do "module purge" before we build anything.
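A quick way to test the environment hypothesis is to save the loaded modules from the CIME build and from the standalone build and compare the two lists. The following is a sketch under assumptions: each list is expected to hold one name/version entry per line (for example captured with module -t list 2>&1, since the module command prints to stderr), and diff_module_lists is a hypothetical name.

```shell
# Hypothetical helper: report modules that appear in only one of two saved
# module lists. Each input file is assumed to contain one "name/version"
# entry per line, e.g. captured with: module -t list > list.txt 2>&1
diff_module_lists() {
    local a_sorted b_sorted
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # comm requires sorted input
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    # comm -3 drops common lines; entries unique to the second file are
    # prefixed with a tab, which awk uses to tell the two columns apart
    comm -3 "$a_sorted" "$b_sorted" |
        awk -F'\t' '{ if ($1 != "") print "only in first:  " $1; else print "only in second: " $2 }'
    rm -f "$a_sorted" "$b_sorted"
}
```

With the versions mentioned earlier in the thread, a CIME list containing intel/19.0.5 and a standalone list containing intel/19.1.1 would each show up as a one-sided entry.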
@climbfuji module purge is also done in the CIME interface. I also compared that specific source file with the old version of the model that was used in 1.0, and the source file is the same; it has not changed since the last release. It is strange, but it sometimes gives an error.
Can I check out the latest version using branch release/public-v1 of the ufs-mrweather-app and then use the manage externals utility?
@climbfuji Yes, you could test it with the latest version of the app, but you need to use the ufs_fix branch for CIME. This will be merged with CIME later. Here are the instructions to test,
export UFS_DRIVER=nems
git clone https://github.com/ufs-community/ufs-mrweather-app.git
cd ufs-mrweather-app
git checkout ufs-release-v1.1
./manage_externals/checkout_externals
cd cime/scripts
git checkout ups_fix
CIME_MODEL=ufs ./create_newcase --compset GFSv15p2 --res C96 --case ufs-mrweather-app-workflow.c96 --workflow ufs-mrweather
cd ufs-mrweather-app-workflow.c96
./case.setup
# NOTE: add input_type = "gaussian_netcdf" to user_nl_ufsatm
./xmlchange RUN_STARTDATE=2020-02-02
./xmlchange DOUT_S=FALSE
./xmlchange STOP_OPTION=nhours
./xmlchange STOP_N=36
./xmlchange JOB_WALLCLOCK_TIME=00:30:00
./xmlchange USER_REQUESTED_WALLTIME=00:30:00
./case.build
./case.submit
BTW, I am currently running the full CIME test suite again. I also ran it over the weekend and only two tests failed with the compile error, even with a sequential build (make -j 1). So, the error is not consistent.
SMS_Lh3_D.C192.GFSv16beta.cheyenne_intel (Overall: FAIL) details:
FAIL SMS_Lh3_D.C192.GFSv16beta.cheyenne_intel MODEL_BUILD time=75
SMS_Lh3_D.C96.GFSv16beta.cheyenne_intel (Overall: FAIL) details:
FAIL SMS_Lh3_D.C96.GFSv16beta.cheyenne_intel MODEL_BUILD time=75
Don't think it matters but are you using tcsh or bash on cheyenne?
I am using BASH shell.
As a first step, I just built the model as follows:
git clone https://github.com/ufs-community/ufs-mrweather-app.git
cd ufs-mrweather-app
git checkout ufs-release-v1.1
./manage_externals/checkout_externals
cd src/model
export CMAKE_Platform=cheyenne.gnu # or cheyenne.intel
module use -a $PWD/modulefiles/cheyenne.gnu # or cheyenne.intel
module load fv3
./build.sh 2>&1 | tee build.log
This worked for both Intel and GNU. I will now try checking out the ups_fix CIME branch and build the model through CIME.
Ok, here is what I did:
UFS_SCRATCH=$PWD/ufs_scratch PROJECT=P48503002 CIME_MODEL=ufs ./cime/scripts/create_newcase --compset GFSv15p2 --res C96 --case ufs-mrweather-app-workflow.c96 --workflow ufs-mrweather
cd ufs-mrweather-app-workflow.c96
./case.setup
export UFS_INPUT=/glade/work/heinzell/fv3/ufs-mrweather-app/ufs-mrweather-app-20200824/ufs_input
mkdir -p /glade/work/heinzell/fv3/ufs-mrweather-app/ufs-mrweather-app-20200824/ufs_input/ufs_inputdata
./case.setup
./preview_run
vi user_nl_ufsatm # add input_type = "gaussian_netcdf"
./xmlchange RUN_STARTDATE=2020-02-02
./xmlchange DOUT_S=FALSE
./xmlchange STOP_OPTION=nhours
./xmlchange STOP_N=36
./xmlchange JOB_WALLCLOCK_TIME=00:30:00
./xmlchange USER_REQUESTED_WALLTIME=00:30:00
./case.build
This worked for both Intel and GNU.
@climbfuji Okay. Strange. I am not sure what we will do with this. I'll try app v1.0 to see whether the same error occurs there.
In https://github.com/ufs-community/ufs-mrweather-app/issues/169#issuecomment-678485639 you are running the build on a compute node using qcmd. Are you always doing that? I think @ligiabernardet, @panll and I compiled on the login node.
@climbfuji No. I tested both. To run the test suite I am using a compute node, but to create a standalone new case I am using the login node.
@ligiabernardet @climbfuji We are trying to update CIME to use the model's internal module files. So, I tried to update the model (release/public-v1) but I could not find the Orion module files. It seems they exist in develop,
https://github.com/ufs-community/ufs-weather-model/blob/develop/modulefiles/orion.intel/fv3
Is there any plan to bring the Orion module file to the release/public-v1 branch?
@uturuncoglu I talked to @climbfuji who let me know that most modules (apart from Hera and Cheyenne) in the release/public-v1 are not up to date because the App was not previously using those. He said he'd update those modules soon.
@ligiabernardet That is great. Please let me know when it is ready.
The PR is here: https://github.com/ufs-community/ufs-weather-model/pull/192 - we should be able to merge this today.
Why is this issue called "Error building CCPP" - this is not a CCPP error.
It is historical - that is what was believed at the time. Thanks, Jim, for changing the subject.
@climbfuji Thanks.
All these PRs were merged, should be ready to test.
And I can confirm that ncep_post no longer depends on libsz on hera and jet (these were the only "bad" machines).
@climbfuji okay, i am updating the app. I'll let you know when it is ready
@climbfuji @ligiabernardet @fossell @hertneky Here are the instructions to test the updated app. I tested on Cheyenne and it seems to be working.
export UFS_DRIVER=nems
git clone https://github.com/ufs-community/ufs-mrweather-app.git
cd ufs-mrweather-app
git checkout ufs-release-v1.1
./manage_externals/checkout_externals
cd cime/scripts/
./create_newcase --compset GFSv15p2 --res C96 --case ufs-mrweather-app-workflow.c96 --workflow ufs-mrweather
cd ufs-mrweather-app-workflow.c96/
./case.setup
./case.build
./xmlchange DOUT_S=FALSE
./xmlchange STOP_OPTION=nhours
./xmlchange STOP_N=36
./xmlchange JOB_WALLCLOCK_TIME=00:30:00
./xmlchange USER_REQUESTED_WALLTIME=00:30:00
./case.submit
BTW, my first build attempt failed in the same way; if you want to look at the build log, it is in /glade/scratch/turuncu/ufs-mrweather-app-workflow.c96/bld/atm.bldlog.200828-141511. I also checked the radiation_aerosols.f file but I could not see any problem with it. It is very strange. The second ./case.build works fine as expected.
Notes:
1. If you need to use custom post flat files, you need to put them into SourceMods/src.ufsatm using the same naming convention.
2. Do not forget to change the name of the default input file (it is under icfiles/201908/20190829) to atm.input.ic.grb2. Since we changed the file naming convention, all the files must follow the new convention.
@uturuncoglu Regarding note 2, I don't see any files in icfiles/201908/20190829. Am I missing something?
@hertneky It must be in $UFS_INPUTDATA/ufs_inputdata/icfiles/201908/20190829. If you are trying on Cheyenne it should work fine, but for other platforms you need to copy the file from the following link.
https://ftp.emc.ncep.noaa.gov/EIB/UFS/inputdata/201908/20190829/
This is just for the IC; other files will be downloaded automatically.
BTW, I defined the following variables on Orion.
export WORK=/work/noaa/nems/tufuk
export UFS_INPUT=$WORK
export UFS_INPUTDATA=$WORK
export UFS_SCRATCH=$WORK
@uturuncoglu I am running on Cheyenne. What should $UFS_INPUT be on Cheyenne? For v1.0, I did setenv UFS_INPUT $CESMDATAROOT, giving a full path of /glade/p/cesmdata/cseg/ufs_inputdata/icfiles/201908/20190829, which does not contain any files. Does $UFS_INPUT need to be changed to point somewhere else?
@hertneky Sorry, I moved them for testing and forgot about it. Could you try again? Sorry again.
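An empty initial-condition directory like the one reported above can be caught before building. The sketch below assumes the <root>/ufs_inputdata/icfiles/YYYYMM/YYYYMMDD layout quoted in this thread; check_ic_dir is a hypothetical helper, not a CIME command.

```shell
# Hypothetical helper (not part of CIME): verify that the initial-condition
# directory for a start date exists and is not empty. The layout
# <root>/ufs_inputdata/icfiles/YYYYMM/YYYYMMDD follows the path quoted above.
check_ic_dir() {
    local root="$1" ymd="$2"
    local ym dir
    ym=$(printf '%s' "$ymd" | cut -c1-6)   # YYYYMM prefix of YYYYMMDD
    dir="$root/ufs_inputdata/icfiles/$ym/$ymd"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir"
        return 1
    fi
    echo "ok: $dir"
}
```

For example, check_ic_dir "$UFS_INPUTDATA" 20190829 would report whether the directory for the default start date contains any files.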
@ligiabernardet @climbfuji We are getting the following error when we try to build the UFS MR Weather App under CIME. I am not sure whether it is a known issue or not.
In this case, I am using the release/public-v1 branch for the model and last commit is
If you could also create a case on Cheyenne using the UFS MR Weather Model (or I could do it if you provide me the instructions), that would help to compare the possible differences in the build. BTW, I am using the Intel compiler and MPT combination.