ufs-community / ufs-weather-model

UFS Weather Model

p8 / p5 tag issue on gaea: CPC experiment support #1755

Closed jkbk2004 closed 8 months ago

jkbk2004 commented 1 year ago

Description

Solution

jieshunzhu commented 1 year ago

Thanks @natalie-perlin.

natalie-perlin commented 1 year ago

@jieshunzhu - If the directory structure ./ufs-s2s-model_zbotC5/ is copied from your space, could I test building the code without submitting a batch job?

jieshunzhu commented 1 year ago

@natalie-perlin, when building the code using the scripts from that directory, you don't need to submit a batch job. But I found that library directories are hard-coded in many of the scripts, so we need to modify them manually one by one. I am testing those now.

natalie-perlin commented 1 year ago

@jieshunzhu - thank you. What is the proper way to build the code using the existing scripts? I'm testing changes to ./modulefiles/gaea.intel/fv3 and want to see what messages I get during the build.

jieshunzhu commented 1 year ago

@natalie-perlin. I have no clear idea about which way is more efficient. The P5 version was set up at CPC two years ago by another person who has retired. When going into the compiling scripts, it seems there are lots of files that have to be modified, not only fv3. Examples include ./NEMS/src/conf/module-setup.sh.inc, ./tests/compile.sh, ./FV3/ccpp/build_ccpp.sh.
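
One way to find the hard-coded library directories before editing them by hand is to scan the source tree for lines that mention an old path prefix. The following is a minimal sketch, not part of the P5 scripts: the function name `find_hardcoded_paths` and the prefix `/old/stack` used in the demo are hypothetical, and the real prefixes would have to be taken from the existing build scripts.

```python
import os
import tempfile


def find_hardcoded_paths(root, prefixes):
    """Return (file, line_number, line) for each line under root that contains any prefix."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" lets the scan pass over binary files without failing.
                with open(path, "r", encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if any(prefix in line for prefix in prefixes):
                            hits.append((path, number, line.strip()))
            except OSError:
                continue  # unreadable file or broken symlink
    return hits


if __name__ == "__main__":
    # Demo on a throwaway directory; "/old/stack" is a made-up prefix.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "compile.sh"), "w") as script:
            script.write("#!/bin/sh\nexport NETCDF=/old/stack/netcdf\n")
        for path, number, line in find_hardcoded_paths(tmp, ["/old/stack"]):
            print(f"{path}:{number}: {line}")
```

Pointing `root` at the model checkout and passing the old library locations as `prefixes` would give a list of files and line numbers to review, which may be easier than opening each script in turn.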

natalie-perlin commented 1 year ago

@jieshunzhu - yes, I'm looking into these scripts, too. Do you know (or does anybody know) how to compile the code? Any way works. It differs from the current UFS WM, so I have to know how to test my changes to the modules.

jieshunzhu commented 1 year ago

@natalie-perlin The person who built it at CPC is Weiyu Yang. I don't know whether he followed EMC's structure or used completely his own style. He has retired, but let me try contacting him. If I find any useful information, I will share it with you. I really appreciate your help, Natalie.

jieshunzhu commented 1 year ago

@natalie-perlin I called Weiyu and did not get any useful information. As I mentioned earlier, Weiyu put in lots of hard-coded modifications. We have to modify them one by one when testing any new stack.

In addition, I compiled the system around 2 years ago. The associated log files are still here: /lustre/f2/dev/ncep/JieShun.Zhu/ufsp5/ufs-s2s-model_zbot/tests/log_gaea.intel/compile_1.log. That may help you better follow the scripts.

natalie-perlin commented 1 year ago

Thank you for the clarification of what needs to be done and for the log files.

jieshunzhu commented 1 year ago

@jkbk2004 @natalie-perlin I am able to compile P5 using libraries that were built for P8 with hpc-stack. But when testing the executable, I got errors saying "Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, MOVBE, POPCNT, AVX, F16C, FMA, BMI, LZCNT and AVX2 instructions."

Did you see the error before? Thanks.

jkbk2004 commented 1 year ago

Sounds like it is not capturing processor information at the compiler level. What about running cpuinfo?
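
On a Linux compute node, the information that cpuinfo reports is also available in /proc/cpuinfo. The sketch below reads the vendor and checks a few of the instruction sets named in the error message. The helper names and the short `REQUIRED_FLAGS` list are assumptions made for illustration. The list covers only flags whose /proc/cpuinfo spelling matches the name in the message. The vendor is worth printing because executables built with Intel's `-x` options can refuse to run on a non-Intel CPU even when the instruction sets are present.

```python
# Subset of the instruction sets from the error message, in /proc/cpuinfo spelling.
REQUIRED_FLAGS = ("avx", "avx2", "fma", "f16c", "movbe", "popcnt")


def _first_value(cpuinfo_text, wanted_key):
    """Return the value of the first 'key : value' line whose key matches, else None."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == wanted_key:
            return value.strip()
    return None


def cpu_vendor(cpuinfo_text):
    """Return the vendor_id string (for example GenuineIntel or AuthenticAMD), or None."""
    return _first_value(cpuinfo_text, "vendor_id")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags listed for the first processor."""
    value = _first_value(cpuinfo_text, "flags")
    return set(value.split()) if value else set()


def missing_flags(cpuinfo_text, required=REQUIRED_FLAGS):
    """Return the required flags that the CPU does not report, in the given order."""
    present = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not on Linux, or /proc is unavailable
    print("vendor:", cpu_vendor(text))
    print("missing:", missing_flags(text))
```

This needs to run inside a batch or interactive job, since login nodes and compute nodes may have different processors.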

jieshunzhu commented 1 year ago

@jkbk2004 Can you point me to where I specify processor information in UFS? Here is the job card for my experiment: /lustre/f2/scratch/ncep/JieShun.Zhu/UFS_zbot/fcst_25e1/cpld_fv3_ccpp_mom6_cice_cmeps_cold_2023102500/job_card. The error is shown in the "out" file.

jkbk2004 commented 1 year ago

Somewhere at the cmake level, I think: https://github.com/ufs-community/ufs-weather-model/blob/develop/cmake/configure_s4.intel.cmake. Or add it directly to the compile flags, as in ./CICE-interface/CICE/configuration/scripts/machines/Macros.derecho_intel:FFLAGS := -fp-model precise -convert big_endian -assume byterecl -ftz -traceback -march=core-avx2

jieshunzhu commented 12 months ago

@jkbk2004 My original compilation flags included the option -xCORE-AVX2, which is specific to Intel processors. After removing it, the model runs for a while but stops after "COMPLETED MOM INITIALIZATION". The model just hangs there until it reaches the wall-clock limit. Have you or @natalie-perlin seen a similar problem before?
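
An alternative to removing the option outright is to swap it for the `-march=core-avx2` form that appears in the Macros.derecho_intel flags quoted earlier in this thread. That keeps AVX2 code generation without the Intel-only processor check. This is a sketch of that substitution on a flags string, not what was actually done for P5, and the function name `use_march_instead_of_x` is hypothetical.

```python
import re

# Matches the whole option -xCORE-AVX2 (any letter case) when it stands alone as a flag.
_X_CORE_AVX2 = re.compile(r"(?<!\S)-xcore-avx2(?!\S)", re.IGNORECASE)


def use_march_instead_of_x(flags):
    """Replace -xCORE-AVX2 with -march=core-avx2 and leave every other flag untouched."""
    return _X_CORE_AVX2.sub("-march=core-avx2", flags)


if __name__ == "__main__":
    print(use_march_instead_of_x("-O2 -xCORE-AVX2 -traceback"))
```

Whether this changes the hang after MOM initialization is a separate question; the substitution only addresses the processor check at startup.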

jkbk2004 commented 11 months ago

@jieshunzhu It might be worth building MOM6 in debug mode, or adding some print statements at the MOM6 main driver level.

jieshunzhu commented 11 months ago

@jkbk2004 Thanks for the suggestions. I tried building MOM6 with debug, but it is interesting that I did not see additional log information. I am actually working with the same strategy as your second idea. I will let you know if I find anything.

jkbk2004 commented 11 months ago

@jieshunzhu I am not sure if DDT (debugger) is available on gaea. I will check just in case.

jkbk2004 commented 11 months ago

https://gaeadocs.rdhpcs.noaa.gov/wiki/index.php?title=Debuggers

jieshunzhu commented 11 months ago

Thanks @jkbk2004. I have not tried DDT before. I may ask you questions about it later.

natalie-perlin commented 11 months ago

@jieshunzhu @jkbk2004 FYI on the progress with the P5 code, if it is still under consideration (you mentioned a Nov. 30 deadline).

I'm getting close to having the P5 code compiled on my end on Gaea C5 with spack-stack/1.4.1, which uses the same compiler version as the EPIC-built hpc-stack (intel-classic-2023.1.0) and higher versions of hdf5/1.14.0, netcdf-c/4.9.2, and esmf/8.4.2. There are a couple of relatively simple errors/paths that still need fixing for the fms build. I plan to do a test run after it is fully built, but I haven't yet looked into the setup of the initialization files or whether anything special is required to stage the run. Please let me know if you have any comments, and feel free to take a look at my setup on Gaea: /lustre/f2/scratch/ncep/Natalie.Perlin/ufs-s2s/ufsp5/ufs-s2s-model_zbotC5/

jieshunzhu commented 11 months ago

@natalie-perlin Thanks for the update. I think I have almost fixed the problem by using hpc-stack. You can put it on hold on your side (I do not want to waste your time).

But I may need to ask you how to build spack-stack, which I need for jedi-soca. Since the jedi-soca version I use is not the develop branch, I may need to build an older spack-stack.

Thanks again to you and @jkbk2004 (Jong) for your persistent support and help on our projects at CPC. I really appreciate it!

jieshunzhu commented 11 months ago

@jkbk2004 @natalie-perlin Just want to give you an update about transitioning P5 to C5: it works now. The key thing here is still the version of ESMF. I need to use an old version for P5. Thanks again for all your support!

natalie-perlin commented 11 months ago

Thank you so much for letting us know that this works for you! If you don't mind sharing your recent staging location for P5 on Gaea C5, I'd be glad to take a look and check that it all looks consistent!

natalie-perlin commented 11 months ago

As for the older spack-stack: if the packages and versions that you need for jedi-soca are available in the spack central repository, there should be no problem building them as part of a custom spack-stack. The key is to know the exact list of packages to specify in the spack-stack configuration.

jieshunzhu commented 11 months ago

Sure. It will be my pleasure.

  • My P5 source code directory with modifications: /lustre/f2/dev/ncep/JieShun.Zhu/ufsp5/ufs-s2s-model_zbotC5t
  • My running directory with outputs: /lustre/f2/scratch/ncep/JieShun.Zhu/UFS_zbot/fcst_25e1
  • The stack used to compile P5: /lustre/f2/dev/ncep/JieShun.Zhu/util/hpc-stack/c5/intel-classic-2023.1.0P5
  • The source code of building the stack: /lustre/f2/dev/ncep/JieShun.Zhu/util/hpc-stack/c5/src-intel-classic-2023.1.0P5

jieshunzhu commented 11 months ago

As for the older spack-stack: if the packages and versions that you need for jedi-soca are available in the spack central repository, there should be no problem building them as part of a custom spack-stack. The key is to know the exact list of packages to specify in the spack-stack configuration.

Thanks for sharing the information. I need to finish some other more urgent projects before going into the spack-stack. When starting with it, I may ask you questions about it. Thanks in advance.

jkbk2004 commented 11 months ago

@jieshunzhu Congrats! It will be beneficial to continue supporting CPC's P5/P8/C5 operational runs: stack, ufs-wm version updates, etc. I will tag up with you later.

jieshunzhu commented 11 months ago

@jkbk2004 @natalie-perlin Do you have time to help me with another small tool? This tool converts CFSR atmospheric states to FV3 initial conditions. It uses lots of libraries of UFS/FV3, i.e., hpc-stack. I need to compile it on C5 as well.

  • The source code is here: /lustre/f2/dev/ncep/JieShun.Zhu/util/ICchgres_CFSR_FV3_C5/global_chgres.fd4EPIC.
  • I gave it a try in ../global_chgres.fd, where I made a new file (mk.sh) that includes the library information. The error information is in make.out. My problem is related to linking against those libraries.

Never mind. I got the problem fixed. Thanks anyway.

jkbk2004 commented 11 months ago

@jieshunzhu we can extend a bit of https://github.com/ufs-community/ufs-weather-model/pull/2005 on our side.

jieshunzhu commented 10 months ago

@jkbk2004 @natalie-perlin Happy New Year! I am now trying to transition the JEDI soca-science to C5. Similar to my UFS problem, on C5 I failed to run the version of soca-science I need with spack-stack 1.5.1 (which works for the "develop" repository of soca-science). On C4, I can run it with spack-stack 1.4.0. So I tried to install spack-stack 1.4.0 in my own directory (/lustre/f2/dev/ncep/JieShun.Zhu/util/spack-stack/c5/spack-stack-1.4.0).

I cloned spack-stack-1.4.0 with git directly from the JCSDA website and did not make any changes. After installation, I cannot see Core/ under /envs/unified-dev/install/modulefiles/. Could you please give me some hints about my problem? I saved the installation log files in my directory. Thanks in advance.
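
A quick way to see what the installation did produce is to list the subdirectories under the environment's install/modulefiles directory and check for Core. The following is a small sketch under the assumption that the environment follows the envs/unified-dev/install/modulefiles layout mentioned above. The function names are hypothetical and the demo directory is a temporary stand-in, not the real installation.

```python
import os
import tempfile


def list_modulefile_dirs(env_dir):
    """Return sorted subdirectory names under <env_dir>/install/modulefiles, or None if absent."""
    modroot = os.path.join(env_dir, "install", "modulefiles")
    if not os.path.isdir(modroot):
        return None
    return sorted(
        entry for entry in os.listdir(modroot)
        if os.path.isdir(os.path.join(modroot, entry))
    )


def has_core_modules(env_dir):
    """Return True only if install/modulefiles exists and contains a Core directory."""
    dirs = list_modulefile_dirs(env_dir)
    return dirs is not None and "Core" in dirs


if __name__ == "__main__":
    # Demo on a temporary stand-in for an environment directory.
    with tempfile.TemporaryDirectory() as env:
        os.makedirs(os.path.join(env, "install", "modulefiles", "intel"))
        print(list_modulefile_dirs(env), has_core_modules(env))
```

A result of `None` would mean the modulefiles tree was never generated, while a list without Core would point to a later step of the module setup not having run.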

zach1221 commented 8 months ago

@jieshunzhu I'm going to mark this ticket as resolved. Please let me know if you feel it should be kept open.

jieshunzhu commented 8 months ago

@zach1221 Sure, it can be closed. Thanks!