oceanmodeling / schism

Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM)
http://ccrm.vims.edu/schismweb/
Apache License 2.0
0 stars 0 forks source link

Fix standalone build #1

Closed uturuncoglu closed 6 months ago

uturuncoglu commented 7 months ago

The changes that are done to build model under UFS Coastal are creating issue in standalone build outside of the UFS Coastal. @josephzhang8 test threw a lot of errors related to netcdf and util scripts. There is a need for a target that does not compile any util scripts. Need also check pschism target.

uturuncoglu commented 7 months ago

@platipodium @josephzhang8 I could see BLD_STANDALONE option in the changes that I did. I think if you pass -DBLD_STANDALONE=ON to your cmake command, it will probably activate it. Anyway, I am in a training now but I'll look at it later.

uturuncoglu commented 7 months ago

BTW, i am not seeing any changes in my PR that removes pschism target. The changes just to compile the model inside of the UFS Coastal.

pvelissariou1 commented 7 months ago

@josephzhang8, @platipodium , @uturuncoglu I checked the latest version of SCHISM and it compiled successfuly only if I explicitly supply to cmake the option: -DBLD_STANDALONE=ON. This should be set ON by default. Maybe I missed it, why we need to have the BLD_STANDALONE flag? What is its purpose?

uturuncoglu commented 7 months ago

@pvelissariou1 The build system was creating issue under UFS Coastal since it was not design to compile it under another application. So, I introduce BLD_STANDALONE=ON to get rid of those issues. The default is OFF but if we don't want to provide this option every time when we are trying to install SCHSIM, then I could change the default to ON and then make required change in the UFS Coastal SCHSIM build interface.

pvelissariou1 commented 7 months ago

@uturuncoglu Let's set BLD_STANDALONE=ON as the default and use BLD_STANDALONE=ON/OFF in UFS-Coastal depending upon the type of compilation. Still, I believe we need to address the NetCDF issues in a different way, without having this BLD_STANDALONE flag in place. What were the issues while building SCHISM in UFS-Coastal? I remember building SCHISM using previous versions of UFS-Coastal without any issues. Your thoughts?

josephzhang8 commented 7 months ago

Thx @pvelissariou1 @uturuncoglu @platipodium . Adding standalone flag worked on my side.

If it's hard to change the default, we can live with this by adding the flag inside SCHISM.local.build. @water-e

pvelissariou1 commented 7 months ago

@josephzhang8 My thinking is that, because most people use SCHISM in a standalone configuration it is better to have BLD_STANDALONE=ON by default. This is easily done within CMake. I haven't checked the GNU make configs yet. Do we need to implement the BLD_STANDALONE over there too?

water-e commented 7 months ago

I am planning to get to this, by the way, but I've been hampered by a lot of hardware issues which seem to be finally clearing. Even if you do a workaround I'll try to take a look.


From: Joseph Zhang @.> Sent: Friday, February 9, 2024 11:03 AM To: oceanmodeling/schism @.> Cc: Ateljevich, @. @.>; Mention @.***> Subject: Re: [oceanmodeling/schism] Fix standalone build (Issue #1)

Thx @pvelissariou1https://github.com/pvelissariou1 @uturuncogluhttps://github.com/uturuncoglu @platipodiumhttps://github.com/platipodium . Adding standalone flag worked on my side.

If it's hard to change the default, we can live with this by adding the flag inside SCHISM.local.build. @water-ehttps://github.com/water-e

— Reply to this email directly, view it on GitHubhttps://github.com/oceanmodeling/schism/issues/1#issuecomment-1936454045, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AG2AJC6BGA6F6WL7DFF3DXTYSZXHZAVCNFSM6AAAAABC4O7LC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMZWGQ2TIMBUGU. You are receiving this because you were mentioned.Message ID: @.***>

josephzhang8 commented 7 months ago

Thx @water-e

josephzhang8 commented 7 months ago

I'll go ahead push the change to SCHISM repo, b/c many users are depending on cmake working. If we later on decide to set flag on UFS side we can revise. Thx

josephzhang8 commented 7 months ago

@josephzhang8 My thinking is that, because most people use SCHISM in a standalone configuration it is better to have BLD_STANDALONE=ON by default. This is easily done within CMake. I haven't checked the GNU make configs yet. Do we need to implement the BLD_STANDALONE over there too?

Thx @pvelissariou1 for teh reminder on gcc. I just tested it on our HPC and it works also.

My fix is to add STANDALONE inside cmake/SCHISM.local.build, together with other SCHISM options like OLDIO etc, except that it should not be turned off for SCHISM alone.

uturuncoglu commented 7 months ago

@josephzhang8 @pvelissariou1 @water-e I think we could change the default in the SCHSIM build system. I think it will be easy to adjust the UFS Coastal.

pvelissariou1 commented 7 months ago

@uturuncoglu I completely agree. @josephzhang8 I submitted a PR for PaHM updates in SCHISM, could you please take a look at it? We tested it here at NOAA and it seems to be working as expected

pvelissariou1 commented 7 months ago

@uturuncoglu , @josephzhang8 Ufuk did you have the opportunity to compile with WWM ON? This configuration was failing during compilation. I'll check on this later today from my side.

uturuncoglu commented 7 months ago

@pvelissariou1 No. Is this using internal WW3? We have a regression test working with external WW3 and it is working.

pvelissariou1 commented 7 months ago

@uturuncoglu ok, I will check and report back. WWM is the internal wave model.

josephzhang8 commented 7 months ago

@uturuncoglu I completely agree. @josephzhang8 I submitted a PR for PaHM updates in SCHISM, could you please take a look at it? We tested it here at NOAA and it seems to be working as expected

Will do.. thx @pvelissariou1

josephzhang8 commented 7 months ago

@uturuncoglu , @josephzhang8 Ufuk did you have the opportunity to compile with WWM ON? This configuration was failing during compilation. I'll check on this later today from my side.

I'm able to compile with WWM on.

pvelissariou1 commented 7 months ago

Thank you Joseph. Let me try it within UFS-Coastal and I'll report back.

pvelissariou1 commented 7 months ago

@uturuncoglu , @josephzhang8 I compiled SCHISM standalone and in UFS-Coastal. The results are as follows (please read through):

In SCHISM standalone
--------------------
commit 148e16b4b732f1775819c593db3f33b3496bc2f4 (HEAD -> master, origin/master, origin/HEAD)
Author: Joseph Zhang <yjzhang@vims.edu>
Date:   Fri Feb 9 14:21:11 2024 -0500

a) COMPILATION COMMAND: cmake -S src -B build -DUSE_WWM=ON -DNO_PARMETIS=OFF -DOLDIO=ON

Produces errors like the ones shown next and the compilation fails:

/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/schism/src/WWMIII/wwm_datapl.F90(20): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SCHISM_MSGP]
      use schism_msgp !, only: comm,             & ! MPI communicator
----------^
/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/schism/src/WWMIII/wwm_datapl.F90(23): error #7002: Error in opening the compiled module file.  Check INCLUDE paths.   [SCHISM_GLBL]
      use schism_glbl, only  : MNE => nea_wwm,       & ! Elements of the augmented domain
----------^
/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/schism/src/WWMIII/wwm_datapl.F90(125): error #6683: A kind type parameter must be a compile-time constant.   [RKIND]
         REAL(rkind), allocatable           :: nwild_gb(:)
--------------^
.
.
.

b) COMPILATION COMMAND: cmake -S src -B build -DUSE_WWM=ON -DNO_PARMETIS=OFF -DOLDIO=ON -DBLD_STANDALONE=ON

In this case the compilation is successful

In UFS-Coastal
--------------

a) COMPILATION COMMAND: compile.sh hera "-DAPP=CSTLS -DUSE_WWM=ON -DNO_PARMETIS=OFF -DOLDIO=ON" "coastalSWWM" intel YES NO

Produces errors like the ones shown next and the compilation fails:

/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/ufs-coastal-coastal_app-02092024/SCHISM-interface/SCHISM/src/WWMIII/wwm_parall_solver.F90(2214): error #6580: Name in only-list does not exist or is not accessible.   [EXCHANGE_P4D_WWM]
      USE datapool, only : exchange_p4d_wwm, myrank
---------------------------^
/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/ufs-coastal-coastal_app-02092024/SCHISM-interface/SCHISM/src/WWMIII/wwm_parall_solver.F90(2632): error #6580: Name in only-list does not exist or is not accessible.   [EXCHANGE_P4D_WWM]
      USE datapool, only : exchange_p4d_wwm, myrank
---------------------------^
/scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/ufs-coastal-coastal_app-02092024/SCHISM-interface/SCHISM/src/WWMIII/wwm_parall_solver.F90(3315): error #6580: Name in only-list does not exist or is not accessible.   [EXCHANGE_P4D_WWM]
      USE datapool, only : myrank, exchange_p4d_wwm, WRITESTATFLAG
-----------------------------------^
compilation aborted for /scratch2/STI/coastal_temp/save/Panagiotis.Velissariou/ufs-coastal-coastal_app-02092024/SCHISM-interface/SCHISM/src/WWMIII/wwm_parall_solver.F90 (code 1)

b) COMPILATION COMMAND: compile.sh hera "-DAPP=CSTLS -DUSE_WWM=ON -DNO_PARMETIS=OFF -DOLDIO=ON -DBLD_STANDALONE=ON" "coastalSWWM" intel YES NO

In this case the compilation is successful, though the ufs_model executable is not found by the script as it is located in
build*/SCHISM-interface/SCHISM/src/bin/

NOTE: The SCHISM commit in UFS-Coastal is the same as in the standalone case (I replaced SCHISM in UFS-Coastal). The UFS executable contains SCHISM, CDEPS and CMEPS.


In UFS-Coastal coupled with WW3
-------------------------------

a) COMPILATION COMMAND: compile.sh hera "-DAPP=CSTLSW -DUSE_ATMOS=ON -DUSE_WW3=ON -DNO_PARMETIS=OFF -DOLDIO=ON -DPDLIB=ON" coastalSWW3 intel YES NO

In this case the compilation is successful

b) COMPILATION COMMAND: compile.sh hera "-DAPP=CSTLSW -DUSE_ATMOS=ON -DUSE_WW3=ON -DBLD_STANDALONE=ON -DNO_PARMETIS=OFF -DOLDIO=ON -DPDLIB=ON" coastalSWW3 intel YES NO

Again in this case the compilation was successful, though the ufs_model executable is not found by the script as it is located in
build*/SCHISM-interface/SCHISM/src/bin/

NOTE: The UFS executable contains SCHISM, WW3, CDEPS and CMEPS.


My honest opinion is that the compilation approach of SCHISM (standalone or in UFS-Coastal) is not exactly what we want. We might need to improve on this.

janahaddad commented 7 months ago

@uturuncoglu we can close, correct?

uturuncoglu commented 7 months ago

@janahaddad Lets's keep it open for now. I need to double check.

uturuncoglu commented 7 months ago

@josephzhang8 I have just tested recent changes in SCHSIM to fix the standalone build (https://github.com/schism-dev/schism/commit/eb562ebb4682690d5803ab3e3153085dbd1e4feb) and it is working under ufs-coastal side too. I run the RT and it passed. I'll sync out fork and update ufs-coastal then I'll close this ticket if this is also fine for you.

josephzhang8 commented 7 months ago

Thx Kijin, @uturuncoglu for your help! I've also tested on our system and the new changes in cmake are fine.

uturuncoglu commented 7 months ago

@josephzhang8 Okay. Great. I synced the SCHSIM under ufs-costal. @kjnam Thanks again for your help.

kjnam commented 7 months ago

@uturuncoglu @josephzhang8 Happy to be of help.

uturuncoglu commented 7 months ago

@josephzhang8 @kjnam I am opening this issue since coastal_ike_shinnecock_atm2sch2ww3 has started to fail. In this case, the only change is updating SCHISM with main (no change in SCHISM-ESMF side). Here is the details, I also checked the coastal_ike_shinnecock_atm2sch2ww3 case coupled with WW3 and it seems that the test is failing. In the PET logs I am seeing following warning,

20240216 230738.223 WARNING          PET08 OCN connected field eastward_wave_radiation_stress  not used with nws = 2 (needs nws = 3)
20240216 230738.223 INFO             PET08 OCN created array eastward_northward_wave_radiation_stress on all resident nodes
20240216 230738.223 INFO             PET08 OCN created field eastward_northward_wave_radiation_stress
20240216 230738.223 INFO             PET08 OCN added halo route to field eastward_northward_wave_radiation_stress
20240216 230738.223 INFO             PET08 OCN realized field eastward_northward_wave_radiation_stress
20240216 230738.223 WARNING          PET08 OCN connected field eastward_northward_wave_radiation_stress  not used with nws = 2 (needs nws = 3)
20240216 230738.223 INFO             PET08 OCN created array northward_wave_radiation_stress on all resident nodes
20240216 230738.223 INFO             PET08 OCN created field northward_wave_radiation_stress
20240216 230738.223 INFO             PET08 OCN added halo route to field northward_wave_radiation_stress
20240216 230738.223 INFO             PET08 OCN realized field northward_wave_radiation_stress
20240216 230738.223 WARNING          PET08 OCN connected field northward_wave_radiation_stress  not used with nws = 2 (needs nws = 3)

and also model crashes with following error,

20240216 230738.224 INFO             PET08 --- checking connection state of mesh_topology
20240216 230738.224 ERROR            PET08 /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.5.1/cache/build_stage/spack-stage-esmf-8.5.0-ffgsxo7vntsk3fr5lwbjnjrrviof6dz5/spack-src/src/Infrastructure/Base/src/ESMCI_Info.C:830 Info::get(): Key not found (JSON trace will follow): /NUOPC/Instance/Connected
20240216 230738.224 ERROR            PET08 ESMCI_Info.C:832 Info::get() Attribute not set  - [json.exception.out_of_range.403] key 'NUOPC' not found
20240216 230738.224 ERROR            PET08 ESMCI_Info.C:836 Info::get() Attribute not set  - Internal subroutine call returned Error
20240216 230738.224 ERROR            PET08 ESMC_InfoCDef.C:601 ESMC_InfoGetCH() Attribute not set  - Internal subroutine call returned Error
20240216 230738.224 ERROR            PET08 ESMF_Info.F90:986 ESMF_InfoGetCH() Attribute not set  - Internal subroutine call returned Error
20240216 230738.224 ERROR            PET08 src/addon/NUOPC/src/NUOPC_Base.F90:1047 Attribute not set  - Passing error in return code
20240216 230738.224 ERROR            PET08 src/addon/NUOPC/src/NUOPC_Base.F90:2901 Attribute not set  - Passing error in return code

I think that second error is due to the first one. I am not sure why coastal_ike_shinnecock_atm2sch2ww3 case gives warning about nws but coastal_ike_shinnecock_atm2sch is fine and there is no warning or issue in there. Do you have any idea?

Anyway, here are the details of configurations,

Anyway, the only extra flag for SCHSIM is -DUSE_WW3=ON in the failed case and only that one complains about nws. BTW, if I try to run the coastal_ike_shinnecock_atm2sch2ww3 case with the version before sync with main, it runs without any issue. Probably something changed in the last sync but not sure what.

uturuncoglu commented 7 months ago

@janahaddad I reopened this ticket since last sync is breaking the ww3 coupling. Maybe cap needs to be adjusted but not sure at this point. Still investigating. I will point the working hash (https://github.com/oceanmodeling/schism/commit/f9b42dbd682828c0e996510f28b40a453a2ea065) in ufs-coastal until we solve the issue.

uturuncoglu commented 7 months ago

@josephzhang8 I think there is a incompatibility issue between SCHSIM and SCHSIM-ESMF repository. Here is the line that prints out first WARNING message, https://github.com/schism-dev/schism-esmf/blob/9fc62b21afc1edcb691bef5ffa5b309375d5b07c/src/schism/schism_esmf_util.F90#L1127. It indicates that nws needs to be 3 for coupling but I think this is not true since we are using 2 for coupling. I also double checked the atm2sch and warning also exists over there. So, I don't think this is the root cause of the crash in atm2sch2ww3 but the cap still needs to be fixed since the PET logs has misleading warning message and needs to be removed. @platipodium do you agree with me?

uturuncoglu commented 7 months ago

@josephzhang8 @platipodium Okay. I seems that the issue was arising due to recent changes that brings wave related fields to the output. I created a new issue for it https://github.com/oceanmodeling/schism/issues/4. @platipodium I'll also create a PR to the NUOPC cap that fixes the Attribute not set issue (this is a warning actually but it is appearing as error but does not result crash. It seems that the message is marked wrongly in ESMF side) and also remove the misleading nsw message.

uturuncoglu commented 7 months ago

@josephzhang8 @platipodium I created a PR in upstream that fixes the couple of issues. https://github.com/schism-dev/schism-esmf/pull/29. I think I could close this issue but prefer to wait until PR is merged.

platipodium commented 7 months ago

@josephzhang8 @platipodium I created a PR in upstream that fixes the couple of issues. schism-dev/schism-esmf#29. I think I could close this issue but prefer to wait until PR is merged.

Merged upstream. THank you for the hotfix

josephzhang8 commented 7 months ago

@uturuncoglu The latest master has removed IMPOSE_NET_FLUX and 'meth_sink', but I don't think this is the reason. Check outputs/fatal.error to see if there is anything there.

josephzhang8 commented 7 months ago

@uturuncoglu The changes in param.nml are documented live inside src/Readme.beta_notes. In particular pay attention to 'remove' b/c of the way .nml works (you can omit parameters but cannot include those that are not in the list).

uturuncoglu commented 6 months ago

@josephzhang8 @platipodium As I know this issue is fixed. If you don't mind could you confirm. So, I could close the issue.

josephzhang8 commented 6 months ago

I think so. Thx @uturuncoglu !