mom-ocean / MOM5

The Modular Ocean Model
https://mom-ocean.github.io/
GNU Lesser General Public License v3.0

MOM5/CM2M (experiment CM2.1p1 run error ) #369

Open Mubashardogar opened 2 years ago

Mubashardogar commented 2 years ago

Dear MOM5 Users,

I have successfully compiled the MOM5 model for type CM2M on Hokkaido University's supercomputer with the ifort compiler environment (https://www.hucc.hokudai.ac.jp/en/supercomputer/sc-overview/). The environment (environs.hu) and mkmf.template.hu files used for the compilation are attached. I used the compiler flag "-convert big_endian" to get rid of a model run error that several other MOM users have also reported in earlier email threads (i.e., FATAL from PE 0: read_fv_rst:: resolution inconsistent). However, I got the following error when I tried to run the test experiment CM2.1p1 using npes 45 (i.e., ocean_npes=30, atmos_npes=15). I used two HPC nodes of the Hokkaido University Grand Chariot supercomputer (each node has 40 cores). Please suggest how to overcome this model run error.

FATAL from PE 34: ==>Error from coupler_types_mod (CT_spawn_1d_3d): Disordered k-dimension index bound list 1 0

Best regards, Dogar

environs.hu.txt ERROR_File.sh..o13.txt mkmf.template.hu.txt

russfiedler commented 2 years ago

Thanks for that. It looks like it's crashing during the initial testing steps. Could you recompile and run with traceback information so we can see where the error is being triggered?

-g -traceback should do the trick.

Mubashardogar commented 2 years ago

Dear Russ,

Thank you so much. Where do I need to add the option "-g -traceback"? Should it be in the compilation or in the runscript (MOM_run.csh)?

Best regards, Dogar

On Mon, Jul 18, 2022 at 2:46 PM russfiedler @.***> wrote:

Thanks for that. It looks like it's crashing during the initial testing steps. Could you recompile and run with traceback information so we can see where the error is being triggered?

-g -traceback should do the trick.


-- Muhammad Mubashar Dogar Scientific Officer (Climatology Section), Global Change Impact Studies Centre (GCISC) Ministry of Climate Change (MoCC) 6th Floor, Emigration Tower, G 8/1, Islamabad, Pakistan email: @. @.

russfiedler commented 2 years ago

In the compilation template script. Make sure you clean out the old objects and binaries first. If you've done it correctly, the end of your output file should show routines and line numbers rather than just addresses.

Mubashardogar commented 2 years ago

Dear Russ,

If I understand it correctly, you want me to include the "-g -traceback" option in the CFLAGS in my mkmf.template file and then recompile the model? I am sorry for asking this basic question.

Best regards, Dogar


russfiedler commented 2 years ago

Also in FFLAGS. LDFLAGS shouldn't need it but it won't hurt.
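A minimal sketch of how one might confirm that the flags landed in the template before recompiling. It assumes the template uses Makefile-style assignments such as `FFLAGS = ...`; the function name `missing_debug_flags` and the sample template text are hypothetical and are not taken from the attached mkmf.template.hu. It does not expand variable references such as `$(OTHER_FLAGS)`.

```python
import re

REQUIRED = ("-g", "-traceback")

def missing_debug_flags(template_text, variables=("FFLAGS", "CFLAGS")):
    """Return {variable: [missing flags]} for Makefile-style assignments."""
    missing = {}
    for name in variables:
        # Gather tokens from every "NAME = ...", "NAME := ..." or "NAME += ..." line.
        pattern = rf"^[ \t]*{re.escape(name)}[ \t]*[:+]?=[ \t]*(.*)$"
        tokens = []
        for match in re.finditer(pattern, template_text, flags=re.MULTILINE):
            tokens.extend(match.group(1).split())
        absent = [flag for flag in REQUIRED if flag not in tokens]
        if absent:
            missing[name] = absent
    return missing

# Hypothetical template fragment, for illustration only.
SAMPLE = """\
FFLAGS = -O2 -i4 -r8 -convert big_endian -g -traceback
CFLAGS = -O2
LDFLAGS = -qopenmp
"""

print(missing_debug_flags(SAMPLE))
```

With this sample, only CFLAGS is reported as lacking the two flags. An empty dictionary would mean both variables already carry them.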

Mubashardogar commented 2 years ago

Dear Russ,

Thank you. I compiled the model with the -g -traceback options added as advised. Before running the model, I want to share the output I get at the end of the compilation. Should the specific routines and line numbers (rather than just addresses) that you mentioned in your earlier email be displayed here? Does it look fine, or will those be displayed in the model error file when the model runs?

Using 8-byte addressing
Using PURE
Converting pointers to allocatable components
ar rv lib_land_lad.a numerics.o land_model.o land_properties.o rivers.o climap_albedo.o soil.o land_types.o vegetation.o
ar: creating lib_land_lad.a
a - numerics.o
a - land_model.o
a - land_properties.o
a - rivers.o
a - climap_albedo.o
a - soil.o
a - land_types.o
a - vegetation.o
.....
Makefile is ready.
mpiifort -Duse_netCDF -Duse_netCDF -Duse_libMPI -DUSE_OCEAN_BGC -DENABLE_ODA -DSPMD -DLAND_BND_TRACERS -xCORE-AVX512 -qopenmp -O2 -i4 -r8 -nowarn -convert big_endian -g -traceback -I/home/t23598/dogarm/MOM5_Copy/MOM5/src/shared/include -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_FMS -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ocean -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_version/ -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ice -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_fv -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_phys -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_land_lad -c /home/t23598/dogarm/MOM5_Copy/MOM5/src/coupler/surface_flux.F90
mpiifort -Duse_netCDF -Duse_netCDF -Duse_libMPI -DUSE_OCEAN_BGC -DENABLE_ODA -DSPMD -DLAND_BND_TRACERS -xCORE-AVX512 -qopenmp -O2 -i4 -r8 -nowarn -convert big_endian -g -traceback -I/home/t23598/dogarm/MOM5_Copy/MOM5/src/shared/include -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_FMS -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ocean -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_version/ -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ice -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_fv -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_phys -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_land_lad -c /home/t23598/dogarm/MOM5_Copy/MOM5/src/coupler/flux_exchange.F90
mpiifort -Duse_netCDF -Duse_netCDF -Duse_libMPI -DUSE_OCEAN_BGC -DENABLE_ODA -DSPMD -DLAND_BND_TRACERS -xCORE-AVX512 -qopenmp -O2 -i4 -r8 -nowarn -convert big_endian -g -traceback -I/home/t23598/dogarm/MOM5_Copy/MOM5/src/shared/include -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_FMS -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ocean -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_version/ -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ice -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_fv -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_phys -I/home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_land_lad -c /home/t23598/dogarm/MOM5_Copy/MOM5/src/coupler/coupler_main.F90
mpiifort flux_exchange.o coupler_main.o surface_flux.o -o fms_CM2M.x /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ocean/lib_ocean.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_ice/lib_ice.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_fv/lib_atmos_fv.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_atmos_phys/lib_atmos_phys.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_land_lad/lib_land_lad.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_version/lib_version.a /home/t23598/dogarm/MOM5_Copy/MOM5/exec/hu/lib_FMS/lib_FMS.a -lhdf5_hl -lhdf5 -lhdf5_fortran -lhdf5hl_fortran -lnetcdff -lnetcdf -qopenmp -lpthread -g -traceback

Best regards, Dogar


russfiedler commented 2 years ago

Yes, that looks good. I'll have a look at your results tomorrow.

Mubashardogar commented 2 years ago

Dear Russ,

Thank you so much. Please find attached the error file. Best regards, Dogar


russfiedler commented 2 years ago

@Mubashardogar unfortunately, there doesn't seem to be an attachment.

Mubashardogar commented 2 years ago

Dear Russ,

Please find attached the error file. Best regards, Dogar

CM2P1run_ERROR.sh.o1319576.txt

russfiedler commented 2 years ago

That's great. @aidanheerdegen It seems like an array with a zero length dimension isn't being handled correctly in the coupler for gas exchange. Something that might be a 2D array is being treated as 3D.

Mubashardogar commented 2 years ago

Dear Russ,

Did you manage to figure out the problem? Kindly advise how to fix this error. I look forward to your kind response.

Thank you and best regards Dogar

aidanheerdegen commented 2 years ago

I am looking into it.

aidanheerdegen commented 2 years ago

I don't think I will have time to get resolution today, or this week.

In the meantime @Mubashardogar you could try checking out this commit https://github.com/mom-ocean/MOM5/commit/fe8bdad8273fdbffe2f764f9b139c6cc5197a988

git checkout fe8bdad82

and compile and use that executable. The change in the code that is throwing the error was an update to FMS just after that commit. I doubt anything else that has changed since then is critical for you, considering you are running an old standard configuration.

Mubashardogar commented 2 years ago

Dear Aidan,

I understand that it will take time to fix. However, meanwhile, if I understood correctly, you want me to take an older version of MOM5 (before this update to FMS) and compile the model again using that version. Could you kindly give the download link directly pointing to this older version, so that I do not make any mistakes while downloading the version you are referring to?

Best regards, Dogar

aidanheerdegen commented 2 years ago

@russfiedler I reproduced the error on gadi.

So this check in the FMS update https://github.com/mom-ocean/MOM5/blob/master/src/shared/coupler/coupler_types.F90#L1342-L1344

doesn't exist in the previous version: https://github.com/mom-ocean/MOM5/blob/5f70c21ba4fd1ac59b6d423eb060f13251a31cdf/src/shared/coupler/coupler_types.F90#L1172-L1182

CT_spawn_1d_3d is called here: https://github.com/mom-ocean/MOM5/blob/master/src/shared/coupler/coupler_types.F90#L1000

call CT_spawn_1d_3d(var_in, var_out,  (/ is, is, ie, ie /), (/ js, js, je, je /), (/1, kd/), suffix)

with kdim = (/1, kd/), so the "disordered" error implies 1 > kd (the reported bounds are 1 0, i.e. kd = 0).

That value of kd comes from the size of the 3rd dimension of the Ice%ice_mask

https://github.com/mom-ocean/MOM5/blob/master/src/coupler/flux_exchange.F90#L1066-L1067

    kd = size(Ice%ice_mask,3)
    call coupler_type_copy(ex_gas_fields_ice, Ice%ocean_fields, is, ie, js, je, kd,     &
         'ice_flux', Ice%axes, Time, suffix = '_ice')

Any ideas why Ice%ice_mask might have a zero-sized third dimension at this point?
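To make the failure mode concrete, here is an illustrative Python model of the reasoning above. It is not the FMS code: the function names `k_bounds` and `check_k_bounds` and the mask shape are hypothetical, and the comparison is assumed from the "1 > kd" observation. Only the message text is taken from the error reported in this thread.

```python
def k_bounds(kd):
    """Build the (lower, upper) k-index pair the way the call site does: (1, kd)."""
    return (1, kd)

def check_k_bounds(kdim):
    """Raise if the lower k bound exceeds the upper one; otherwise return the level count."""
    lower, upper = kdim
    if lower > upper:
        raise ValueError(f"Disordered k-dimension index bound list {lower} {upper}")
    return upper - lower + 1

# A mask whose third dimension has zero size gives kd = 0.
ice_mask_shape = (10, 10, 0)  # hypothetical shape, for illustration only
kd = ice_mask_shape[2]

try:
    check_k_bounds(k_bounds(kd))
except ValueError as err:
    print(err)
```

Running this prints the same "1 0" bound list seen in the original error, which is consistent with a zero-sized third dimension rather than with a wrong lower bound.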

russfiedler commented 2 years ago

@aidanheerdegen It looks like an optimisation/scope problem and horrible use of global variables. km isn't initialised until the call to set_ice_grid at line 505 in ice_grid.F90. This change to the value of km isn't seen by the compiler and I bet the allocations beginning at line 411 in ice_model_init are being done out of order.

@Mubashardogar First, try compiling without the OpenMP compiler flags. They may be causing the problem. Otherwise, add the line km=num_part before the calls to set_ocean_grid.
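The ordering hypothesis can be pictured with a small Python sketch. It only illustrates the suspected mechanism (a shared variable read before it is set); the names `state`, `set_grid` and `allocate_mask` are hypothetical stand-ins, not routines from the ice model, and the sketch says nothing about whether this is the actual cause.

```python
# Stands in for a module-level variable that starts out unset (zero).
state = {"km": 0}

def set_grid(num_part):
    """Set the shared level count, as a grid-setup routine would."""
    state["km"] = num_part

def allocate_mask(ni, nj):
    """Return the shape of a mask; the third extent is whatever km holds right now."""
    return (ni, nj, state["km"])

early = allocate_mask(4, 3)   # allocated before the grid is set: zero-sized third dimension
set_grid(num_part=6)
late = allocate_mask(4, 3)    # allocated after the grid is set

print(early, late)
```

This prints `(4, 3, 0) (4, 3, 6)`. Under the hypothesis above, assigning km before the allocations would play the role of calling `set_grid` first.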

aidanheerdegen commented 2 years ago

> I understand that it will take time to fix. However, meanwhile, if I understood correctly, you want me to take an older version of MOM5 (before this update to FMS) and compile the model again using that version. Could you kindly give the download link directly pointing to this older version, so that I do not make any mistakes while downloading the version you are referring to?

If you cloned the MOM5 repo, running the command I gave above in your MOM5 code directory should be sufficient for your requirements.

aidanheerdegen commented 2 years ago

> @aidanheerdegen It looks like an optimisation/scope problem and horrible use of global variables. km isn't initialised until the call to set_ice_grid at line 505 in ice_grid.F90. This change to the value of km isn't seen by the compiler and I bet the allocations beginning at line 411 in ice_model_init are being done out of order.

Ahh, I see. This broadcast takes care of propagating the values from the ice model initialisation to other ice PEs https://github.com/mom-ocean/MOM5/blob/af3a94d40f21a4b7fd925d13b928f8721ad7d4c8/src/coupler/coupler_main.F90#L1352-L1353

before the problematic call to flux_exchange_init

https://github.com/mom-ocean/MOM5/blob/af3a94d40f21a4b7fd925d13b928f8721ad7d4c8/src/coupler/coupler_main.F90#L1362

but there is no synchronisation because those broadcasts are only within each domain (ice and ocean).

Is that a bug? If the ocean PEs need information from the ice domain then it needs to broadcast that info to the ocean PEs.

Mubashardogar commented 2 years ago

@russfiedler Dear Russ, I compiled the model again after removing the OpenMP compiler flag ("-qopenmp") from my mkmf.template file; however, I still get the error. Should I add km=num_part in the file "src/mom5/ocean_core/ocean_grids.F90" at line 241, after the subroutine "set_ocean_grid_size(Grid, grid_file, grid_name)"?

Best regards, Dogar

Mubashardogar commented 2 years ago

@aidanheerdegen, I followed your steps and compiled the model again after applying the command "git checkout fe8bdad82". I then ran the model again. This time the model reached the end and displayed the message "end_of_run", as shown in the attached file. However, several errors and warnings are listed in this output file. Moreover, no output data *tar files (containing History and Ascii files, etc.) were produced. Is it because some input data files are missing? Did I miss some steps? Best regards, Dogar

CM2P1run.sh.o1323202.txt

aidanheerdegen commented 2 years ago

The model ran fine, but the runtime is only very short (21s) for testing purposes. You will probably need to increase the run length before you get any diagnostic output, as that is generally done at a frequency of daily, monthly and/or annually.

The output files are netCDF, so you will have files ending in .nc.
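As a rough illustration of why a very short run can produce no diagnostics, the sketch below counts how many complete output intervals fit inside a run. The function name `completed_outputs`, the one-day run length, and the treatment of a month as 30 days are assumptions for illustration only; the model's own diagnostic manager decides what is actually written.

```python
def completed_outputs(run_length_days, output_interval_days):
    """Number of full output intervals that fit inside the run."""
    if output_interval_days <= 0:
        raise ValueError("output interval must be positive")
    return int(run_length_days // output_interval_days)

run_length_days = 1  # hypothetical short test run
for name, interval in (("daily", 1), ("monthly", 30), ("annual", 365)):
    print(name, completed_outputs(run_length_days, interval))
```

For a one-day run this reports one daily interval and zero monthly or annual intervals, in line with the advice to lengthen the run before expecting monthly or annual output.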

Mubashardogar commented 2 years ago

@aidanheerdegen Dear Aidan, Thank you so much. OK, I will increase the test run length. Should I do this by changing the date in input.nml (current_date =1,1,1,0,0,0,), or does it have to be done in the ../bin/time_stamp.csh file? I couldn't find the end date, the number of submissions, etc. Also, if I want to increase the number of processors/cores from npes=45 to npes=60, should I use the layout ocean_npes=30 and atmos_npes=30?

What do the warnings and potential error messages (e.g., diag_manager_end: total_ocean_evap NOT available) in the model run output file that I attached earlier mean? The messages are also copied below. Moreover, where can I get the input data (e.g., aerosol data, especially volcanic aerosol input forcing data)? I am interested in doing realistic simulations for the period 1950-2021.

NOTE from PE 0: aerosol_mod: inconsistent nml settings -- not using aerosol timeseries but requesting interannual variation of aerosol amount for so4_anthro -- this aerosol will NOT exhibit interannual variation
WARNING from PE 15: diag_util_mod::opening_file: module/field_name (generic_cfc/sfc_flux_cfc_12) NOT registered
NOTE from PE 0: Potential error in diag_manager_end: drag_moist NOT available, check if output interval > runlength. Netcdf fill_values are written
WARNING from PE 15: diag_util_mod::opening_file: module/field_name (ocean_model/eta_nonsteric_global) NOT registered
NOTE from PE 15: Potential error in diag_manager_end: total_ocean_evap NOT available, check if output interval > runlength. Netcdf fill_values are written
/bin/ls: No match.

Best regards, Dogar

aidanheerdegen commented 2 years ago

Issues in this repository are for code-related problems only. There is ample documentation on running and configuring the model here

https://mom-ocean.github.io

If you have problems after that the google group is probably the best option.

Mubashardogar commented 1 year ago

Dear @russfiedler @aidanheerdegen,

I did an experiment using the MOM5/CM2.1 model that is a continuation of my earlier experiment. Just to remind you, I followed the above steps recommended by @aidanheerdegen and compiled the model after applying the command "git checkout fe8bdad8273fdbffe2f764f9b139c6cc5197a988". My model was running fine with control settings.

Now, I want to see the effect of volcanic aerosols. Therefore, I made the required changes in the namelist "&aerosolrad_package_nml" (please see attached namelist "input.nml.txt"). Also please look at the log file and error file. In the error file I got the following message:

FATAL from PE 12: shortwave_driver_mod: cannot calculate volcanic sw heating when volcanic sw aerosols are not activated

Where should I activate volcanic sw aerosols? I have one more question. In "&aerosolrad_package_nml", I activated the "sw" and "lw" volcanic aerosols as follows, but it seems the model is not calculating them. Please advise how to fix these issues.

&aerosolrad_package_nml
    volcanic_dataset_entry = 1991, 1, 1, 0, 0, 0,
    using_volcanic_lw_files = .true.,
    lw_ext_filename = "extlw_data.nc"
    lw_ext_root = "extlw"
    lw_asy_filename = "asmlw_data.nc"
    lw_asy_root = "asmlw "
    lw_ssa_filename = "omglw_data.nc"
    lw_ssa_root = "omglw"
    using_volcanic_sw_files = .true.,
    sw_ext_filename = "extsw_data.nc"
    sw_ext_root = "extsw"
    sw_ssa_filename = "omgsw_data.nc"
    sw_ssa_root = "omgsw"
    sw_asy_filename = "asmsw_data.nc"
    sw_asy_root = "asmsw"
    do_lwaerosol = .true.,
    do_swaerosol = .true.,
    aerosol_data_set = 'shettle_fenn',
    optical_filename = "aerosol.optical.dat",

Best regards, Dogar

CM2P1run_ERROR.sh.o1509435.txt input.nml.txt logfile.000000.out.txt