ufs-community / ufs-mrweather-app

UFS Medium-Range Weather Application

Error creating new case on Redhat and Ubuntu Linux (using AWS) #211

Closed climbfuji closed 4 years ago

climbfuji commented 4 years ago

On both Redhat 8 and Ubuntu 18, I get

ubuntu@ip-172-31-55-49:~/sandpit/ufs-mrweather-app$ ./cime/scripts/create_newcase --case $UFS_SCRATCH/ufs-mrweather-app-workflow.c96 --compset GFSv15p2 --res C96 --workflow ufs-mrweather --machine linux
Compset longname is FCST_ufsatm%v15p2_SLND_SICE_SOCN_SROF_SGLC_SWAV
Compset specification file is /home/ubuntu/sandpit/ufs-mrweather-app/src/model/FV3/cime/cime_config/config_compsets.xml
Automatically adding SESP to compset
Compset forcing is
ATM component is UFSATM Atmosphere with:CCPP physics version 15p2
LND component is Stub land component
ICE component is Stub ice component
OCN component is Stub ocn component
ROF component is Stub river component
GLC component is Stub glacier (land ice) component
WAV component is Stub wave component
ESP component is Stub external system processing (ESP) component
Pes     specification file is /home/ubuntu/sandpit/ufs-mrweather-app/src/model/FV3/cime/cime_config/config_pes.xml
Compset specific settings: name is RUN_STARTDATE and value is 2019-08-29
Compset specific settings: name is START_TOD and value is 0
Compset specific settings: name is COMP_CLASSES and value is ATM
Compset specific settings: name is CHECK_TIMING and value is FALSE
Could not find machine match for 'ip-172-31-55-49.ec2.internal' or 'ip-172-31-55-49'
Machine is linux
mach_choice linux mach_match linux
grid_choice a%C96 grid_match a%C96
compset_choice ufsatm compset_match ufsatm
pesize_choice any pesize_match any
points = 12
ERROR: More than one PE layout matches given PE specs

I am using x2.large instances with four cores.

ligiabernardet commented 4 years ago

@climbfuji Did this use to work with the previous code? When did it last work? @uturuncoglu Can you please take a look at this?

climbfuji commented 4 years ago

Maybe it has to do with the node type I chose this time, which has four CPUs. In any case, the app must be able to handle situations where a user has four CPUs.

uturuncoglu commented 4 years ago

@climbfuji Could you try the following? If it works, I can make the changes in the interface. BTW, the current default is 8 cores for linux, and I think we decided on that for the 1.0 release. Also, I think there is no way to run the model with fewer than 6 cores. What would the layout combination be for it? Anyway, if you want to run with 8 cores:

  • go to src/model/FV3/cime/cime_config
  • edit config_pes.xml
  • there are two entries for the following
  <grid name="a%C96">
    <mach name="linux">
      <pes pesize="any" compset="ufsatm">
        <comment>none</comment>
        <ntasks>
          <ntasks_atm>8</ntasks_atm>
        </ntasks>
        <nthrds>
          <nthrds_atm>1</nthrds_atm>
        </nthrds>
      </pes>
    </mach>
  </grid>

So you need to remove one of them. I will remove the duplicate entries if your run succeeds.

climbfuji commented 4 years ago

Thanks, @uturuncoglu, I will try this. Maybe it has nothing to do with the number of cores on the node after all, but just with this duplicate entry. On macOS, we only have two cores but still run 6 MPI tasks on them. Similarly, on Linux, we simply oversubscribe if needed.

climbfuji commented 4 years ago

@uturuncoglu Removing the duplicate entry solves this particular problem; I was able to create the case. Now proceeding.

uturuncoglu commented 4 years ago

@climbfuji Okay, that is great. I'll push the change to my branch, and it will be available in the final PR.
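
For anyone hitting the same "More than one PE layout matches given PE specs" error, the duplicate check described above can be sketched in Python. This is a minimal sketch, not part of CIME: the function name `find_duplicate_pes`, the `SAMPLE` text, and the `<config_pes>` root element are assumptions for illustration. It only flags entries whose grid, machine, pesize, and compset attributes are exactly identical; CIME's own matching logic is more involved and may report a conflict in other situations too.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_pes(xml_text):
    """Return {(grid, mach, pesize, compset): count} for keys seen more than once.

    Hypothetical helper: it compares attribute values literally and does not
    reproduce CIME's own layout-matching rules.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for grid in root.iter("grid"):
        for mach in grid.findall("mach"):
            for pes in mach.findall("pes"):
                key = (
                    grid.get("name"),
                    mach.get("name"),
                    pes.get("pesize"),
                    pes.get("compset"),
                )
                counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Illustrative input modeled on the entry quoted in this thread; the
# <config_pes> wrapper element is an assumption about the file layout.
SAMPLE = """<config_pes>
  <grid name="a%C96">
    <mach name="linux">
      <pes pesize="any" compset="ufsatm">
        <ntasks><ntasks_atm>8</ntasks_atm></ntasks>
        <nthrds><nthrds_atm>1</nthrds_atm></nthrds>
      </pes>
    </mach>
  </grid>
  <grid name="a%C96">
    <mach name="linux">
      <pes pesize="any" compset="ufsatm">
        <ntasks><ntasks_atm>8</ntasks_atm></ntasks>
        <nthrds><nthrds_atm>1</nthrds_atm></nthrds>
      </pes>
    </mach>
  </grid>
</config_pes>"""

if __name__ == "__main__":
    for key, n in find_duplicate_pes(SAMPLE).items():
        print(f"{n} entries for grid/mach/pesize/compset = {key}")
```

To check a real file, read its contents (for example from src/model/FV3/cime/cime_config/config_pes.xml) and pass the text to `find_duplicate_pes`; any key it reports is a candidate for the manual cleanup described above.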