sxs-collaboration / spectre

SpECTRE is a code for multi-scale, multi-physics problems in astrophysics and gravitational physics.
https://spectre-code.org

CCE runs on CaltechHPC #3783

Open · Sizheng-Ma opened this issue 2 years ago

Sizheng-Ma commented 2 years ago

Bug report:

Expected behavior:

The CCE evolution proceeds at a comparable rate when the number of radial grid points is increased slightly.

Current behavior:

The evolution slows down drastically once the number of radial points exceeds 28, and stalls entirely above 32 (details below).

This issue is related to #3782. I'm trying to run the CharacteristicExtract executable on CaltechHPC. The CCE grid is as follows:

Cce:
  LMax: 14
  NumberOfRadialPoints: 28

Running on a single node, the evolution advances 55M of simulation time within 10 minutes. However, it suddenly slows down by a factor of roughly 5 when I add a single radial grid point. With more than 32 radial points, the system doesn't evolve at all. I haven't seen the same issue on other machines.

I've tested several values of the number of radial points; the simulation time reached within 10 minutes is:

Radial points:   33     32     31     30     29     28     23     18
Time reached:    0M     6.1M   9.1M   5.5M   11M    55M    59M    62M
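(For reference, a scan like this can be reproduced with a simple loop; a minimal sketch, not the exact commands used. It assumes the executable and input file from the submission script below, a non-SMP build that can be run directly on one core, and uses sed to swap the resolution into a copy of the input file:)

#!/bin/bash
# Sweep NumberOfRadialPoints and give each run 10 minutes of
# wall-clock time; progress is read afterwards from each run's output.
for n in 18 23 28 29 30 31 32 33; do
  sed "s/NumberOfRadialPoints: .*/NumberOfRadialPoints: ${n}/" \
      KerrSchildWithCce.yaml > Cce_n${n}.yaml
  timeout 600 ./CharacteristicExtract +p1 \
      --input-file Cce_n${n}.yaml > run_n${n}.log 2>&1
done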

Environment:

Using all modules in caltech_hpc_gcc.sh


markscheel commented 2 years ago

Can you run the same case in debug mode?
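(For reference, a debug build can be configured in the usual CMake way; a minimal sketch with placeholder paths, assuming the source was cloned into ${HOME}/spectre:)

# Reconfigure the existing build directory for a Debug build, which
# enables assertions and debug symbols, then rebuild the executable.
cd ${HOME}/spectre/build
cmake -D CMAKE_BUILD_TYPE=Debug .
make -j4 CharacteristicExtract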

Sizheng-Ma commented 2 years ago

> Can you run the same case in debug mode?

Nothing gets printed.

Sizheng-Ma commented 2 years ago

The submission script:

#!/bin/bash -
#SBATCH -o spectre.stdout
#SBATCH -e spectre.stderr
#SBATCH --ntasks-per-node 32
#SBATCH -J KerrSchild
#SBATCH --nodes 1
#SBATCH -p any
#SBATCH -t 00:10:00
#SBATCH -D .
#SBATCH -A sxs
#SBATCH --mem=92GB

# Replace these paths with the path to your build directory, to the source root
# directory, the spectre dependencies module directory, and to the directory
# where you want the output to appear, i.e. the run directory.
# E.g., if you cloned spectre in your home directory, set
# SPECTRE_BUILD_DIR to ${HOME}/spectre/build. If you want to run in a
# directory called "Run" in the current directory, set
# SPECTRE_RUN_DIR to ${PWD}/Run
export SPECTRE_BUILD_DIR=${HOME}/spectre/build/
export SPECTRE_MODULE_DIR=${HOME}/DEPS_new/modules/
export SPECTRE_HOME=${HOME}/spectre/
export SPECTRE_RUN_DIR=${PWD}/Run

# Choose the executable and input file to run
# To use an input file in the current directory, set
# SPECTRE_INPUT_FILE to ${PWD}/InputFileName.yaml
export SPECTRE_EXECUTABLE=${PWD}/CharacteristicExtract
export SPECTRE_INPUT_FILE=${PWD}/KerrSchildWithCce.yaml

# These commands load the relevant modules and cd into the run directory,
# creating it if it doesn't exist
source ${SPECTRE_HOME}/support/Environments/caltech_hpc_gcc.sh
module use ${SPECTRE_MODULE_DIR}
spectre_load_modules
module list

mkdir -p ${SPECTRE_RUN_DIR}
cd ${SPECTRE_RUN_DIR}

# Copy the input file into the run directory, to preserve it
cp ${SPECTRE_INPUT_FILE} ${SPECTRE_RUN_DIR}/

# Set desired permissions for files created with this script
umask 0022

# Set the path to include the build directory's bin directory
export PATH=${SPECTRE_BUILD_DIR}/bin:$PATH

# Generate the nodefile
echo "Running on the following nodes:"
echo ${SLURM_NODELIST}
touch nodelist.$SLURM_JOBID
for node in $(scontrol show hostnames ${SLURM_NODELIST}); do
  echo "host ${node}" >> nodelist.$SLURM_JOBID
done

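# One thread per node is reserved for Charm++ communication in SMP
# mode, so each node gets (ntasks-per-node - 1) worker threads and
# the total worker count is (total tasks - number of nodes).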
WORKER_THREADS_PER_NODE=$((SLURM_NTASKS_PER_NODE - 1))
WORKER_THREADS=$((SLURM_NPROCS - SLURM_NNODES))
SPECTRE_COMMAND="${SPECTRE_EXECUTABLE} +isomalloc_sync  ++np ${SLURM_NNODES} \
++p ${WORKER_THREADS} ++ppn ${WORKER_THREADS_PER_NODE} \
++nodelist nodelist.${SLURM_JOBID}"

# When invoking through `charmrun`, charm will initiate remote sessions which
# will wipe out environment settings unless it is forced to re-initialize the
# spectre environment between the start of the remote session and starting the
# spectre executable
echo "#!/bin/sh
source /home/sma/spectre/support/Environments/caltech_hpc_gcc.sh
module use ${SPECTRE_MODULE_DIR}
spectre_load_modules
\"\$@\"
" > runscript

chmod u+x ./runscript

charmrun ++runscript ./runscript ${SPECTRE_COMMAND} \
         --input-file ${SPECTRE_INPUT_FILE}

The input file:

Evolution:
  #InitialTime: 0.0
  InitialTimeStep: 0.1
  InitialSlabSize: 0.1
  #TimeStepper: RungeKutta3
  #  AdamsBashforthN:
  #    Order: 3

Observers:
  VolumeFileName: "GhKerrSchildVolume"
  ReductionFileName: "GhKerrSchildReductions"

Cce:
  Evolution:
    TimeStepper:
      AdamsBashforthN:
        Order: 3
    StepChoosers:
      - Constant: 1.0
      - Increase:
          Factor: 2
      - ErrorControl(SwshVars):
          AbsoluteTolerance: 1e-8
          RelativeTolerance: 1e-6
          MaxFactor: 2
          MinFactor: 0.25
          SafetyFactor: 0.9
      - ErrorControl(CoordVars):
          AbsoluteTolerance: 1e-8
          RelativeTolerance: 1e-7
          MaxFactor: 2
          MinFactor: 0.25
          SafetyFactor: 0.9
    StepController:
      BinaryFraction

  StartTime: Auto
  EndTime: Auto
  BoundaryDataFilename: "/central/groups/sxs/sma/cce_bh/no_id_new/test/c2/BondiCceR0198.h5"
  LMax: 14
  ExtractionRadius: Auto
  NumberOfRadialPoints: 39
  ObservationLMax: 4

  InitializeJ:
    InverseCubic

  Filtering:
    RadialFilterHalfPower: 24
    RadialFilterAlpha: 35.0
    FilterLMax: 12

  ScriInterpOrder: 5
  ScriOutputDensity: 1

  H5Interpolator:
    BarycentricRationalSpanInterpolator:
      MinOrder: 10
      MaxOrder: 10

  H5LookaheadTimes: 200
  H5IsBondiData: True
  FixSpecNormalization: False

Note that the run also needs worldtube data, which can be found at /central/groups/sxs/sma/cce_bh/no_id_new/test/c2/BondiCceR0198.h5; it corresponds to a static Schwarzschild BH generated by SpEC.
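(As a sanity check, the contents of the worldtube file can be listed with the standard HDF5 command-line tools; a one-liner, assuming the hdf5 tools are on the path:)

# Recursively list the datasets in the worldtube file to confirm it
# contains the expected Bondi boundary data.
h5ls -r /central/groups/sxs/sma/cce_bh/no_id_new/test/c2/BondiCceR0198.h5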

The SpECTRE branch I'm using can be found at https://github.com/Sizheng-Ma/spectre/tree/cce_gh_executable_gh_gts

kidder commented 2 years ago

Okay, I find that CharacteristicExtract runs very quickly on one core and slows down significantly on more than one core. The problem is exacerbated as the radial resolution is increased. @moxcodes did note in his tutorial that CCE does not scale beyond 4 cores, but even going from one core to two slows things down significantly for larger values of NumberOfRadialPoints.
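(A scaling check like this can be reproduced by varying the number of Charm++ PEs directly; a minimal sketch, reusing the executable and input file from above and again assuming a non-SMP build run outside the batch script:)

# Compare throughput on 1, 2, and 4 cores by varying the number of
# Charm++ PEs; each run gets the same 10-minute wall-clock window.
for p in 1 2 4; do
  timeout 600 ./CharacteristicExtract +p${p} \
      --input-file KerrSchildWithCce.yaml > scaling_p${p}.log 2>&1
done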

kidder commented 2 years ago

@nilsdeppe is cleaning up a branch to do profiling so we can investigate this further.