simbarras / tb23-gpu-opt-celeritas

Mirror of my documentation repository for my bachelor thesis
https://gitlab.forge.hefr.ch/frederic.bapst/tb23-gpu-opt-celeritas

Run the profile on zeus #27

Closed simbarras closed 1 year ago

simbarras commented 1 year ago

After #26

simbarras commented 1 year ago

Run code on Zeus

To launch the code on Zeus, use an adaptation of the Wildstyle configuration. It only works with the assertion-free build (ndebug) and if the celeritas and regression folders sit side by side in the same parent directory.

cd regression
./run-zeus.sh
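
The two prerequisites can be checked explicitly before launching; a minimal sketch, assuming the sibling layout described above (the build directory name must match the Zeusstyle build_dirs entry in run-problems.py; build-ndebug is the one that appears in the profiling log further down):

# Sketch: verify the layout that run-zeus.sh relies on, then launch.
build=build-ndebug   # assumption: adjust to the Zeusstyle build_dirs entry
cd regression
test -d ../celeritas || { echo "missing sibling 'celeritas' checkout"; exit 1; }
test -x ../celeritas/$build/bin/celer-sim \
    || { echo "missing assertion-free build of celer-sim"; exit 1; }
./run-zeus.sh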

Error with assertion

If I run any build other than "ndebug", I receive this error:

Running on zeus.lbl.gov at Tue Jun 20 09:26:02 PDT 2023
===============================================================================
Running problem 1 of 2: cms2018-vecgeom-cpu...
0: awaiting communcation
0: exit code 1
Couldn't summarize system: missing key 'system'
{'input': None,
 'name': ('cms2018', 'vecgeom', 'cpu'),
 'result': [{'exception': {'context': 'result',
                           'str': "'time'",
                           'type': "<class 'KeyError'>"},
             'failure': {'condition': 'celeritas::device()',
                         'file': '/bld4/home/simbarras/project/celeritas/src/corecel/sys/Device.cc',
                         'line': 382,
                         'type': 'DebugError',
                         'which': 'precondition failed'}}],
 'system': []}
Elapsed time for cms2018-vecgeom-cpu: 0.1 (total: 0)
===============================================================================
Running problem 2 of 2: cms2018+field+msc-vecgeom-cpu...
0: awaiting communcation
0: exit code 1
Couldn't summarize system: missing key 'system'
{'input': None,
 'name': ('cms2018+field+msc', 'vecgeom', 'cpu'),
 'result': [{'exception': {'context': 'result',
                           'str': "'time'",
                           'type': "<class 'KeyError'>"},
             'failure': {'condition': 'celeritas::device()',
                         'file': '/bld4/home/simbarras/project/celeritas/src/corecel/sys/Device.cc',
                         'line': 382,
                         'type': 'DebugError',
                         'which': 'precondition failed'}}],
 'system': []}
Elapsed time for cms2018+field+msc-vecgeom-cpu: 0.1 (total: 0)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeusstyle
Completed at Tue Jun 20 09:26:02 PDT 2023
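
When a problem exits with a nonzero code, run-problems.py saves its exact input next to the results ({instance}.inp.json), so the failing case can be replayed by piping that file into celer-sim the same way the driver does. A minimal sketch, assuming the result paths shown in the log above and the assertion-free build used later in this issue:

cd regression
# Replay the saved input of the failing CPU problem against the ndebug build.
# celer-sim reads its JSON input on stdin when given "-" (as in run-problems.py).
CELER_DISABLE_DEVICE=1 OMP_NUM_THREADS=1 \
    ../celeritas/build-ndebug/bin/celer-sim - \
    < results/zeusstyle/cms2018-vecgeom-cpu/0.inp.json
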
simbarras commented 1 year ago

Profile on zeus

To launch a profile, run the following NVIDIA command: ncu -o profile --target-processes all "ExecFile". Unfortunately, it reports that no kernels were profiled.

ncu -o profile --target-processes all ./run-zeus.sh

Running on zeus.lbl.gov at Tue Jun 20 15:03:07 PDT 2023
===============================================================================
Running problem 1 of 2: cms2018-vecgeom-cpu...
0: awaiting communcation
==PROF== Target process 293645 terminated before first instrumented API call.
==PROF== Connected to process 293647 (/bld4/home/simbarras/project/celeritas/build-ndebug/bin/celer-sim)
==PROF== Disconnected from process 293647
0: success
{'input': {'enable_msc': False,
           'geometry_filename': 'cms2018.gdml',
           'mag_field': None,
           'num_track_slots': 4096,
           'use_device': False},
 'name': ('cms2018', 'vecgeom', 'cpu'),
 'result': [{'action_times': {'along-step-general-linear': 60.618753608999235,
                              'along-step-neutral': 105.5287278220001,
                              'annihil-2-gamma': 1.4263713359999899,
                              'brems-rel': 0.9165693729999989,
                              'brems-sb': 15.855017272999936,
                              'conv-bethe-heitler': 3.0402688919999994,
                              'extend-from-primaries': 0.018025936999999017,
                              'extend-from-secondaries': 10.95150989100007,
                              'geo-boundary': 56.857236687999695,
                              'initialize-tracks': 10.413339748999977,
                              'ioni-moller-bhabha': 1.3485006619999989,
                              'photoel-livermore': 13.263060132000238,
                              'physics-discrete-select': 16.093574735000047,
                              'pre-step': 164.7882286409988,
                              'scat-klein-nishina': 9.471062748000076,
                              'scat-rayleigh': 2.4067804220000273},
             'active_hwm': {'count': 4096, 'index': 106089},
             'avg_steps_per_primary': 47733.382967032965,
             'avg_time_per_primary': 0.05199944178186813,
             'avg_time_per_step': 1.089372647603492e-06,
             'emptying_step': 106090,
             'num_events': 7,
             'num_primaries': 9100,
             'num_step_iters': 107005,
             'num_steps': 434373785,
             'pre_emptying_time': 0.004330396,
             'queue_hwm': {'count': 405193, 'index': 701},
             'setup_time': 30.349529215,
             'slot_occupancy': 0.9910591781086456,
             'total_time': 473.194920215,
             'unconverged': 0}],
 'system': {'debug': False,
            'geant4': '11.0.4',
            'occupancy': {},
            'vecgeom': '1.2.2',
            'version': '0.3.0-dev.160+cfc407b0'}}
Elapsed time for cms2018-vecgeom-cpu: 504.5 (total: 504)
===============================================================================
Running problem 2 of 2: cms2018+field+msc-vecgeom-cpu...
0: awaiting communcation
==PROF== Connected to process 294036 (/bld4/home/simbarras/project/celeritas/build-ndebug/bin/celer-sim)
Timed out after 600.0 seconds: sending interrupt
==PROF== Disconnected from process 294036
0: success
{'input': {'enable_msc': True,
           'geometry_filename': 'cms2018.gdml',
           'mag_field': [0.0, 0.0, 1.0],
           'num_track_slots': 4096,
           'use_device': False},
 'name': ('cms2018+field+msc', 'vecgeom', 'cpu'),
 'result': [{'action_times': {'along-step-neutral': 92.16368571399956,
                              'along-step-uniform-msc': 249.28573778599787,
                              'annihil-2-gamma': 0.9451890520000022,
                              'brems-rel': 0.6567738839999981,
                              'brems-sb': 8.868275839000074,
                              'conv-bethe-heitler': 1.7982603700000013,
                              'extend-from-primaries': 0.015070036000000354,
                              'extend-from-secondaries': 6.819236777999958,
                              'geo-boundary': 53.080884426000004,
                              'initialize-tracks': 6.519707357000065,
                              'ioni-moller-bhabha': 0.9658400109999912,
                              'photoel-livermore': 7.546174925000011,
                              'physics-discrete-select': 9.163355675000023,
                              'pre-step': 126.04717180299895,
                              'scat-klein-nishina': 5.272644965000028,
                              'scat-rayleigh': 1.4657519419999971},
             'active_hwm': {'count': 4096, 'index': 85087},
             'avg_steps_per_primary': 38298.95032967033,
             'avg_time_per_primary': 0.06272290866956044,
             'avg_time_per_step': 1.6377187397997375e-06,
             'emptying_step': None,
             'num_events': 7,
             'num_primaries': 9100,
             'num_step_iters': 85088,
             'num_steps': 348520448,
             'pre_emptying_time': 0.006755898,
             'queue_hwm': {'count': 384933, 'index': 720},
             'setup_time': 29.125001943,
             'slot_occupancy': 1.0,
             'total_time': 570.778468893,
             'unconverged': 61934}],
 'system': {'debug': False,
            'geant4': '11.0.4',
            'occupancy': {},
            'vecgeom': '1.2.2',
            'version': '0.3.0-dev.160+cfc407b0'}}
Elapsed time for cms2018+field+msc-vecgeom-cpu: 600.8 (total: 1105)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeusstyle
Completed at Tue Jun 20 15:21:33 PDT 2023
==WARNING== No kernels were profiled.
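
Both problems in this run executed on the CPU ('use_device': False), so no CUDA kernels were ever launched for ncu to capture; in addition, the real work happens in a child process spawned by run-problems.py, so the profiler has to follow and filter its target processes. The updated run-zeus.sh later in this issue handles both points; its invocation has roughly this shape (all values copied from that script; the export name here is only a placeholder):

ncu -f \
    --export=profile-example \
    --set=full \
    --launch-skip=345 \
    --launch-count=10 \
    --kernel-name=along_step_uniform_msc_kernel \
    --target-processes=all \
    --target-processes-filter=regex:demo-loop \
    python3 run-problems.py zeussimple
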
simbarras commented 1 year ago

run-zeus.sh

#!/bin/bash -e

# module load geant4-data/11.0.0-s4eo python

echo "Running on $HOSTNAME at $(date)"
python3 run-problems.py zeusstyle
echo "Completed at $(date)"
exit 0
simbarras commented 1 year ago

run-problems.py

#!/usr/bin/env python3
# Copyright 2022 UT-Battelle, LLC, and other Celeritas developers.
# See the top-level COPYRIGHT file for details.
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""
- Loop over all problems
- Launch simultaneously on multiple cores (different seed per run!)
- Save overall times from all runs, and output from one run
- Catch failure message and save

Requires Python 3.7+.
"""

import asyncio
import itertools
import json
from pathlib import Path, PurePath
from pprint import pprint
from os import environ
import shutil
from signal import SIGINT, SIGTERM, SIGKILL
import subprocess
import sys
import time

from summarize import inp_to_nametuple, summarize_all, exception_to_dict, get_num_events_and_primaries

g4env = {k: v for k, v in environ.items()
         if k.startswith('G4')}

systems = {}

class System:
    name = None
    build_dirs = {}
    num_jobs = None # Number of simultaneous jobs to run
    gpu_per_job = None
    cpu_per_job = None

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "bin" / "celer-sim"
        env = dict(environ)
        if not inp['use_device']:
            env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['OMP_NUM_THREADS'] = "1"
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])

        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

    def get_monitoring_coro(self):
        return []

class Wildstyle(System):
    build_dirs = {
        'orange': Path("/home/s3j/.local/src/celeritas/build-reldeb"),
        'vecgeom': Path("/home/s3j/.local/src/celeritas/build-reldeb-vecgeom"),
    }
    name = "wildstyle"
    num_jobs = 2
    gpu_per_job = 1
    cpu_per_job = 32

class Zeusstyle(System):
    scriptPath = Path(__file__).parent
    build_dirs = {
        'vecgeom': Path(f"{scriptPath}/../celeritas/build-reldeb"),
    }
    name = "zeusstyle"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

class Local(System):
    build_dirs = {
        "orange": Path("/Users/seth/.local/src/celeritas/build"),
    }
    name = "testing"
    num_jobs = 1
    gpu_per_job = 0
    cpu_per_job = 1

class Summit(System):
    _CELER_ROOT = Path(environ.get('PROJWORK', '')) / 'csc404' / 'celeritas'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug-novg',
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "summit"
    num_jobs = 6
    gpu_per_job = 1
    cpu_per_job = 7

    def create_celer_subprocess(self, inp):
        cmd = "jsrun"
        env = g4env.copy()
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)

        args = [
            "-n1", # total resource sets
            "-r1", # resource sets per host
            "-a1", # tasks per resource set
            f"-c{self.cpu_per_job}", # CPUs per resource set
            "--bind=packed:7",
            "--launch_distribution=packed",
        ]
        if inp['use_device']:
            args.append("-g1") # GPUs per resource set
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("-g0")

        args.extend("".join(["-E", k, "=", v]) for k, v in env.items())

        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")

        args.extend([
            build / "bin" / "celer-sim",
            "-"
        ])

        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

    async def run_jslist(self):
        # Wait a second for the jobs to start
        await asyncio.sleep(1)
        print("Running jslist")

        try:
            proc = await asyncio.create_subprocess_exec("jslist", "-r", "-R")
        except FileNotFoundError as e:
            print("jslist not found :(")
            return

        print("Waiting on jslist output")
        await proc.communicate()

    def get_monitoring_coro(self):
        return [self.run_jslist()]

class Crusher(System):
    _CELER_ROOT = Path(environ['HOME']) / '.local' / 'src' / 'celeritas-crusher'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug'
    }
    name = "crusher"
    # NOTE: layout multi-gpu run
    # num_jobs = 4
    # gpu_per_job = 2
    # cpu_per_job = 16
    num_jobs = 8
    gpu_per_job = 1
    cpu_per_job = 8

    def create_celer_subprocess(self, inp):
        cmd = "srun"
        env = dict(environ)
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)

        args = [
            f"--cpus-per-task={self.cpu_per_job}",
        ]
        if inp['use_device']:
            args.append("--gpus-per-task=1")
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("--gpus=0")

        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")

        args.extend([
            build / "bin" / "celer-sim",
            "-"
        ])

        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env,
        )

regression_dir = Path(__file__).parent
input_dir = regression_dir / "input"

base_input = {
    "_timeout": 600.0,
    "brem_combined": False,
    "initializer_capacity": 2**20,
    "max_num_tracks": 2**12,
    "max_steps": 2**21,
    "secondary_stack_factor": 3.0,
    "enable_diagnostics": False,
    "use_device": False,
    "sync": True,
    "eloss_fluctuation": True,
}

if True:
    # v0.2 and higher
    base_input["geant_options"] = {
        "coulomb_scattering": False,
        "rayleigh_scattering": True,
        "eloss_fluctuation": False,
        "lpm": True,
        "em_bins_per_decade": 56,
        "physics": "em_basic",
        "msc": "none",
    }
    base_input["merge_events"] = True # v0.3
    use_msc = {"geant_options": {"msc": "urban"}}
    use_field = {
        "mag_field": [0.0, 0.0, 1.0],
        "eloss_fluctuation": False,
    }
else:
    # v0.1
    base_input.update({
        "brem_lpm": True,
        "conv_lpm": True,
        "eloss_fluctuation": False,
        "enable_msc": False,
        "rayleigh": True,
    })
    use_msc = {"enable_msc": True}
    use_field = {
        "mag_field": [0.0, 0.0, 1000.0],
        "eloss_fluctuation": False,
    }

use_gpu = {
    "use_device": True,
    "max_num_tracks": 2**20,
    "max_steps": 2**15,
    "initializer_capacity": 2**26,
}

testem15 = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "testem15.org.json",
    "hepmc3_filename": "testem15-13TeV.hepmc3",
    "physics_filename": "testem15.gdml",
    "sync": False,
}

simple_cms = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "simple-cms.org.json",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "simple-cms.gdml",
}

testem3 = {
    "_geometry": "orange",
    "geometry_filename": "testem3-flat.org.json",
    "physics_filename": "testem3-flat.gdml",
    "sync": False,
    "primary_gen_options": {
        "pdg": 11,
        "energy": 10000,  # 10 GeV
        "position": [-22, 0, 0],
        "direction": [1, 0, 0],
        "num_events": 7,
        "primaries_per_event": 1300  # 13 TeV
    }
}

full_cms = {
    "_geometry": "vecgeom",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "cms2018.gdml",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "cms2018.gdml",
    "cuda_stack_size": 8192, # Needed for v0.3+ when vecgeom is overridden
}

def use_vecgeom(basename):
    return {"_geometry": "vecgeom", "geometry_filename": basename + ".gdml"}

# List of list of setting dictionaries
problems = [
#    [testem15],
#    [testem15, use_field],
#    [testem15, use_msc, use_field],
#    [testem15, use_msc, use_field, use_vecgeom("testem15")],
#    [simple_cms, use_msc],
#    [simple_cms, use_field],
#    [simple_cms, use_field, use_msc],
#    [simple_cms, use_field, use_msc, use_vecgeom("simple-cms")],
#    [testem3],
#    [testem3, use_vecgeom("testem3-flat")],
#    [testem3, use_field],
#    [testem3, use_msc],
#    [testem3, use_field, use_msc],
#    [testem3, use_field, use_msc, use_vecgeom("testem3-flat")],
    [full_cms],
    [full_cms, use_field, use_msc],
]

def recurse_updated(d, other):
    result = d.copy()
    result.update(other)
    for k, v in result.items():
        if isinstance(v, dict):
            try:
                orig = d[k]
            except KeyError:
                v = result[k]
            else:
                v = recurse_updated(orig, result[k])
            result[k] = v
    return result

def build_input(problem_dicts):
    """Construct an input dictionary by merging inputs.

    Later entries override earlier entries.
    """
    inp = base_input.copy()
    for d in problem_dicts:
        inp = recurse_updated(inp, d)
    for k in inp:
        if k.endswith('_filename'):
            inp[k] = str(input_dir / inp[k])

    inp["_name"] = name = inp_to_nametuple(inp)
    inp["_outdir"] = "-".join(name)
    (inp["max_events"], _) = get_num_events_and_primaries(inp)
    return inp

def build_instance(inp, instance):
    inp = inp.copy()
    inp["_instance"] = instance
    inp["seed"] = 20220904 + instance
    return inp

async def communicate_with_timeout(proc, interrupt, terminate=5.0, kill=1.0, input=None):
    """Interrupt, then terminate, then kill a process if it doesn't
    communicate.
    """
    try:
        result = await asyncio.wait_for(
            proc.communicate(input),
            timeout=interrupt)
    except asyncio.TimeoutError:
        print(f"Timed out after {interrupt} seconds: sending interrupt")
        proc.send_signal(SIGINT)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                    timeout=terminate)
    except asyncio.TimeoutError:
        print(f"Timed out *AGAIN* after {terminate} seconds")
        proc.send_signal(SIGTERM)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                    timeout=kill)
    except asyncio.TimeoutError:
        print(f"Set phasers to kill after {kill} seconds")
        proc.send_signal(SIGKILL)
    else:
        return result

    print("Awaiting communication")
    result = await proc.communicate()
    return result

async def run_celeritas(system, results_dir, inp):
    instance = inp['_instance']
    try:
        proc = await system.create_celer_subprocess(inp)
    except FileNotFoundError as e:
        print("File not found:", e)
        return exception_to_dict(e, context="creating subprocess")

    # TODO: monitor output, e.g. https://gist.github.com/kalebo/1e085ee36de45ffded7e5d9f857265d0

    print(f"{instance}: awaiting communcation")
    failed = False
    out, err = await communicate_with_timeout(proc,
        input=json.dumps(inp).encode(),
        interrupt=inp['_timeout']
    )

    try:
        result = json.loads(out)
    except json.decoder.JSONDecodeError as e:
        print(f"{instance}: failed to decode JSON")
        failed = True
        result = {
            'stdout': out.decode().splitlines(),
        }

    if proc.returncode:
        print(f"{instance}: exit code {proc.returncode}")
        failed = True
        result['stderr'] = err.decode().splitlines()

    # Copy special inputs to output for later processing
    result.setdefault('input', {}).update(
        {k: v for k,v in inp.items() if k.startswith('_')}
    )

    try:
        outdir = results_dir / inp['_outdir']
        outdir.mkdir(exist_ok=True)
        with open(outdir / f"{instance:d}.json", "w") as f:
            json.dump(result, f, indent=0, sort_keys=True)
    except Exception as e:
        print(f"{instance}: failed to output:", repr(e))
        failed = True

    if proc.returncode:
        # Write input to reproduce later
        with open(outdir / f"{instance:d}.inp.json", "w") as f:
            json.dump(inp, f, indent=0, sort_keys=True)

    if not failed:
        print(f"{instance}: success")

    return result

async def main():
    try:
        sysname = sys.argv[1]
    except IndexError:
        Sys = Local
    else:
        # TODO: use metaclass to build this list automatically
        _systems = {S.name: S for S in [Summit, Crusher, Wildstyle, Zeusstyle]}
        Sys = _systems[sysname]
    system = Sys()

    # Copy build files
    buildfile_dir = regression_dir / 'build-files' / system.name
    buildfile_dir.mkdir(exist_ok=True)
    for k, v in system.build_dirs.items():
        shutil.copyfile(v / 'CMakeCache.txt', buildfile_dir / (k + '.txt'))

    results_dir = regression_dir / 'results' / system.name
    results_dir.mkdir(exist_ok=True)

    device_mods = []
#    if system.gpu_per_job:
#        device_mods.append([use_gpu])
    device_mods.append([]) # CPU

    inputs = [build_input([base_input] + p + d)
              for p, d in itertools.product(problems, device_mods)]
    with open(results_dir / "index.json", "w") as f:
        json.dump([(inp['_outdir'], inp['_name'])
                   for inp in inputs], f, indent=0)

    summaries = {}
    allstart = time.monotonic()
    _num_inputs = len(inputs)
    for (i, inp) in enumerate(inputs, start=1):
        print("="*79)
        name = inp['_outdir']
        print(f"Running problem {i} of {_num_inputs}: {name}...")
        start = time.monotonic()
        tasks = [run_celeritas(system, results_dir, build_instance(inp, i))
                 for i in range(system.num_jobs)]
        if not summaries:
            # Only print monitoring for first instance
            tasks.extend(system.get_monitoring_coro())
        result = await asyncio.gather(*tasks)

        # Ignore results from monitoring tasks
        result = result[:system.num_jobs]

        try:
            summaries[name] = summary = summarize_all(result)
        except Exception as e:
            print("*"*79)
            print("FAILED input:")
            pprint(inp)
            print("*"*79)
            pprint(result)
            print("Failed to summarize result above")
            raise
        summary['name'] = inp['_name'] # name tuple
        pprint(summary)
        alldelta = time.monotonic() - allstart
        delta = time.monotonic() - start
        print(f"Elapsed time for {name}: {delta:.1f} (total: {alldelta:.0f})")

    with open(results_dir / 'summaries.json', 'w') as f:
        json.dump(summaries, f, indent=1, sort_keys=True)
    print(f"Wrote summaries to {results_dir}")

asyncio.run(main())
simbarras commented 1 year ago

This issue won't be explored further because the goal is to run the profile on Perlmutter. See https://github.com/simbarras/tb23-gpu-opt-celeritas/issues/28

simbarras commented 1 year ago

run-zeus.sh

#!/bin/bash -e

# module load geant4-data/11.0.0-s4eo python

# Load same configuration as used to build
source ../celeritas/scripts/env/zeus.sh

echo "Running on $HOSTNAME at $(date)"

# Check argument for ncu profiling
cmd="python3 run-problems.py zeussimple"
if [ "$1" == "profiling" ]
then
    echo "Profiling enabled"
    #cmd="ncu -f --export=profile --set=full --launch-skip=345 --launch-count=10 --kernel-name=along_step_uniform_msc_kernel --target-processes-filter=regex:*/celer-sim $cmd"
    prof="ncu"
    prof="$prof -f"
    prof="$prof --export=profile-$(date +%Y%m%d-2%H%M%S)"
    prof="$prof --set=full"
    prof="$prof --launch-skip=345"
    prof="$prof --launch-count=10"
    prof="$prof --kernel-name=along_step_uniform_msc_kernel"
    prof="$prof --target-processes=all"
    prof="$prof --target-processes-filter=regex:demo-loop"
    cmd="$prof $cmd"
fi

echo "Running command: $cmd"
$cmd

echo "Completed at $(date)"
exit 0
simbarras commented 1 year ago

run-problems.py

#!/usr/bin/env python3
# Copyright 2022 UT-Battelle, LLC, and other Celeritas developers.
# See the top-level COPYRIGHT file for details.
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""
- Loop over all problems
- Launch simultaneously on multiple cores (different seed per run!)
- Save overall times from all runs, and output from one run
- Catch failure message and save

Requires Python 3.7+.
"""

import asyncio
import itertools
import json
from pathlib import Path, PurePath
from pprint import pprint
from os import environ
import shutil
from signal import SIGINT, SIGTERM, SIGKILL
import subprocess
import sys
import time

from summarize import inp_to_nametuple, summarize_all, exception_to_dict, get_num_events_and_primaries

g4env = {k: v for k, v in environ.items()
         if k.startswith('G4')}

systems = {}

class System:
    name = None
    build_dirs = {}
    num_jobs = None # Number of simultaneous jobs to run
    gpu_per_job = None
    cpu_per_job = None

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])

        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

    def get_monitoring_coro(self):
        return []

class Wildstyle(System):
    build_dirs = {
        'orange': Path("/home/s3j/.local/src/celeritas/build-reldeb"),
        'vecgeom': Path("/home/s3j/.local/src/celeritas/build-reldeb-vecgeom"),
    }
    name = "wildstyle"
    num_jobs = 2
    gpu_per_job = 1
    cpu_per_job = 32

class Local(System):
    build_dirs = {
        "orange": Path("/Users/seth/.local/src/celeritas/build"),
    }
    name = "testing"
    num_jobs = 1
    gpu_per_job = 0
    cpu_per_job = 1

class Summit(System):
    _CELER_ROOT = Path(environ.get('PROJWORK', '')) / 'csc404' / 'celeritas'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug-novg',
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "summit"
    num_jobs = 6
    gpu_per_job = 1
    cpu_per_job = 7

    def create_celer_subprocess(self, inp):
        cmd = "jsrun"
        env = g4env.copy()
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)

        args = [
            "-n1", # total resource sets
            "-r1", # resource sets per host
            "-a1", # tasks per resource set
            f"-c{self.cpu_per_job}", # CPUs per resource set
            "--bind=packed:7",
            "--launch_distribution=packed",
        ]
        if inp['use_device']:
            args.append("-g1") # GPUs per resource set
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("-g0")

        args.extend("".join(["-E", k, "=", v]) for k, v in env.items())

        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")

        args.extend([
            build / "app" / "demo-loop",
            "-"
        ])

        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

    async def run_jslist(self):
        # Wait a second for the jobs to start
        await asyncio.sleep(1)
        print("Running jslist")

        try:
            proc = await asyncio.create_subprocess_exec("jslist", "-r", "-R")
        except FileNotFoundError as e:
            print("jslist not found :(")
            return

        print("Waiting on jslist output")
        await proc.communicate()

    def get_monitoring_coro(self):
        return [self.run_jslist()]

class Crusher(System):
    _CELER_ROOT = Path(environ['HOME']) / '.local' / 'src' / 'celeritas-crusher'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug'
    }
    name = "crusher"
    # NOTE: layout multi-gpu run
    # num_jobs = 4
    # gpu_per_job = 2
    # cpu_per_job = 16
    num_jobs = 8
    gpu_per_job = 1
    cpu_per_job = 8

    def create_celer_subprocess(self, inp):
        cmd = "srun"
        env = dict(environ)
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)

        args = [
            f"--cpus-per-task={self.cpu_per_job}",
        ]
        if inp['use_device']:
            args.append("--gpus-per-task=1")
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("--gpus=0")

        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")

        ncu_set = "full"
        kernel_name = "along_step_uniform_msc_kernel"
        problem = Path(inp["geometry_filename"]).name.split('.')[0]
        filename = f"{ncu_set}-{problem}-{inp['track_order']}-{kernel_name}"

        args.extend([
            "ncu",
            "-f",
            f"--export={filename}",
            f"--set={ncu_set}",
            "--launch-skip=345",
            "--launch-count=10",
            f"--kernel-name={kernel_name}",
            build / "app" / "demo-loop",
            "-"
        ])

        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env,
        )

# NOTE: Perlmutter uses Slurm so we can just inherit from Crusher
class Perlmutter(Crusher):
    #_CELER_ROOT = Path(environ['CFS']) / 'atlas' / 'esseivaj' / 'devel' / 'celeritas'
    _CELER_ROOT = Path(environ['HOME']) / 'project' / 'celeritas'
    build_dirs = {
        "vecgeom": _CELER_ROOT / 'build-ndebug',
        "orange": _CELER_ROOT / 'build-ndebug-novg'
    }
    name = "perlmutter"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

class ZeusSimple(System):
    _CELER_ROOT = Path(environ['HOME']) / 'project' / 'celeritas'
    build_dirs = {
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "zeussimple"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])

        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

class ZeusProfile(ZeusSimple):
    name = "zeusprofile"

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])

        args = [
            "-f",
            "--export=profile",
            "--set=full",
            "--launch-skip=345",
            "--launch-count=10",
            "--kernel-name=along_step_uniform_msc_kernel",
            "--replay-mode=application",
            #"--target-processes=all",
            build / "app" / "demo-loop",
            "-",
        ]

        return asyncio.create_subprocess_exec(
            "ncu", *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

regression_dir = Path(__file__).parent
input_dir = regression_dir / "input"

base_input = {
    "_timeout": 4 * 3600.0,
    "brem_combined": False,
    "initializer_capacity": 2**20,
    "mag_field": [0.0, 0.0, 1.0],
    "max_num_tracks": 2**12,
    "max_steps": 2**21,
    "track_order": "shuffled",
    "secondary_stack_factor": 3.0,
    "enable_diagnostics": False,
    "use_device": False,
    "sync": True,
    # Geant options
    "geant_options": {
        "coulomb_scattering": False,
        "rayleigh_scattering": True,
        "eloss_fluctuation": False,
        "lpm": True,
        "em_bins_per_decade": 56,
        "physics": "em_basic",
        "msc": "none",
    },
}

use_msc = {"geant_options": {"msc": "urban"}}

use_gpu = {
    "use_device": True,
    "max_num_tracks": 2**20,
    "max_steps": 2**15,
    "initializer_capacity": 2**26,
}

no_field = {
    "mag_field": [0.0, 0.0, 0.0],
    "eloss_fluctuation": True,
}

testem15 = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "testem15.org.json",
    "hepmc3_filename": "testem15-13TeV.hepmc3",
    "physics_filename": "testem15.gdml",
    "mag_field": [0.0, 0.0, 1.0],
    "sync": False,
}

simple_cms = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "simple-cms.org.json",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "simple-cms.gdml",
    "mag_field": [0.0, 0.0, 1.0],
}

testem3 = {
    "_geometry": "orange",
    "geometry_filename": "testem3-flat.org.json",
    "physics_filename": "testem3-flat.gdml",
    "mag_field": [0.0, 0.0, 1.0],
    "sync": False,
    "primary_gen_options": {
        "pdg": 11,
        "energy": 10000,  # 10 GeV
        "position": [-22, 0, 0],
        "direction": [1, 0, 0],
        "num_events": 7,
        "primaries_per_event": 1300  # 13 TeV
    }
}

full_cms = {
    "_geometry": "vecgeom",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "cms2018.gdml",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "cms2018.gdml",
    "mag_field": [0.0, 0.0, 1.0],
}

# List of list of setting dictionaries
problems = [
    # [testem15, no_field],
    # [testem15],
    # [testem15, use_msc,
    #     {"_geometry": "vecgeom", "geometry_filename": "testem15.gdml"}],
    # [testem15, use_msc],
    # [simple_cms, no_field, use_msc],
    # [simple_cms],
    # [simple_cms, use_msc],
    # [simple_cms, use_msc,
    #     {"_geometry": "vecgeom", "geometry_filename": "simple-cms.gdml"}],
    # [testem3, no_field],
    # [testem3, no_field,
    #     {"_geometry": "vecgeom", "geometry_filename": "testem3-flat.gdml"}],
    # [testem3],
    # [testem3, no_field, use_msc],
    # [testem3, use_msc,
    #     {"_geometry": "vecgeom", "geometry_filename": "testem3-flat.gdml"}],
    # [full_cms, no_field],
    [full_cms, use_msc],
]

def recurse_updated(d, other):
    result = d.copy()
    result.update(other)
    for k, v in result.items():
        if isinstance(v, dict):
            try:
                orig = d[k]
            except KeyError:
                v = result[k]
            else:
                v = recurse_updated(orig, result[k])
            result[k] = v
    return result

def build_input(problem_dicts):
    """Construct an input dictionary by merging inputs.

    Later entries override earlier entries.
    """
    inp = base_input.copy()
    for d in problem_dicts:
        inp = recurse_updated(inp, d)
    for k in inp:
        if k.endswith('_filename'):
            inp[k] = str(input_dir / inp[k])

    inp["_name"] = name = inp_to_nametuple(inp)
    inp["_outdir"] = "-".join(name)
    (inp["max_events"], _) = get_num_events_and_primaries(inp)
    return inp

def build_instance(inp, instance):
    inp = inp.copy()
    inp["_instance"] = instance
    inp["seed"] = 20220904 + instance
    return inp

async def communicate_with_timeout(proc, interrupt, terminate=5.0, kill=1.0, input=None):
    """Interrupt, then terminate, then kill a process if it doesn't
    communicate.
    """
    try:
        result = await asyncio.wait_for(
            proc.communicate(input),
            timeout=interrupt)
    except asyncio.TimeoutError:
        print(f"Timed out after {interrupt} seconds: sending interrupt")
        proc.send_signal(SIGINT)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                    timeout=terminate)
    except asyncio.TimeoutError:
        print(f"Timed out *AGAIN* after {terminate} seconds")
        proc.send_signal(SIGTERM)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                    timeout=kill)
    except asyncio.TimeoutError:
        print(f"Set phasers to kill after {kill} seconds")
        proc.send_signal(SIGKILL)
    else:
        return result

    print("Awaiting communication")
    result = await proc.communicate()
    return result

async def run_celeritas(system, results_dir, inp):
    instance = inp['_instance']
    try:
        proc = await system.create_celer_subprocess(inp)
    except FileNotFoundError as e:
        print("File not found:", e)
        return exception_to_dict(e, context="creating subprocess")
    with open(results_dir / "0.inp.json", 'w') as f:
        json.dump(inp, f)
    # TODO: monitor output, e.g. https://gist.github.com/kalebo/1e085ee36de45ffded7e5d9f857265d0

    print(f"{instance}: awaiting communcation")
    failed = False
    out, err = await communicate_with_timeout(proc,
        input=json.dumps(inp).encode(),
        interrupt=inp['_timeout']
    )

    try:
        result = json.loads(out)
    except json.decoder.JSONDecodeError as e:
        print(f"{instance}: failed to decode JSON")
        failed = True
        result = {
            'stdout': out.decode().splitlines(),
        }

    if proc.returncode:
        print(f"{instance}: exit code {proc.returncode}")
        failed = True
        result['stderr'] = err.decode().splitlines()

    # Copy special inputs to output for later processing
    result.setdefault('input', {}).update(
        {k: v for k,v in inp.items() if k.startswith('_')}
    )

    try:
        outdir = results_dir / inp['_outdir']
        outdir.mkdir(exist_ok=True)
        with open(outdir / f"{instance:d}.json", "w") as f:
            json.dump(result, f, indent=0, sort_keys=True)
    except Exception as e:
        print(f"{instance}: failed to output:", repr(e))
        failed = True

    if proc.returncode:
        # Write input to reproduce later
        with open(outdir / f"{instance:d}.inp.json", "w") as f:
            json.dump(inp, f, indent=0, sort_keys=True)

    if not failed:
        print(f"{instance}: success")

    return result

async def main():
    try:
        sysname = sys.argv[1]
    except IndexError:
        Sys = Local
    else:
        # TODO: use metaclass to build this list automatically
        _systems = {S.name: S for S in [Summit, Crusher, Wildstyle, Perlmutter, ZeusProfile, ZeusSimple]}
        Sys = _systems[sysname]
    system = Sys()

    try:
        shuffled = sys.argv[2]
    except IndexError:
        shuffled = "unsorted"
    base_input["track_order"] = shuffled
    # Copy build files
    buildfile_dir = regression_dir / 'build-files' / system.name
    buildfile_dir.mkdir(exist_ok=True)
    for k, v in system.build_dirs.items():
        if v.exists():
            shutil.copyfile(v / 'CMakeCache.txt', buildfile_dir / (k + '.txt'))

    results_dir = regression_dir / 'results' / system.name
    results_dir.mkdir(exist_ok=True)

    device_mods = []
    if system.gpu_per_job:
        device_mods.append([use_gpu])
    # device_mods.append([]) # CPU

    inputs = [build_input([base_input] + p + d)
              for p, d in itertools.product(problems, device_mods)]
    with open(results_dir / "index.json", "w") as f:
        json.dump([(inp['_outdir'], inp['_name'])
                   for inp in inputs], f, indent=0)

    summaries = {}
    allstart = time.monotonic()
    for inp in inputs:
        print("="*79)
        #pprint(inp)
        start = time.monotonic()
        tasks = [run_celeritas(system, results_dir, build_instance(inp, i))
                 for i in range(system.num_jobs)]
        if not summaries:
            # Only print monitoring for first instance
            tasks.extend(system.get_monitoring_coro())
        result = await asyncio.gather(*tasks)

        # Ignore results from monitoring tasks
        result = result[:system.num_jobs]

        name = inp['_outdir']
        try:
            summaries[name] = summary = summarize_all(result)
        except Exception as e:
            print("*"*79)
            print("FAILED input:")
            pprint(inp)
            print("*"*79)
            pprint(result)
            print("Failed to summarize result above")
            raise
        summary['name'] = inp['_name'] # name tuple
        pprint(summary)
        alldelta = time.monotonic() - allstart
        delta = time.monotonic() - start
        print(f"Elapsed time for {name}: {delta:.1f} (total: {alldelta:.0f})")

    with open(results_dir / 'summaries.json', 'w') as f:
        json.dump(summaries, f, indent=1, sort_keys=True)
    print(f"Wrote summaries to {results_dir}")

asyncio.run(main())
simbarras commented 1 year ago

Run a profile

To run a profile, check out the tag 0.2.2, cherry-pick the commit 196f72d88472e71b3816adb4a3ea90512256ebd2, and use the script run-zeus.sh.
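
The checkout steps might look like this; a sketch that assumes the tag is published as v0.2.2 in the Celeritas repository and uses placeholder remote and branch names:

cd ../celeritas
git fetch --tags origin                   # assumption: 'origin' points at the Celeritas repository
git checkout -b profiling-v0.2.2 v0.2.2   # assumption: the tag is named v0.2.2
git cherry-pick 196f72d88472e71b3816adb4a3ea90512256ebd2
# rebuild the ndebug configuration, then run the profile:
cd ../regression
./run-zeus.sh profiling

This is an example output: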

[simbarras@zeus regression]$ ./run-zeus.sh profiling
loading cuda/11.8.0
Running on zeus.lbl.gov at Wed Jun 28 08:38:35 PDT 2023
Profiling enabled
Running command: ncu -f --export=profile-20230628083835 --set=full --launch-skip=345 --launch-count=10 --kernel-name=along_step_uniform_msc_kernel --target-processes=all --target-processes-filter=regex:demo-loop python3 run-problems.py zeussimple
===============================================================================
0: awaiting communcation
==PROF== Connected to process 13078 (/bld4/home/simbarras/project/celeritas/build-ndebug/app/demo-loop)
==PROF== Profiling "along_step_uniform_msc_kernel": 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.

==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Disconnected from process 13078
0: success
{'input': {'enable_msc': True,
           'geometry_filename': 'cms2018.gdml',
           'mag_field': [0.0, 0.0, 1.0],
           'max_num_tracks': 1048576,
           'use_device': True},
 'name': ('cms2018+field+msc', 'vecgeom', 'gpu'),
 'result': [{'action_times': {'along-step-uniform-msc': 1537.9388543089851,
                              'annihil-2-gamma': 2.4152413649999884,
                              'brems-rel': 2.4863914809999947,
                              'brems-sb': 3.6194525759999987,
                              'conv-bethe-heitler': 2.7413463669999985,
                              'geo-boundary': 8.195453996000051,
                              'ioni-moller-bhabha': 2.5746462819999762,
                              'msc-urban': 0.1052031689999999,
                              'photoel-livermore': 3.630558379000009,
                              'physics-discrete-select': 3.272606033999989,
                              'pre-step': 17.438818170000022,
                              'scat-klein-nishina': 2.9887148080000063,
                              'scat-rayleigh': 2.5811740850000047},
             'active_hwm': {'count': 1048576, 'index': 874},
             'avg_steps_per_primary': 70582.95901098901,
             'avg_time_per_primary': 0.17634734310439562,
             'avg_time_per_step': 2.4984407791254573e-06,
             'emptying_step': 875,
             'num_events': 7,
             'num_primaries': 9100,
             'num_step_iters': 32768,
             'num_steps': 642304927,
             'pre_emptying_time': 0.233108244,
             'queue_hwm': {'count': 10039285, 'index': 458},
             'setup_time': 32.041358836,
             'slot_occupancy': 0.018693533696932718,
             'total_time': 1604.76082225,
             'unconverged': 2}],
 'system': {'debug': False,
            'geant4': '11.1.1',
            'occupancy': {'along_step_uniform_msc': 0.16666666666666666,
                          'bethe_heitler_interact': 0.5,
                          'boundary': 0.16666666666666666,
                          'discrete_select': 0.6666666666666666,
                          'eplusgg_interact': 0.6666666666666666,
                          'init_tracks': 0.16666666666666666,
                          'klein_nishina_interact': 0.6666666666666666,
                          'livermore_pe_interact': 0.5,
                          'locate_alive': 1.0,
                          'moller_bhabha_interact': 0.6666666666666666,
                          'pre_step': 0.6666666666666666,
                          'process_primaries': 1.0,
                          'process_secondaries': 1.0,
                          'rayleigh_interact': 0.5,
                          'relativistic_brem_interact': 0.5,
                          'seltzer_berger_interact': 0.5},
            'vecgeom': '1.2.2',
            'version': 'v0.2.2-1+9a018b4f'}}
Elapsed time for cms2018+field+msc-vecgeom-gpu: 1638.3 (total: 1638)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeussimple
==PROF== Report: /bld4/home/simbarras/project/regression/profile-20230628083835.ncu-rep

And this is an example profile