Closed: simbarras closed this issue 1 year ago
To launch the code on zeus, use an adaptation of the wildstyle configuration.
It only works with the build without assertions (ndebug), and only if the celeritas and regression folders are in the same directory.
cd regression
./run-zeus.sh
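The layout requirement above can be checked before launching. This is a minimal sketch, assuming the executable sits at `celeritas/<build>/bin/celer-sim` next to `regression` (the location the ndebug log output in this issue reports). `find_celer_sim` and its `build` parameter are hypothetical names, not part of the regression scripts:

```python
from pathlib import Path


def find_celer_sim(regression_dir, build="build-ndebug"):
    """Return the expected celer-sim path, or raise if the layout is wrong.

    Assumes 'celeritas' and 'regression' are siblings in one directory.
    """
    regression_dir = Path(regression_dir).resolve()
    exe = regression_dir.parent / "celeritas" / build / "bin" / "celer-sim"
    if not exe.is_file():
        raise FileNotFoundError(f"expected celer-sim at {exe}")
    return exe
```

Calling `find_celer_sim(".")` from inside `regression` before `./run-zeus.sh` would fail early with the missing path, rather than later inside the runner.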
If I run any build other than "ndebug", I receive this error:
Running on zeus.lbl.gov at Tue Jun 20 09:26:02 PDT 2023
===============================================================================
Running problem 1 of 2: cms2018-vecgeom-cpu...
0: awaiting communcation
0: exit code 1
Couldn't summarize system: missing key 'system'
{'input': None,
'name': ('cms2018', 'vecgeom', 'cpu'),
'result': [{'exception': {'context': 'result',
'str': "'time'",
'type': "<class 'KeyError'>"},
'failure': {'condition': 'celeritas::device()',
'file': '/bld4/home/simbarras/project/celeritas/src/corecel/sys/Device.cc',
'line': 382,
'type': 'DebugError',
'which': 'precondition failed'}}],
'system': []}
Elapsed time for cms2018-vecgeom-cpu: 0.1 (total: 0)
===============================================================================
Running problem 2 of 2: cms2018+field+msc-vecgeom-cpu...
0: awaiting communcation
0: exit code 1
Couldn't summarize system: missing key 'system'
{'input': None,
'name': ('cms2018+field+msc', 'vecgeom', 'cpu'),
'result': [{'exception': {'context': 'result',
'str': "'time'",
'type': "<class 'KeyError'>"},
'failure': {'condition': 'celeritas::device()',
'file': '/bld4/home/simbarras/project/celeritas/src/corecel/sys/Device.cc',
'line': 382,
'type': 'DebugError',
'which': 'precondition failed'}}],
'system': []}
Elapsed time for cms2018+field+msc-vecgeom-cpu: 0.1 (total: 0)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeusstyle
Completed at Tue Jun 20 09:26:02 PDT 2023
To collect a profile, run the following NVIDIA Nsight Compute command: ncu -o profile --target-processes all "ExecFile".
Unfortunately, ncu reports that no kernels were profiled.
ncu -o profile --target-processes all ./run-zeus.sh
Running on zeus.lbl.gov at Tue Jun 20 15:03:07 PDT 2023
===============================================================================
Running problem 1 of 2: cms2018-vecgeom-cpu...
0: awaiting communcation
==PROF== Target process 293645 terminated before first instrumented API call.
==PROF== Connected to process 293647 (/bld4/home/simbarras/project/celeritas/build-ndebug/bin/celer-sim)
==PROF== Disconnected from process 293647
0: success
{'input': {'enable_msc': False,
'geometry_filename': 'cms2018.gdml',
'mag_field': None,
'num_track_slots': 4096,
'use_device': False},
'name': ('cms2018', 'vecgeom', 'cpu'),
'result': [{'action_times': {'along-step-general-linear': 60.618753608999235,
'along-step-neutral': 105.5287278220001,
'annihil-2-gamma': 1.4263713359999899,
'brems-rel': 0.9165693729999989,
'brems-sb': 15.855017272999936,
'conv-bethe-heitler': 3.0402688919999994,
'extend-from-primaries': 0.018025936999999017,
'extend-from-secondaries': 10.95150989100007,
'geo-boundary': 56.857236687999695,
'initialize-tracks': 10.413339748999977,
'ioni-moller-bhabha': 1.3485006619999989,
'photoel-livermore': 13.263060132000238,
'physics-discrete-select': 16.093574735000047,
'pre-step': 164.7882286409988,
'scat-klein-nishina': 9.471062748000076,
'scat-rayleigh': 2.4067804220000273},
'active_hwm': {'count': 4096, 'index': 106089},
'avg_steps_per_primary': 47733.382967032965,
'avg_time_per_primary': 0.05199944178186813,
'avg_time_per_step': 1.089372647603492e-06,
'emptying_step': 106090,
'num_events': 7,
'num_primaries': 9100,
'num_step_iters': 107005,
'num_steps': 434373785,
'pre_emptying_time': 0.004330396,
'queue_hwm': {'count': 405193, 'index': 701},
'setup_time': 30.349529215,
'slot_occupancy': 0.9910591781086456,
'total_time': 473.194920215,
'unconverged': 0}],
'system': {'debug': False,
'geant4': '11.0.4',
'occupancy': {},
'vecgeom': '1.2.2',
'version': '0.3.0-dev.160+cfc407b0'}}
Elapsed time for cms2018-vecgeom-cpu: 504.5 (total: 504)
===============================================================================
Running problem 2 of 2: cms2018+field+msc-vecgeom-cpu...
0: awaiting communcation
==PROF== Connected to process 294036 (/bld4/home/simbarras/project/celeritas/build-ndebug/bin/celer-sim)
Timed out after 600.0 seconds: sending interrupt
==PROF== Disconnected from process 294036
0: success
{'input': {'enable_msc': True,
'geometry_filename': 'cms2018.gdml',
'mag_field': [0.0, 0.0, 1.0],
'num_track_slots': 4096,
'use_device': False},
'name': ('cms2018+field+msc', 'vecgeom', 'cpu'),
'result': [{'action_times': {'along-step-neutral': 92.16368571399956,
'along-step-uniform-msc': 249.28573778599787,
'annihil-2-gamma': 0.9451890520000022,
'brems-rel': 0.6567738839999981,
'brems-sb': 8.868275839000074,
'conv-bethe-heitler': 1.7982603700000013,
'extend-from-primaries': 0.015070036000000354,
'extend-from-secondaries': 6.819236777999958,
'geo-boundary': 53.080884426000004,
'initialize-tracks': 6.519707357000065,
'ioni-moller-bhabha': 0.9658400109999912,
'photoel-livermore': 7.546174925000011,
'physics-discrete-select': 9.163355675000023,
'pre-step': 126.04717180299895,
'scat-klein-nishina': 5.272644965000028,
'scat-rayleigh': 1.4657519419999971},
'active_hwm': {'count': 4096, 'index': 85087},
'avg_steps_per_primary': 38298.95032967033,
'avg_time_per_primary': 0.06272290866956044,
'avg_time_per_step': 1.6377187397997375e-06,
'emptying_step': None,
'num_events': 7,
'num_primaries': 9100,
'num_step_iters': 85088,
'num_steps': 348520448,
'pre_emptying_time': 0.006755898,
'queue_hwm': {'count': 384933, 'index': 720},
'setup_time': 29.125001943,
'slot_occupancy': 1.0,
'total_time': 570.778468893,
'unconverged': 61934}],
'system': {'debug': False,
'geant4': '11.0.4',
'occupancy': {},
'vecgeom': '1.2.2',
'version': '0.3.0-dev.160+cfc407b0'}}
Elapsed time for cms2018+field+msc-vecgeom-cpu: 600.8 (total: 1105)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeusstyle
Completed at Tue Jun 20 15:21:33 PDT 2023
==WARNING== No kernels were profiled.
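Both problems in the run above report 'use_device': False, so the application runs on the CPU and there are no CUDA kernels for ncu to capture. A minimal sketch of wrapping the executable in ncu only for GPU inputs, using only ncu flags that already appear in this issue. `ncu_argv` is a hypothetical helper, not part of run-problems.py:

```python
def ncu_argv(inp, executable, export="profile"):
    """Build an argv list that wraps `executable` in ncu for GPU inputs.

    CPU-only inputs are returned unwrapped, since they launch no kernels.
    """
    # The trailing "-" mirrors how the scripts in this issue invoke the app
    # while piping the JSON input to its stdin.
    argv = [str(executable), "-"]
    if not inp.get("use_device", False):
        return argv
    return ["ncu", "-f", f"--export={export}",
            "--target-processes", "all"] + argv
```

The returned list could be passed to `asyncio.create_subprocess_exec(*argv, ...)` in place of the bare executable.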
#!/bin/bash -e
# module load geant4-data/11.0.0-s4eo python
echo "Running on $HOSTNAME at $(date)"
python3 run-problems.py zeusstyle
echo "Completed at $(date)"
exit 0
#!/usr/bin/env python3
# Copyright 2022 UT-Battelle, LLC, and other Celeritas developers.
# See the top-level COPYRIGHT file for details.
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""
- Loop over all problems
- Launch simultaneously on multiple cores (different seed per run!)
- Save overall times from all runs, and output from one run
- Catch failure message and save
Requires Python 3.7+.
"""
import asyncio
import itertools
import json
from pathlib import Path, PurePath
from pprint import pprint
from os import environ
import shutil
from signal import SIGINT, SIGTERM, SIGKILL
import subprocess
import sys
import time
from summarize import inp_to_nametuple, summarize_all, exception_to_dict, get_num_events_and_primaries
g4env = {k: v for k, v in environ.items()
         if k.startswith('G4')}
systems = {}

class System:
    name = None
    build_dirs = {}
    num_jobs = None # Number of simultaneous jobs to run
    gpu_per_job = None
    cpu_per_job = None

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "bin" / "celer-sim"
        env = dict(environ)
        if not inp['use_device']:
            env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['OMP_NUM_THREADS'] = "1"
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])
        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

    def get_monitoring_coro(self):
        return []

class Wildstyle(System):
    build_dirs = {
        'orange': Path("/home/s3j/.local/src/celeritas/build-reldeb"),
        'vecgeom': Path("/home/s3j/.local/src/celeritas/build-reldeb-vecgeom"),
    }
    name = "wildstyle"
    num_jobs = 2
    gpu_per_job = 1
    cpu_per_job = 32

class Zeusstyle(System):
    scriptPath = Path(__file__).parent
    build_dirs = {
        'vecgeom': Path(f"{scriptPath}/../celeritas/build-reldeb"),
    }
    name = "zeusstyle"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

class Local(System):
    build_dirs = {
        "orange": Path("/Users/seth/.local/src/celeritas/build"),
    }
    name = "testing"
    num_jobs = 1
    gpu_per_job = 0
    cpu_per_job = 1

class Summit(System):
    _CELER_ROOT = Path(environ.get('PROJWORK', '')) / 'csc404' / 'celeritas'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug-novg',
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "summit"
    num_jobs = 6
    gpu_per_job = 1
    cpu_per_job = 7

    def create_celer_subprocess(self, inp):
        cmd = "jsrun"
        env = g4env.copy()
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)
        args = [
            "-n1", # total resource sets
            "-r1", # resource sets per host
            "-a1", # tasks per resource set
            f"-c{self.cpu_per_job}", # CPUs per resource set
            "--bind=packed:7",
            "--launch_distribution=packed",
        ]
        if inp['use_device']:
            args.append("-g1") # GPUs per resource set
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("-g0")
        args.extend("".join(["-E", k, "=", v]) for k, v in env.items())
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        args.extend([
            build / "bin" / "celer-sim",
            "-"
        ])
        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

    async def run_jslist(self):
        # Wait a second for the jobs to start
        await asyncio.sleep(1)
        print("Running jslist")
        try:
            proc = await asyncio.create_subprocess_exec("jslist", "-r", "-R")
        except FileNotFoundError as e:
            print("jslist not found :(")
            return
        print("Waiting on jslist output")
        await proc.communicate()

    def get_monitoring_coro(self):
        return [self.run_jslist()]

class Crusher(System):
    _CELER_ROOT = Path(environ['HOME']) / '.local' / 'src' / 'celeritas-crusher'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug'
    }
    name = "crusher"
    # NOTE: layout multi-gpu run
    # num_jobs = 4
    # gpu_per_job = 2
    # cpu_per_job = 16
    num_jobs = 8
    gpu_per_job = 1
    cpu_per_job = 8

    def create_celer_subprocess(self, inp):
        cmd = "srun"
        env = dict(environ)
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)
        args = [
            f"--cpus-per-task={self.cpu_per_job}",
        ]
        if inp['use_device']:
            args.append("--gpus-per-task=1")
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("--gpus=0")
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        args.extend([
            build / "bin" / "celer-sim",
            "-"
        ])
        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env,
        )

regression_dir = Path(__file__).parent
input_dir = regression_dir / "input"
base_input = {
    "_timeout": 600.0,
    "brem_combined": False,
    "initializer_capacity": 2**20,
    "max_num_tracks": 2**12,
    "max_steps": 2**21,
    "secondary_stack_factor": 3.0,
    "enable_diagnostics": False,
    "use_device": False,
    "sync": True,
    "eloss_fluctuation": True,
}
if True:
    # v0.2 and higher
    base_input["geant_options"] = {
        "coulomb_scattering": False,
        "rayleigh_scattering": True,
        "eloss_fluctuation": False,
        "lpm": True,
        "em_bins_per_decade": 56,
        "physics": "em_basic",
        "msc": "none",
    }
    base_input["merge_events"] = True # v0.3
    use_msc = {"geant_options": {"msc": "urban"}}
    use_field = {
        "mag_field": [0.0, 0.0, 1.0],
        "eloss_fluctuation": False,
    }
else:
    # v0.1
    base_input.update({
        "brem_lpm": True,
        "conv_lpm": True,
        "eloss_fluctuation": False,
        "enable_msc": False,
        "rayleigh": True,
    })
    use_msc = {"enable_msc": True}
    use_field = {
        "mag_field": [0.0, 0.0, 1000.0],
        "eloss_fluctuation": False,
    }

use_gpu = {
    "use_device": True,
    "max_num_tracks": 2**20,
    "max_steps": 2**15,
    "initializer_capacity": 2**26,
}
testem15 = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "testem15.org.json",
    "hepmc3_filename": "testem15-13TeV.hepmc3",
    "physics_filename": "testem15.gdml",
    "sync": False,
}
simple_cms = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "simple-cms.org.json",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "simple-cms.gdml",
}
testem3 = {
    "_geometry": "orange",
    "geometry_filename": "testem3-flat.org.json",
    "physics_filename": "testem3-flat.gdml",
    "sync": False,
    "primary_gen_options": {
        "pdg": 11,
        "energy": 10000, # 10 GeV
        "position": [-22, 0, 0],
        "direction": [1, 0, 0],
        "num_events": 7,
        "primaries_per_event": 1300 # 13 TeV
    }
}
full_cms = {
    "_geometry": "vecgeom",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "cms2018.gdml",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "cms2018.gdml",
    "cuda_stack_size": 8192, # Needed for v0.3+ when vecgeom is overridden
}

def use_vecgeom(basename):
    return {"_geometry": "vecgeom", "geometry_filename": basename + ".gdml"}

# List of list of setting dictionaries
problems = [
    # [testem15],
    # [testem15, use_field],
    # [testem15, use_msc, use_field],
    # [testem15, use_msc, use_field, use_vecgeom("testem15")],
    # [simple_cms, use_msc],
    # [simple_cms, use_field],
    # [simple_cms, use_field, use_msc],
    # [simple_cms, use_field, use_msc, use_vecgeom("simple-cms")],
    # [testem3],
    # [testem3, use_vecgeom("testem3-flat")],
    # [testem3, use_field],
    # [testem3, use_msc],
    # [testem3, use_field, use_msc],
    # [testem3, use_field, use_msc, use_vecgeom("testem3-flat")],
    [full_cms],
    [full_cms, use_field, use_msc],
]

def recurse_updated(d, other):
    result = d.copy()
    result.update(other)
    for k, v in result.items():
        if isinstance(v, dict):
            try:
                orig = d[k]
            except KeyError:
                v = result[k]
            else:
                v = recurse_updated(orig, result[k])
            result[k] = v
    return result

def build_input(problem_dicts):
    """Construct an input dictionary by merging inputs.

    Later entries override earlier entries.
    """
    inp = base_input.copy()
    for d in problem_dicts:
        inp = recurse_updated(inp, d)
    for k in inp:
        if k.endswith('_filename'):
            inp[k] = str(input_dir / inp[k])
    inp["_name"] = name = inp_to_nametuple(inp)
    inp["_outdir"] = "-".join(name)
    (inp["max_events"], _) = get_num_events_and_primaries(inp)
    return inp

def build_instance(inp, instance):
    inp = inp.copy()
    inp["_instance"] = instance
    inp["seed"] = 20220904 + instance
    return inp

async def communicate_with_timeout(proc, interrupt, terminate=5.0, kill=1.0, input=None):
    """Interrupt, then terminate, then kill a process if it doesn't
    communicate.
    """
    try:
        result = await asyncio.wait_for(
            proc.communicate(input),
            timeout=interrupt)
    except asyncio.TimeoutError:
        print(f"Timed out after {interrupt} seconds: sending interrupt")
        proc.send_signal(SIGINT)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                                        timeout=terminate)
    except asyncio.TimeoutError:
        print(f"Timed out *AGAIN* after {terminate} seconds")
        proc.send_signal(SIGTERM)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                                        timeout=kill)
    except asyncio.TimeoutError:
        print(f"Set phasers to kill after {kill} seconds")
        proc.send_signal(SIGKILL)
    else:
        return result

    print("Awaiting communication")
    result = await proc.communicate()
    return result

async def run_celeritas(system, results_dir, inp):
    instance = inp['_instance']
    try:
        proc = await system.create_celer_subprocess(inp)
    except FileNotFoundError as e:
        print("File not found:", e)
        return exception_to_dict(e, context="creating subprocess")

    # TODO: monitor output, e.g. https://gist.github.com/kalebo/1e085ee36de45ffded7e5d9f857265d0
    print(f"{instance}: awaiting communcation")
    failed = False
    out, err = await communicate_with_timeout(proc,
        input=json.dumps(inp).encode(),
        interrupt=inp['_timeout']
    )

    try:
        result = json.loads(out)
    except json.decoder.JSONDecodeError as e:
        print(f"{instance}: failed to decode JSON")
        failed = True
        result = {
            'stdout': out.decode().splitlines(),
        }

    if proc.returncode:
        print(f"{instance}: exit code {proc.returncode}")
        failed = True
        result['stderr'] = err.decode().splitlines()

    # Copy special inputs to output for later processing
    result.setdefault('input', {}).update(
        {k: v for k,v in inp.items() if k.startswith('_')}
    )

    try:
        outdir = results_dir / inp['_outdir']
        outdir.mkdir(exist_ok=True)
        with open(outdir / f"{instance:d}.json", "w") as f:
            json.dump(result, f, indent=0, sort_keys=True)
    except Exception as e:
        print(f"{instance}: failed to output:", repr(e))
        failed = True

    if proc.returncode:
        # Write input to reproduce later
        with open(outdir / f"{instance:d}.inp.json", "w") as f:
            json.dump(inp, f, indent=0, sort_keys=True)

    if not failed:
        print(f"{instance}: success")
    return result

async def main():
    try:
        sysname = sys.argv[1]
    except IndexError:
        Sys = Local
    else:
        # TODO: use metaclass to build this list automatically
        _systems = {S.name: S for S in [Summit, Crusher, Wildstyle, Zeusstyle]}
        Sys = _systems[sysname]
    system = Sys()

    # Copy build files
    buildfile_dir = regression_dir / 'build-files' / system.name
    buildfile_dir.mkdir(exist_ok=True)
    for k, v in system.build_dirs.items():
        shutil.copyfile(v / 'CMakeCache.txt', buildfile_dir / (k + '.txt'))

    results_dir = regression_dir / 'results' / system.name
    results_dir.mkdir(exist_ok=True)

    device_mods = []
    # if system.gpu_per_job:
    #     device_mods.append([use_gpu])
    device_mods.append([]) # CPU

    inputs = [build_input([base_input] + p + d)
              for p, d in itertools.product(problems, device_mods)]
    with open(results_dir / "index.json", "w") as f:
        json.dump([(inp['_outdir'], inp['_name'])
                   for inp in inputs], f, indent=0)

    summaries = {}
    allstart = time.monotonic()
    _num_inputs = len(inputs)
    for (i, inp) in enumerate(inputs, start=1):
        print("="*79)
        name = inp['_outdir']
        print(f"Running problem {i} of {_num_inputs}: {name}...")
        start = time.monotonic()
        tasks = [run_celeritas(system, results_dir, build_instance(inp, i))
                 for i in range(system.num_jobs)]
        if not summaries:
            # Only print monitoring for first instance
            tasks.extend(system.get_monitoring_coro())
        result = await asyncio.gather(*tasks)
        # Ignore results from monitoring tasks
        result = result[:system.num_jobs]
        try:
            summaries[name] = summary = summarize_all(result)
        except Exception as e:
            print("*"*79)
            print("FAILED input:")
            pprint(inp)
            print("*"*79)
            pprint(result)
            print("Failed to summarize result above")
            raise
        summary['name'] = inp['_name'] # name tuple
        pprint(summary)
        alldelta = time.monotonic() - allstart
        delta = time.monotonic() - start
        print(f"Elapsed time for {name}: {delta:.1f} (total: {alldelta:.0f})")

    with open(results_dir / 'summaries.json', 'w') as f:
        json.dump(summaries, f, indent=1, sort_keys=True)
    print(f"Wrote summaries to {results_dir}")

asyncio.run(main())
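The nested merge that `build_input` relies on can be illustrated in isolation. The function below is a condensed restatement of `recurse_updated` from the script above (not a verbatim copy), and the two dictionaries are trimmed-down stand-ins for `base_input` and `use_msc`:

```python
def recurse_updated(d, other):
    """Return a copy of `d` updated with `other`, merging nested dicts."""
    result = d.copy()
    result.update(other)
    for k, v in result.items():
        # Only merge when both the old and the new value are dictionaries
        if isinstance(v, dict) and isinstance(d.get(k), dict):
            result[k] = recurse_updated(d[k], v)
    return result


base = {"use_device": False, "geant_options": {"msc": "none", "lpm": True}}
use_msc = {"geant_options": {"msc": "urban"}}

merged = recurse_updated(base, use_msc)
print(merged)
# {'use_device': False, 'geant_options': {'msc': 'urban', 'lpm': True}}
```

A plain `dict.update` would have replaced the whole `geant_options` entry and dropped `lpm`. The recursive version overrides only `msc` and leaves `base` unmodified.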
This issue won't be explored further because the goal is to run the profile on Perlmutter. See https://github.com/simbarras/tb23-gpu-opt-celeritas/issues/28
#!/bin/bash -e
# module load geant4-data/11.0.0-s4eo python
# Load same configuration as used to build
source ../celeritas/scripts/env/zeus.sh
echo "Running on $HOSTNAME at $(date)"
# Check argument for ncu profiling
cmd="python3 run-problems.py zeussimple"
# Quote the argument and default it to empty so the test is valid without arguments
if [ "${1:-}" == 'profiling' ]
then
    echo "Profiling enabled"
    #cmd="ncu -f --export=profile --set=full --launch-skip=345 --launch-count=10 --kernel-name=along_step_uniform_msc_kernel --target-processes-filter=regex:*/celer-sim $cmd"
    prof="ncu"
    prof="$prof -f"
    prof="$prof --export=profile-$(date +%Y%m%d-2%H%M%S)"
    prof="$prof --set=full"
    prof="$prof --launch-skip=345"
    prof="$prof --launch-count=10"
    prof="$prof --kernel-name=along_step_uniform_msc_kernel"
    prof="$prof --target-processes=all"
    prof="$prof --target-processes-filter=regex:demo-loop"
    cmd="$prof $cmd"
fi
echo "Running command: $cmd"
$cmd
echo "Completed at $(date)"
exit 0
#!/usr/bin/env python3
# Copyright 2022 UT-Battelle, LLC, and other Celeritas developers.
# See the top-level COPYRIGHT file for details.
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""
- Loop over all problems
- Launch simultaneously on multiple cores (different seed per run!)
- Save overall times from all runs, and output from one run
- Catch failure message and save
Requires Python 3.7+.
"""
import asyncio
import itertools
import json
from pathlib import Path, PurePath
from pprint import pprint
from os import environ
import shutil
from signal import SIGINT, SIGTERM, SIGKILL
import subprocess
import sys
import time
from summarize import inp_to_nametuple, summarize_all, exception_to_dict, get_num_events_and_primaries
g4env = {k: v for k, v in environ.items()
         if k.startswith('G4')}
systems = {}

class System:
    name = None
    build_dirs = {}
    num_jobs = None # Number of simultaneous jobs to run
    gpu_per_job = None
    cpu_per_job = None

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])
        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

    def get_monitoring_coro(self):
        return []

class Wildstyle(System):
    build_dirs = {
        'orange': Path("/home/s3j/.local/src/celeritas/build-reldeb"),
        'vecgeom': Path("/home/s3j/.local/src/celeritas/build-reldeb-vecgeom"),
    }
    name = "wildstyle"
    num_jobs = 2
    gpu_per_job = 1
    cpu_per_job = 32

class Local(System):
    build_dirs = {
        "orange": Path("/Users/seth/.local/src/celeritas/build"),
    }
    name = "testing"
    num_jobs = 1
    gpu_per_job = 0
    cpu_per_job = 1

class Summit(System):
    _CELER_ROOT = Path(environ.get('PROJWORK', '')) / 'csc404' / 'celeritas'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug-novg',
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "summit"
    num_jobs = 6
    gpu_per_job = 1
    cpu_per_job = 7

    def create_celer_subprocess(self, inp):
        cmd = "jsrun"
        env = g4env.copy()
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)
        args = [
            "-n1", # total resource sets
            "-r1", # resource sets per host
            "-a1", # tasks per resource set
            f"-c{self.cpu_per_job}", # CPUs per resource set
            "--bind=packed:7",
            "--launch_distribution=packed",
        ]
        if inp['use_device']:
            args.append("-g1") # GPUs per resource set
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("-g0")
        args.extend("".join(["-E", k, "=", v]) for k, v in env.items())
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        args.extend([
            build / "app" / "demo-loop",
            "-"
        ])
        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )

    async def run_jslist(self):
        # Wait a second for the jobs to start
        await asyncio.sleep(1)
        print("Running jslist")
        try:
            proc = await asyncio.create_subprocess_exec("jslist", "-r", "-R")
        except FileNotFoundError as e:
            print("jslist not found :(")
            return
        print("Waiting on jslist output")
        await proc.communicate()

    def get_monitoring_coro(self):
        return [self.run_jslist()]

class Crusher(System):
    _CELER_ROOT = Path(environ['HOME']) / '.local' / 'src' / 'celeritas-crusher'
    build_dirs = {
        "orange": _CELER_ROOT / 'build-ndebug'
    }
    name = "crusher"
    # NOTE: layout multi-gpu run
    # num_jobs = 4
    # gpu_per_job = 2
    # cpu_per_job = 16
    num_jobs = 8
    gpu_per_job = 1
    cpu_per_job = 8

    def create_celer_subprocess(self, inp):
        cmd = "srun"
        env = dict(environ)
        env["OMP_NUM_THREADS"] = str(self.cpu_per_job)
        args = [
            f"--cpus-per-task={self.cpu_per_job}",
        ]
        if inp['use_device']:
            args.append("--gpus-per-task=1")
        else:
            env["CELER_DISABLE_DEVICE"] = "1"
            args.append("--gpus=0")
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        ncu_set = "full"
        kernel_name = "along_step_uniform_msc_kernel"
        problem = Path(inp["geometry_filename"]).name.split('.')[0]
        filename = f"{ncu_set}-{problem}-{inp['track_order']}-{kernel_name}"
        args.extend([
            "ncu",
            "-f",
            f"--export={filename}",
            f"--set={ncu_set}",
            "--launch-skip=345",
            "--launch-count=10",
            f"--kernel-name={kernel_name}",
            build / "app" / "demo-loop",
            "-"
        ])
        return asyncio.create_subprocess_exec(
            cmd, *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env,
        )

# NOTE: Perlmutter uses Slurm so we can just inherit from Crusher
class Perlmutter(Crusher):
    #_CELER_ROOT = Path(environ['CFS']) / 'atlas' / 'esseivaj' / 'devel' / 'celeritas'
    _CELER_ROOT = Path(environ['HOME']) / 'project' / 'celeritas'
    build_dirs = {
        "vecgeom": _CELER_ROOT / 'build-ndebug',
        "orange": _CELER_ROOT / 'build-ndebug-novg'
    }
    name = "perlmutter"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

class ZeusSimple(System):
    _CELER_ROOT = Path(environ['HOME']) / 'project' / 'celeritas'
    build_dirs = {
        "vecgeom": _CELER_ROOT / 'build-ndebug',
    }
    name = "zeussimple"
    num_jobs = 1
    gpu_per_job = 1
    cpu_per_job = 1

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])
        return asyncio.create_subprocess_exec(
            cmd, "-",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

class ZeusProfile(ZeusSimple):
    name = "zeusprofile"

    def create_celer_subprocess(self, inp):
        try:
            build = self.build_dirs[inp["_geometry"]]
        except KeyError:
            build = PurePath("nonexistent")
        cmd = build / "app/demo-loop"
        env = dict(environ)
        env['OMP_NUM_THREADS'] = str(self.cpu_per_job)
        if not inp['use_device']:
            env['CELER_DISABLE_DEVICE'] = "1"
        else:
            env['CUDA_VISIBLE_DEVICES'] = str(inp['_instance'])
        args = [
            "-f",
            "--export=profile",
            "--set=full",
            "--launch-skip=345",
            "--launch-count=10",
            "--kernel-name=along_step_uniform_msc_kernel",
            "--replay-mode=application",
            #"--target-processes=all",
            build / "app" / "demo-loop",
            "-",
        ]
        return asyncio.create_subprocess_exec(
            "ncu", *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
            env=env
        )

regression_dir = Path(__file__).parent
input_dir = regression_dir / "input"
base_input = {
    "_timeout": 4 * 3600.0,
    "brem_combined": False,
    "initializer_capacity": 2**20,
    "mag_field": [0.0, 0.0, 1.0],
    "max_num_tracks": 2**12,
    "max_steps": 2**21,
    "track_order": "shuffled",
    "secondary_stack_factor": 3.0,
    "enable_diagnostics": False,
    "use_device": False,
    "sync": True,
    # Geant options
    "geant_options": {
        "coulomb_scattering": False,
        "rayleigh_scattering": True,
        "eloss_fluctuation": False,
        "lpm": True,
        "em_bins_per_decade": 56,
        "physics": "em_basic",
        "msc": "none",
    },
}
use_msc = {"geant_options": {"msc": "urban"}}
use_gpu = {
    "use_device": True,
    "max_num_tracks": 2**20,
    "max_steps": 2**15,
    "initializer_capacity": 2**26,
}
no_field = {
    "mag_field": [0.0, 0.0, 0.0],
    "eloss_fluctuation": True,
}
testem15 = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "testem15.org.json",
    "hepmc3_filename": "testem15-13TeV.hepmc3",
    "physics_filename": "testem15.gdml",
    "mag_field": [0.0, 0.0, 1.0],
    "sync": False,
}
simple_cms = {
    "_geometry": "orange",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "simple-cms.org.json",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "simple-cms.gdml",
    "mag_field": [0.0, 0.0, 1.0],
}
testem3 = {
    "_geometry": "orange",
    "geometry_filename": "testem3-flat.org.json",
    "physics_filename": "testem3-flat.gdml",
    "mag_field": [0.0, 0.0, 1.0],
    "sync": False,
    "primary_gen_options": {
        "pdg": 11,
        "energy": 10000, # 10 GeV
        "position": [-22, 0, 0],
        "direction": [1, 0, 0],
        "num_events": 7,
        "primaries_per_event": 1300 # 13 TeV
    }
}
full_cms = {
    "_geometry": "vecgeom",
    "_num_events": 7,
    "_num_primaries": 9100,
    "geometry_filename": "cms2018.gdml",
    "hepmc3_filename": "simple-cms-13TeV.hepmc3",
    "physics_filename": "cms2018.gdml",
    "mag_field": [0.0, 0.0, 1.0],
}

# List of list of setting dictionaries
problems = [
    # [testem15, no_field],
    # [testem15],
    # [testem15, use_msc,
    #  {"_geometry": "vecgeom", "geometry_filename": "testem15.gdml"}],
    # [testem15, use_msc],
    # [simple_cms, no_field, use_msc],
    # [simple_cms],
    # [simple_cms, use_msc],
    # [simple_cms, use_msc,
    #  {"_geometry": "vecgeom", "geometry_filename": "simple-cms.gdml"}],
    # [testem3, no_field],
    # [testem3, no_field,
    #  {"_geometry": "vecgeom", "geometry_filename": "testem3-flat.gdml"}],
    # [testem3],
    # [testem3, no_field, use_msc],
    # [testem3, use_msc,
    #  {"_geometry": "vecgeom", "geometry_filename": "testem3-flat.gdml"}],
    # [full_cms, no_field],
    [full_cms, use_msc],
]

def recurse_updated(d, other):
    result = d.copy()
    result.update(other)
    for k, v in result.items():
        if isinstance(v, dict):
            try:
                orig = d[k]
            except KeyError:
                v = result[k]
            else:
                v = recurse_updated(orig, result[k])
            result[k] = v
    return result

def build_input(problem_dicts):
    """Construct an input dictionary by merging inputs.

    Later entries override earlier entries.
    """
    inp = base_input.copy()
    for d in problem_dicts:
        inp = recurse_updated(inp, d)
    for k in inp:
        if k.endswith('_filename'):
            inp[k] = str(input_dir / inp[k])
    inp["_name"] = name = inp_to_nametuple(inp)
    inp["_outdir"] = "-".join(name)
    (inp["max_events"], _) = get_num_events_and_primaries(inp)
    return inp

def build_instance(inp, instance):
    inp = inp.copy()
    inp["_instance"] = instance
    inp["seed"] = 20220904 + instance
    return inp

async def communicate_with_timeout(proc, interrupt, terminate=5.0, kill=1.0, input=None):
    """Interrupt, then terminate, then kill a process if it doesn't
    communicate.
    """
    try:
        result = await asyncio.wait_for(
            proc.communicate(input),
            timeout=interrupt)
    except asyncio.TimeoutError:
        print(f"Timed out after {interrupt} seconds: sending interrupt")
        proc.send_signal(SIGINT)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                                        timeout=terminate)
    except asyncio.TimeoutError:
        print(f"Timed out *AGAIN* after {terminate} seconds")
        proc.send_signal(SIGTERM)
    else:
        return result

    try:
        result = await asyncio.wait_for(proc.communicate(),
                                        timeout=kill)
    except asyncio.TimeoutError:
        print(f"Set phasers to kill after {kill} seconds")
        proc.send_signal(SIGKILL)
    else:
        return result

    print("Awaiting communication")
    result = await proc.communicate()
    return result

async def run_celeritas(system, results_dir, inp):
    instance = inp['_instance']
    try:
        proc = await system.create_celer_subprocess(inp)
    except FileNotFoundError as e:
        print("File not found:", e)
        return exception_to_dict(e, context="creating subprocess")

    with open(results_dir / "0.inp.json", 'w') as f:
        json.dump(inp, f)

    # TODO: monitor output, e.g. https://gist.github.com/kalebo/1e085ee36de45ffded7e5d9f857265d0
    print(f"{instance}: awaiting communcation")
    failed = False
    out, err = await communicate_with_timeout(proc,
        input=json.dumps(inp).encode(),
        interrupt=inp['_timeout']
    )

    try:
        result = json.loads(out)
    except json.decoder.JSONDecodeError as e:
        print(f"{instance}: failed to decode JSON")
        failed = True
        result = {
            'stdout': out.decode().splitlines(),
        }

    if proc.returncode:
        print(f"{instance}: exit code {proc.returncode}")
        failed = True
        result['stderr'] = err.decode().splitlines()

    # Copy special inputs to output for later processing
    result.setdefault('input', {}).update(
        {k: v for k,v in inp.items() if k.startswith('_')}
    )

    try:
        outdir = results_dir / inp['_outdir']
        outdir.mkdir(exist_ok=True)
        with open(outdir / f"{instance:d}.json", "w") as f:
            json.dump(result, f, indent=0, sort_keys=True)
    except Exception as e:
        print(f"{instance}: failed to output:", repr(e))
        failed = True

    if proc.returncode:
        # Write input to reproduce later
        with open(outdir / f"{instance:d}.inp.json", "w") as f:
            json.dump(inp, f, indent=0, sort_keys=True)

    if not failed:
        print(f"{instance}: success")
    return result
async def main():
    try:
        sysname = sys.argv[1]
    except IndexError:
        Sys = Local
    else:
        # TODO: use metaclass to build this list automatically
        _systems = {S.name: S for S in [Summit, Crusher, Wildstyle, Perlmutter,
                                        ZeusProfile, ZeusSimple]}
        Sys = _systems[sysname]
    system = Sys()

    try:
        shuffled = sys.argv[2]
    except IndexError:
        shuffled = "unsorted"
    base_input["track_order"] = shuffled

    # Copy build files
    buildfile_dir = regression_dir / 'build-files' / system.name
    buildfile_dir.mkdir(exist_ok=True)
    for k, v in system.build_dirs.items():
        if v.exists():
            shutil.copyfile(v / 'CMakeCache.txt', buildfile_dir / (k + '.txt'))

    results_dir = regression_dir / 'results' / system.name
    results_dir.mkdir(exist_ok=True)

    device_mods = []
    if system.gpu_per_job:
        device_mods.append([use_gpu])
    # device_mods.append([]) # CPU

    inputs = [build_input([base_input] + p + d)
              for p, d in itertools.product(problems, device_mods)]
    with open(results_dir / "index.json", "w") as f:
        json.dump([(inp['_outdir'], inp['_name'])
                   for inp in inputs], f, indent=0)

    summaries = {}
    allstart = time.monotonic()
    for inp in inputs:
        print("="*79)
        #pprint(inp)
        start = time.monotonic()
        tasks = [run_celeritas(system, results_dir, build_instance(inp, i))
                 for i in range(system.num_jobs)]
        if not summaries:
            # Only print monitoring for first instance
            tasks.extend(system.get_monitoring_coro())
        result = await asyncio.gather(*tasks)
        # Ignore results from monitoring tasks
        result = result[:system.num_jobs]
        name = inp['_outdir']
        try:
            summaries[name] = summary = summarize_all(result)
        except Exception as e:
            print("*"*79)
            print("FAILED input:")
            pprint(inp)
            print("*"*79)
            pprint(result)
            print("Failed to summarize result above")
            raise
        summary['name'] = inp['_name'] # name tuple
        pprint(summary)
        alldelta = time.monotonic() - allstart
        delta = time.monotonic() - start
        print(f"Elapsed time for {name}: {delta:.1f} (total: {alldelta:.0f})")

    with open(results_dir / 'summaries.json', 'w') as f:
        json.dump(summaries, f, indent=1, sort_keys=True)
    print(f"Wrote summaries to {results_dir}")

asyncio.run(main())
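The script's main loop passes the job coroutines and the monitoring coroutines to a single asyncio.gather call and then keeps only the first num_jobs results. This works because asyncio.gather returns results in the order the awaitables were passed, not in completion order. Here is a minimal, self-contained sketch of that pattern. The names run_job, monitor and gather_jobs are hypothetical stand-ins for illustration and are not part of run-problems.py:

```python
import asyncio


async def run_job(i):
    # Hypothetical stand-in for run_celeritas: later instances finish first
    await asyncio.sleep(0.01 * (3 - i))
    return {"instance": i}


async def monitor(label):
    # Hypothetical stand-in for a monitoring coroutine; its result is discarded
    await asyncio.sleep(0.005)
    return f"monitor:{label}"


async def gather_jobs(num_jobs, monitors):
    tasks = [run_job(i) for i in range(num_jobs)]
    tasks.extend(monitors)
    # gather preserves the order of its arguments in the returned list
    results = await asyncio.gather(*tasks)
    # Drop the trailing monitoring results, as the script does
    return results[:num_jobs]


job_results = asyncio.run(gather_jobs(3, [monitor("gpu")]))
print(job_results)
```

Note that gather only returns once every awaitable has finished, so a monitoring coroutine that never completes would block the whole batch.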
To run a profile, check out the tag 0.2.2, cherry-pick the commit 196f72d88472e71b3816adb4a3ea90512256ebd2, and run the script run-zeus.sh with the profiling argument. This is an example of the output:
[simbarras@zeus regression]$ ./run-zeus.sh profiling
loading cuda/11.8.0
Running on zeus.lbl.gov at Wed Jun 28 08:38:35 PDT 2023
Profiling enabled
Running command: ncu -f --export=profile-20230628083835 --set=full --launch-skip=345 --launch-count=10 --kernel-name=along_step_uniform_msc_kernel --target-processes=all --target-processes-filter=regex:demo-loop python3 run-problems.py zeussimple
===============================================================================
0: awaiting communcation
==PROF== Connected to process 13078 (/bld4/home/simbarras/project/celeritas/build-ndebug/app/demo-loop)
==PROF== Profiling "along_step_uniform_msc_kernel": 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using "--replay-mode application" to avoid memory save-and-restore.
....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Profiling "along_step_uniform_msc_kernel": 0%....50%....100% - 35 passes
==PROF== Disconnected from process 13078
0: success
{'input': {'enable_msc': True,
'geometry_filename': 'cms2018.gdml',
'mag_field': [0.0, 0.0, 1.0],
'max_num_tracks': 1048576,
'use_device': True},
'name': ('cms2018+field+msc', 'vecgeom', 'gpu'),
'result': [{'action_times': {'along-step-uniform-msc': 1537.9388543089851,
'annihil-2-gamma': 2.4152413649999884,
'brems-rel': 2.4863914809999947,
'brems-sb': 3.6194525759999987,
'conv-bethe-heitler': 2.7413463669999985,
'geo-boundary': 8.195453996000051,
'ioni-moller-bhabha': 2.5746462819999762,
'msc-urban': 0.1052031689999999,
'photoel-livermore': 3.630558379000009,
'physics-discrete-select': 3.272606033999989,
'pre-step': 17.438818170000022,
'scat-klein-nishina': 2.9887148080000063,
'scat-rayleigh': 2.5811740850000047},
'active_hwm': {'count': 1048576, 'index': 874},
'avg_steps_per_primary': 70582.95901098901,
'avg_time_per_primary': 0.17634734310439562,
'avg_time_per_step': 2.4984407791254573e-06,
'emptying_step': 875,
'num_events': 7,
'num_primaries': 9100,
'num_step_iters': 32768,
'num_steps': 642304927,
'pre_emptying_time': 0.233108244,
'queue_hwm': {'count': 10039285, 'index': 458},
'setup_time': 32.041358836,
'slot_occupancy': 0.018693533696932718,
'total_time': 1604.76082225,
'unconverged': 2}],
'system': {'debug': False,
'geant4': '11.1.1',
'occupancy': {'along_step_uniform_msc': 0.16666666666666666,
'bethe_heitler_interact': 0.5,
'boundary': 0.16666666666666666,
'discrete_select': 0.6666666666666666,
'eplusgg_interact': 0.6666666666666666,
'init_tracks': 0.16666666666666666,
'klein_nishina_interact': 0.6666666666666666,
'livermore_pe_interact': 0.5,
'locate_alive': 1.0,
'moller_bhabha_interact': 0.6666666666666666,
'pre_step': 0.6666666666666666,
'process_primaries': 1.0,
'process_secondaries': 1.0,
'rayleigh_interact': 0.5,
'relativistic_brem_interact': 0.5,
'seltzer_berger_interact': 0.5},
'vecgeom': '1.2.2',
'version': 'v0.2.2-1+9a018b4f'}}
Elapsed time for cms2018+field+msc-vecgeom-gpu: 1638.3 (total: 1638)
Wrote summaries to /bld4/home/simbarras/project/regression/results/zeussimple
==PROF== Report: /bld4/home/simbarras/project/regression/profile-20230628083835.ncu-rep
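As a quick sanity check on the summary printed above, the averaged fields can be recomputed from the raw counters. The sketch below assumes that the averages are plain ratios of total_time, num_steps and num_primaries, and that action_times and total_time are in the same units. Both are assumptions inferred from the numbers, not taken from the Celeritas source:

```python
import math

# Values copied from the 'result' entry of the summary printed above
result = {
    "total_time": 1604.76082225,
    "num_steps": 642304927,
    "num_primaries": 9100,
    "avg_steps_per_primary": 70582.95901098901,
    "avg_time_per_primary": 0.17634734310439562,
    "avg_time_per_step": 2.4984407791254573e-06,
    "along_step_uniform_msc_time": 1537.9388543089851,
}

# Recompute the averages from the raw counters (assumed definitions)
recomputed = {
    "avg_steps_per_primary": result["num_steps"] / result["num_primaries"],
    "avg_time_per_primary": result["total_time"] / result["num_primaries"],
    "avg_time_per_step": result["total_time"] / result["num_steps"],
}
checks = {
    key: math.isclose(value, result[key], rel_tol=1e-9)
    for key, value in recomputed.items()
}

# Share of the total time attributed to the profiled along-step action
msc_share = result["along_step_uniform_msc_time"] / result["total_time"]

print(checks)
print(f"along-step-uniform-msc share of total_time: {msc_share:.1%}")
```

Since this run was executed under ncu with kernel replay, the timings may not be representative of an unprofiled run.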
And this is an example profile
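For reference, the ncu invocation shown in the "Running command" line of the log can be assembled programmatically. The sketch below only rearranges the flags that appear in that log line. The helper name build_ncu_command and its parameters are hypothetical, and the assumption that the export name is a timestamp of the form YYYYmmddHHMMSS is inferred from the log rather than from run-zeus.sh:

```python
import shlex
from datetime import datetime


def build_ncu_command(target, kernel_name, launch_skip, launch_count,
                      process_regex, timestamp):
    """Hypothetical helper: build the ncu argument list seen in the log."""
    # Assumed naming scheme for the report file, inferred from the log
    export = f"profile-{timestamp:%Y%m%d%H%M%S}"
    return [
        "ncu", "-f",
        f"--export={export}",
        "--set=full",
        f"--launch-skip={launch_skip}",
        f"--launch-count={launch_count}",
        f"--kernel-name={kernel_name}",
        "--target-processes=all",
        f"--target-processes-filter=regex:{process_regex}",
        *target,
    ]


cmd = build_ncu_command(
    target=["python3", "run-problems.py", "zeussimple"],
    kernel_name="along_step_uniform_msc_kernel",
    launch_skip=345,
    launch_count=10,
    process_regex="demo-loop",
    timestamp=datetime(2023, 6, 28, 8, 38, 35),
)
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to subprocess.run without shell quoting issues. The values for the kernel name, launch skip and launch count are the ones from this particular run and would need adjusting for other kernels.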
After #26