superbit-collaboration / superbit-metacal

Contains a collection of routines used to perform gmix/metacalibration on simulated SuperBIT images

Create job management classes to handle large numbers of pipeline runs #62

Closed sweverett closed 2 years ago

sweverett commented 2 years ago

It has been annoying to create lots of very similar configs by hand, and to have each of us run subsets of various cluster (m, z) combinations and realizations for the same run_name.

If we think of each pipeline run as a "job", we should create JobsManager and ClusterJob classes which handle all of the bookkeeping if passed a single yaml file with the following things:

  • run_name
  • base_dir (absolute top-level directory of all clusters and realizations for a given run, e.g. 4 different (m, z) clusters w/ 3 realizations each)
  • nfw_dir (top-level directory of all NFW truth files, assumed to have the same directory structure as the outputs)
  • gs_base_config (mock_superbit_data.py config file that sets the simulation type, just without a few things like mass, redshift, seeds, etc.)
  • mass_bins (list of unique cluster mass values)
  • z_bins (list of unique cluster redshift values)
  • realizations (either the number of realizations or a list of realization values; this allows you to run, say, the first 5 realizations while I run 6-10)

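A hypothetical sketch of such a job config (field names follow this proposal; the paths and values are made-up placeholders, not a real schema):

```yaml
# Hypothetical job config; values are illustrative placeholders
run_name: cluster_test_v1
base_dir: /path/to/runs/cluster_test_v1
nfw_dir: /path/to/nfw_truth
gs_base_config: configs/mock_superbit_base.yaml
mass_bins: [1.0e14, 5.0e14]
z_bins: [0.25, 0.45]
realizations: 3   # or an explicit list, e.g. [5, 6, 7, 8, 9]
```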
The JobsManager should then:

  1. Create the necessary directory structure for all runs and outputs, like I posted
  2. Create the job-specific GalSim config for mock generation
  3. Create the job-specific pipeline config file
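The three steps above could be sketched roughly like this (class, method, and directory names here are assumptions based on this issue, not the actual job-configs implementation):

```python
# Hypothetical sketch of the proposed JobsManager / ClusterJob classes.
import itertools
from pathlib import Path


class ClusterJob:
    """Bookkeeping for one (mass, redshift, realization) pipeline run."""

    def __init__(self, run_name, base_dir, mass, z, realization):
        self.run_name = run_name
        self.mass = mass
        self.z = z
        self.realization = realization
        # One output directory per (m, z) cluster and realization
        self.outdir = (
            Path(base_dir) / run_name / f"cl_m{mass:.1e}_z{z}" / f"r{realization}"
        )

    def make_dirs(self):
        self.outdir.mkdir(parents=True, exist_ok=True)


class JobsManager:
    """Expands a single parsed job config into one ClusterJob per combination."""

    def __init__(self, config):
        # `config` is the parsed yaml file described above
        reals = config["realizations"]
        if isinstance(reals, int):  # allow either a count or an explicit list
            reals = list(range(reals))
        self.jobs = [
            ClusterJob(config["run_name"], config["base_dir"], m, z, r)
            for m, z, r in itertools.product(
                config["mass_bins"], config["z_bins"], reals
            )
        ]

    def prepare_all(self):
        for job in self.jobs:
            job.make_dirs()
            # Steps (2) and (3) would go here: write the job-specific
            # GalSim and pipeline configs into job.outdir
```

With this structure, a 4-mass × 3-realization run expands into 12 jobs from one file, and each job owns its output directory.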

Then doing a full run should only require a tiny top-level script that distributes all the desired pipe jobs across our local HPC environment.
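That top-level script might reduce to something like the following, assuming a Slurm scheduler; the sbatch flags and the run_pipe.sh wrapper are placeholders:

```python
# Hypothetical top-level driver: build one scheduler command per job.
# Assumes Slurm; flags and the wrapper script name are placeholders.
def submission_command(run_name, outdir, pipe_config):
    return [
        "sbatch",
        f"--job-name={run_name}",
        f"--output={outdir}/pipe.log",
        "run_pipe.sh",  # hypothetical wrapper that runs one pipeline job
        pipe_config,
    ]


def submit_all(jobs):
    # In practice each command would be passed to subprocess.run()
    return [
        submission_command(j["run_name"], j["outdir"], j["config"]) for j in jobs
    ]
```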

I will work on this in the existing job-configs branch. Supersedes #28

mcclearyj commented 2 years ago

This would be a great enhancement and ease both our workflow and the workflow of future users.


sweverett commented 2 years ago

So there is currently a jobs.ClusterJob.make_gs_config() method that updates a base GS config with realization-specific config options such as the GS master seed. I'm pretty sure this is also where the stellar density is set in your current GAIA methods, so it would be a natural place to add that capability if we end up going down that path.
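A hedged sketch of what such a method might do: copy the shared base GalSim config and overwrite only the per-realization fields (master seed, cluster mass/redshift, and optionally a GAIA-derived stellar density). The key names and seed scheme here are illustrative, not the real schema:

```python
# Illustrative stand-in for ClusterJob.make_gs_config(); key names
# and the seed offset are assumptions, not the actual config schema.
import copy


def make_gs_config(base_config, mass, z, realization,
                   master_seed=None, star_density=None):
    gs = copy.deepcopy(base_config)  # never mutate the shared base config
    gs["cluster"] = {"mass": mass, "z": z}
    # Derive a unique, reproducible seed per realization if none is given
    gs["master_seed"] = master_seed if master_seed is not None else 1234 + realization
    if star_density is not None:
        # Where a GAIA-based stellar density would be injected
        gs["stars"] = {"density": star_density}
    return gs
```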

sweverett commented 2 years ago

Completed by #63