superbit-collaboration / superbit-metacal

Contains a collection of routines used to perform gmix/metacalibration on simulated SuperBIT images

Create job management classes to handle large numbers of pipeline runs #62

Closed sweverett closed 2 years ago

sweverett commented 2 years ago

It has been annoying to create lots of very similar configs by hand, and to have each of us run subsets of various cluster (m, z) combinations and realizations for the same run_name.

If we think of each pipeline run as a "job", we should create JobsManager and ClusterJob classes which handle all of the bookkeeping if passed a single yaml file with the following things:

  • run_name
  • base_dir (absolute top-level directory of all clusters and realizations for a given run, e.g. 4 different (m, z) clusters w/ 3 realizations each)
  • nfw_dir (top-level directory of all NFW truth files, assumed to have the same directory structure as the outputs)
  • gs_base_config (mock_superbit_data.py config file that sets the simulation type, just without a few things like mass, redshift, seeds, etc.)
  • mass_bins (list of unique cluster mass values)
  • z_bins (list of unique cluster redshift values)
  • realizations (either the number of realizations or a list of realization values; this allows you to run, say, the first 5 realizations while I run 6-10)

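A hypothetical sketch of such a job config (field names follow this proposal; the paths and values are made-up placeholders, not a real schema):

```yaml
# Hypothetical job config; values are illustrative placeholders
run_name: cluster_test_v1
base_dir: /path/to/runs/cluster_test_v1
nfw_dir: /path/to/nfw_truth
gs_base_config: configs/mock_superbit_base.yaml
mass_bins: [1.0e14, 5.0e14]
z_bins: [0.25, 0.45]
realizations: 3   # or an explicit list, e.g. [5, 6, 7, 8, 9]
```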
The JobsManager should then:

  1. Create the necessary directory structure for all runs and outputs, like I posted
  2. Create the job-specific GalSim config for mock generation
  3. Create the job-specific pipeline config file
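The three steps above could be sketched roughly like this (class, method, and directory names here are assumptions based on this issue, not the actual job-configs implementation):

```python
# Hypothetical sketch of the proposed JobsManager / ClusterJob classes.
import itertools
from pathlib import Path


class ClusterJob:
    """Bookkeeping for one (mass, redshift, realization) pipeline run."""

    def __init__(self, run_name, base_dir, mass, z, realization):
        self.run_name = run_name
        self.mass = mass
        self.z = z
        self.realization = realization
        # One output directory per (m, z) cluster and realization
        self.outdir = (
            Path(base_dir) / run_name / f"cl_m{mass:.1e}_z{z}" / f"r{realization}"
        )

    def make_dirs(self):
        self.outdir.mkdir(parents=True, exist_ok=True)


class JobsManager:
    """Expands a single parsed job config into one ClusterJob per combination."""

    def __init__(self, config):
        # `config` is the parsed yaml file described above
        reals = config["realizations"]
        if isinstance(reals, int):  # allow either a count or an explicit list
            reals = list(range(reals))
        self.jobs = [
            ClusterJob(config["run_name"], config["base_dir"], m, z, r)
            for m, z, r in itertools.product(
                config["mass_bins"], config["z_bins"], reals
            )
        ]

    def prepare_all(self):
        for job in self.jobs:
            job.make_dirs()
            # Steps (2) and (3) would go here: write the job-specific
            # GalSim and pipeline configs into job.outdir
```

With this structure, a 4-mass × 3-realization run expands into 12 jobs from one file, and each job owns its output directory.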

Then doing a full run should only require a tiny top-level script that distributes all the desired pipe jobs across our local HPC environment.
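That top-level script might reduce to something like the following, assuming a Slurm scheduler; the sbatch flags and the run_pipe.sh wrapper are placeholders:

```python
# Hypothetical top-level driver: build one scheduler command per job.
# Assumes Slurm; flags and the wrapper script name are placeholders.
def submission_command(run_name, outdir, pipe_config):
    return [
        "sbatch",
        f"--job-name={run_name}",
        f"--output={outdir}/pipe.log",
        "run_pipe.sh",  # hypothetical wrapper that runs one pipeline job
        pipe_config,
    ]


def submit_all(jobs):
    # In practice each command would be passed to subprocess.run()
    return [
        submission_command(j["run_name"], j["outdir"], j["config"]) for j in jobs
    ]
```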

I will work on this in the existing job-configs branch. Supersedes #28

mcclearyj commented 2 years ago

This would be a great enhancement and ease both our workflow and the workflow of future users.


sweverett commented 2 years ago

So there is currently a jobs.ClusterJob.make_gs_config() method that updates a base GS config with realization-specific config options such as the GS master seed. I'm pretty sure this is also where the stellar density is set in your current GAIA methods, so it would be a natural place to add that capability if we end up going down that path.
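A hedged sketch of what such a method might do: copy the shared base GalSim config and overwrite only the per-realization fields (master seed, cluster mass/redshift, and optionally a GAIA-derived stellar density). The key names and seed scheme here are illustrative, not the real schema:

```python
# Illustrative stand-in for ClusterJob.make_gs_config(); key names
# and the seed offset are assumptions, not the actual config schema.
import copy


def make_gs_config(base_config, mass, z, realization,
                   master_seed=None, star_density=None):
    gs = copy.deepcopy(base_config)  # never mutate the shared base config
    gs["cluster"] = {"mass": mass, "z": z}
    # Derive a unique, reproducible seed per realization if none is given
    gs["master_seed"] = master_seed if master_seed is not None else 1234 + realization
    if star_density is not None:
        # Where a GAIA-based stellar density would be injected
        gs["stars"] = {"density": star_density}
    return gs
```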

sweverett commented 2 years ago

Completed by #63