riga / law

Build large-scale task workflows: luigi + job submission + remote targets + environment sandboxing using Docker/Singularity
http://law.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Add htcondor group submission. #176

Closed riga closed 3 months ago

riga commented 3 months ago

This PR mainly contains the group submission of htcondor jobs (detailed below). However, two smaller features also come with this PR:

Smaller features

  1. law luigid sub command: The law configuration file can contain sections that are synced with the internal luigi config (by naming sections [luigi_<section>]), which makes it convenient to maintain just a single file. However, when the central scheduler is started with plain luigid, these sections are not considered. The new law luigid sub command loads and synchronizes the config and then starts the central scheduler, just like plain luigid would do.
  2. Interface to declare job resources as luigi's process_resources in remote workflows: Remote workflows like HTCondorWorkflow can now declare the resources that their jobs claim to the central luigi scheduler (if used). During status polling, these resources are freed eagerly so that new tasks relying on the same resources can start as soon as possible.
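The config synchronization from item 1 can be sketched roughly as follows. This is only an illustration built on Python's standard configparser, not law's actual implementation; the helper name extract_luigi_sections and the sample option values are hypothetical.

```python
import configparser

# Sample law-style config. The options and values are illustrative assumptions.
SAMPLE_CFG = """
[job]
job_file_dir = /tmp/jobs

[luigi_core]
default_scheduler_host = 127.0.0.1

[luigi_scheduler]
retry_count = 0
"""

LUIGI_PREFIX = "luigi_"


def extract_luigi_sections(text):
    """Collect all [luigi_<section>] sections as {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    synced = {}
    for name in parser.sections():
        if name.startswith(LUIGI_PREFIX):
            # Strip the prefix so that "luigi_core" maps to luigi's "core" section.
            synced[name[len(LUIGI_PREFIX):]] = dict(parser.items(name))
    return synced


luigi_cfg = extract_luigi_sections(SAMPLE_CFG)
```

Sections without the prefix, such as [job] here, are left out of the result, which mirrors the idea that only the [luigi_<section>] parts are forwarded to luigi.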

Main feature

So far, htcondor jobs were prepared in separate job files, each with its own queue command (though merged into a single submission file for faster interactions). However, since the central job interface also provides grouped jobs (as used by the CMS-CRAB workflow interface), htcondor jobs can make use of that feature, too.
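The difference between the two submission styles can be illustrated with a small sketch that builds both kinds of submit description as plain strings. This is not the output of HTCondorJobFileFactory; the function names, the executable name and the variable name job_num are hypothetical, and the grouped form assumes HTCondor's "queue <var> from (...)" syntax.

```python
def separate_queue_statements(executable, job_args):
    """Old style: one description per job, each with its own queue command,
    concatenated into a single submission file."""
    blocks = [
        f"executable = {executable}\narguments = {args}\nqueue\n"
        for args in job_args
    ]
    return "\n".join(blocks)


def grouped_queue_statement(executable, job_args):
    """Grouped style: one shared description and a single queue command
    that iterates over the per-job values."""
    lines = [
        f"executable = {executable}",
        "arguments = $(job_num)",
        "queue job_num from (",
    ]
    lines += [f"    {args}" for args in job_args]
    lines.append(")")
    return "\n".join(lines) + "\n"


def count_queue_commands(text):
    """Count the lines that start a queue command."""
    return sum(1 for line in text.splitlines() if line.startswith("queue"))


job_args = ["0", "1", "2"]
separate = separate_queue_statements("job.sh", job_args)
grouped = grouped_queue_statement("job.sh", job_args)
```

For three jobs, the first form contains three queue commands, while the grouped form contains a single one regardless of the number of jobs.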

The change required a rather large restructuring of how the htcondor interface itself treats jobs during their creation and - obviously - how the HTCondorJobFileFactory creates its outputs.

A new config option - job::htcondor_job_grouping_submit, defaulting to True - controls the submission type. Grouping is therefore now the default. Setting this option to False restores the old behavior.
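As a sketch of how the option could be switched off, assuming that job::htcondor_job_grouping_submit refers to the option htcondor_job_grouping_submit in the [job] section of the law config file: the snippet below only parses such a fragment with Python's standard configparser and does not use law's own config API.

```python
import configparser

# Config fragment that disables grouped submission (restores the old behavior).
LAW_CFG = """
[job]
htcondor_job_grouping_submit: False
"""

parser = configparser.ConfigParser()
parser.read_string(LAW_CFG)

# fallback=True mirrors the default stated in this PR (grouping enabled).
grouping_enabled = parser.getboolean(
    "job", "htcondor_job_grouping_submit", fallback=True
)
```

If the option is absent from the file, the fallback applies and grouping stays enabled.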


Requires a port to the release_prep branch.