ubccr / coldfront

HPC Resource Allocation System
https://coldfront.readthedocs.io
GNU General Public License v3.0

Slurm sync #375

Open payerle opened 2 years ago

payerle commented 2 years ago

This PR updates the Slurm plugin to add a slurm_sync command for finer-grained control over synchronizing Slurm with ColdFront. It also does some restructuring of the SlurmCluster, etc. classes to facilitate subclassing.

  1. Changes to SlurmCluster, SlurmAccount, SlurmUser to facilitate subclassing them. This is to allow for sites with non-standard requirements to subclass and just override the requisite routines.

    1. Various static methods (SlurmCluster.new_from_stream, SlurmCluster.new_from_resource, SlurmAccount.new_from_sacctmgr, SlurmUser.new_from_sacctmgr) have been converted to class methods. The calling conventions have not changed.
    2. Class/static variables SlurmAccount_class and SlurmUser_class have been added to SlurmCluster, and SlurmUser_class to SlurmAccount. They default to SlurmAccount and SlurmUser, respectively. Subclasses can override them so that new_from_stream and the other class methods create instances of the subclasses.
    3. The order of the class definitions for SlurmCluster, SlurmAccount, and SlurmUser in the file has been changed to better allow for the SlurmAccount_class/SlurmUser_class variable defaults. Now SlurmUser is defined first, then SlurmAccount, then SlurmCluster.
  2. Rudimentary support for Slurm accounts having other Slurm accounts as parents has been added. This is done somewhat cheaply: if an Allocation has an attribute slurm_parent_account_name set, the Slurm plugin will use that as the parent. This parentage exists only in Slurm and does not change anything in the ColdFront allocation structure.

    1. utils.py adds SLURM_ACCOUNT_PARENT_ATTRIBUTE_NAME, defaulting to slurm_parent_account_name, to abstract the attribute name.
    2. SlurmAccount has a new instance variable parent, and __init__ has a new optional argument parent, which defaults to None.
    3. SlurmAccount.new_from_sacctmgr has a new optional parameter parent, which defaults to None and is passed to __init__.
    4. SlurmCluster.write updated to order accounts so parents get written before children.
  3. A slurm_sync command has been added to issue the individual Slurm commands needed to make the existing Slurm config match the ColdFront configuration. The command compares the existing Slurm config (from an sacctmgr flat file dump) to the desired (i.e. ColdFront) config, and outputs the sacctmgr commands needed to make the Slurm config match the ColdFront config. It accepts a number of flags to control behavior (e.g. skipping account/user creation/deletion, or ignoring certain slurm_spec fields or subfields in comparisons).

    1. Add methods in utils.py for adding/modifying clusters, accounts, and users.
    2. SlurmBase: spec_list and format_specs take an optional specs argument. If omitted, defaults to self.specs, so calls w/out the optional argument will behave as previously. This allows the routines to act on spec strings other than the instance's spec data member.
    3. SlurmBase given new methods strip_spec_value, spec_dict, compare_slurm_specs, and spec_dict_to_list to help deal with specs and the comparison of specs.
    4. SlurmUser, SlurmAccount, and SlurmCluster gain the class methods update_user_to, update_account_to, and update_cluster_to. These take an old and new instance of the SlurmUser/Account/Cluster, and will compare the old and new values, and issue sacctmgr commands to turn old into new as needed. These also take a list of flag strings which can be used to fine-tune the behavior.
    5. A command slurm_sync is added which takes a sacctmgr flat dump file (or generates one) for the existing Slurm config, compares that to what ColdFront thinks it should be, and issues sacctmgr commands to synchronize Slurm with ColdFront.
  4. Regression test added for slurm-sync stuff. I could not get the previous test_associations.py script to work even in master w/out my modifications, so the Slurm tests will fail (I have been renaming test_associations.py script in my working branch to ensure the new tests work).

  5. Documentation has been updated to discuss the slurm_sync command (both README and more detail in Slurm setup page)
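
For readers unfamiliar with the pattern in item 1, here is a minimal sketch of how class-method factories combined with overridable class variables let a subclass change what gets built. The classes User, Account, SiteUser, and SiteAccount, and the attribute user_class, are hypothetical stand-ins for illustration; they are not the actual SlurmUser/SlurmAccount code from this PR.

```python
# Hypothetical stand-ins for illustration; not the actual ColdFront classes.
class User:
    def __init__(self, name):
        self.name = name

    @classmethod
    def new_from_line(cls, line):
        # cls is whichever class the method was called on, so a subclass
        # gets instances of itself without overriding this factory.
        return cls(line.strip())


class Account:
    # Hook: a subclass overrides this to change which user class is built.
    user_class = User

    def __init__(self, name):
        self.name = name
        self.users = []

    @classmethod
    def new_from_lines(cls, name, lines):
        account = cls(name)
        for line in lines:
            account.users.append(cls.user_class.new_from_line(line))
        return account


class SiteUser(User):
    """Site-specific user that normalizes names to lowercase."""

    def __init__(self, name):
        super().__init__(name.lower())


class SiteAccount(Account):
    # Only the hook changes; the inherited factory is reused as-is.
    user_class = SiteUser


acct = SiteAccount.new_from_lines("physics", ["Alice\n", "BOB\n"])
```

Because the factory looks up cls.user_class at call time, the user class has to exist before the account class body runs, which is one reason the definition order in the file matters.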
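
The parent-before-child write ordering mentioned in item 2 can be sketched as a depth-first walk over a name-to-parent mapping. This is an assumption about one reasonable way to do it, not the code in SlurmCluster.write; the function order_parents_first and the sample account names are made up for the example.

```python
def order_parents_first(parent_of):
    """Return account names so that every parent precedes its children.

    parent_of maps an account name to its parent name, or to None for a
    top-level account. A parent missing from the mapping is treated as
    already existing and is skipped.
    """
    ordered = []
    seen = set()

    def visit(name, trail):
        if name in seen:
            return
        if name in trail:
            raise ValueError(f"cycle in account parents at {name!r}")
        parent = parent_of.get(name)
        if parent is not None and parent in parent_of:
            # Emit the parent (and its ancestors) before this account.
            visit(parent, trail | {name})
        seen.add(name)
        ordered.append(name)

    # Sorting makes the output deterministic for a given mapping.
    for name in sorted(parent_of):
        visit(name, frozenset())
    return ordered


accounts = {
    "chem-lab1": "chem",
    "chem": "science",
    "science": None,
    "bio": "science",
}
order = order_parents_first(accounts)
# order == ["science", "bio", "chem", "chem-lab1"]
```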
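
To make the comparison step in item 3 concrete, the following is a simplified sketch of diffing two lists of key=value specs and rendering the result as one sacctmgr modify command. The helper names (parse_specs, diff_specs, modify_account_command), the ignore parameter, and the sample values are all hypothetical and are not the PR's spec_dict or compare_slurm_specs. Using -1 to clear a value follows sacctmgr's documented convention for limits and is an assumption for other fields.

```python
def parse_specs(specs):
    """Turn ["Fairshare=100", "GrpTRES=cpu=10"] into a dict.

    Only the first "=" separates key from value, so nested values such
    as "cpu=10" stay intact. Surrounding quotes are stripped.
    """
    result = {}
    for spec in specs:
        key, sep, value = spec.partition("=")
        result[key.strip()] = value.strip().strip("'\"") if sep else ""
    return result


def diff_specs(old_specs, new_specs, ignore=()):
    """Return {key: new_value} for every spec that differs.

    Keys present only in old_specs map to "-1" (assumed "clear" value).
    Keys listed in ignore are never reported.
    """
    old, new = parse_specs(old_specs), parse_specs(new_specs)
    changes = {}
    for key in sorted(set(old) | set(new)):
        if key in ignore:
            continue
        if old.get(key) != new.get(key):
            changes[key] = new.get(key, "-1")
    return changes


def modify_account_command(cluster, account, changes):
    """Render the changes as a single sacctmgr command, or None."""
    if not changes:
        return None
    settings = " ".join(f"{key}={value}" for key, value in changes.items())
    return (
        f"sacctmgr -Q -i modify account where name={account} "
        f"cluster={cluster} set {settings}"
    )


existing = ["Fairshare=100", "GrpTRES=cpu=10", "Description='old lab'"]
desired = ["Fairshare=200", "GrpTRES=cpu=10"]
changes = diff_specs(existing, desired, ignore=("Description",))
command = modify_account_command("hpc", "physics", changes)
```

Here only Fairshare differs once Description is ignored, so a single modify command is produced; with identical specs the helper returns None and nothing would be issued.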

aebruno commented 2 years ago

@payerle Thanks for the PR. We're going to test this out and target the v1.2.0 release.