stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

Support manila fileshares (cephfs) #344

Closed sjpb closed 8 months ago

sjpb commented 9 months ago
  1. Adds general support for multiple manila fileshares using the CephFS protocol. See the groups, group_vars and role for manila.

  2. In the .caas environment for Azimuth:

    a. Storage for home directories is now configurable using the bool cluster_home_manila_share:

    • If true, a manila share is created and deleted with the cluster. In this case the share type must be defined with cluster_home_manila_share_type unless a default share type is defined in OpenStack.
    • If false, an OpenStack volume is attached to the control node and NFS-exported to the cluster (pre-PR behaviour).

    In both cases home_volume_size defines the size (in GB) of the relevant storage.

    b. A project share can be enabled by setting the bool cluster_project_manila_share: true. The share is expected to already exist (i.e. its lifecycle is not tied to the cluster). Its name is defined by cluster_project_manila_share_name, defaulting to azimuth-project-share. See environments/.caas/inventory/group_vars/all/manila.yml.
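
    As a rough sketch of how the options in item 2 combine, a set of variable overrides (e.g. passed as extra vars) might look like the following. The variable names are the ones described above; the values, in particular the share type and the size, are illustrative assumptions and will differ per site.

    ```yaml
    # Illustrative values only; variable names are taken from this PR description.
    cluster_home_manila_share: true                 # home directories on a manila share created/deleted with the cluster
    cluster_home_manila_share_type: ceph01_cephfs   # site-specific; only needed if OpenStack defines no default share type
    home_volume_size: 100                           # GB; sizes the share here, or the volume when the bool above is false
    cluster_project_manila_share: true              # also mount a project share, which must already exist
    cluster_project_manila_share_name: azimuth-project-share  # the stated default, shown explicitly
    ```

    Setting cluster_home_manila_share to false with the same home_volume_size would instead give the pre-PR behaviour of a volume on the control node exported over NFS.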

  3. In the skeleton terraform (used by new cookiecutter environments and the stackhpc CI environment):

    a. home_volume_size can be set to 0 to not create a home volume. This allows e.g. a manila share to be externally defined or defined in additional terraform.

    b. Manila client (default nautilus version) installed in fat image.

sjpb commented 9 months ago

Checked share works ok after reboot

sjpb commented 8 months ago

Fat image build w/ (default nautilus) manila client: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/7543193442

sjpb commented 8 months ago

Fat image build: https://github.com/stackhpc/ansible-slurm-appliance/actions/runs/7544263443 (image: openhpc-240116-1604-b3563a08)

sjpb commented 8 months ago

Example additional config on arcus rcp-cloud-portal-demo:

azimuth_caas_cluster_templates_overrides:
  slurm-manila:
    gitUrl: https://github.com/stackhpc/ansible-slurm-appliance.git
    gitVersion: feat/manila
    uiMetaUrl: https://raw.githubusercontent.com/stackhpc/ansible-slurm-appliance/feat/manila/environments/.caas/ui-meta/slurm-infra-manila.yml
    playbook: ansible/site.yml
    extraVars:
      cluster_image: 6a7b4a9c-b142-4a35-8590-d8950d28123c # "openhpc-240116-1604-b3563a08" # https://github.com/stackhpc/ansible-slurm-appliance/pull/344
      login_flavor_name: vm.ska.cpu.general.small
      control_flavor_name: vm.ska.cpu.general.small
      cluster_state_volume_size: 40
      cluster_home_manila_share: true
      cluster_home_manila_share_type: ceph01_cephfs # no default defined in arcus rcp-cloud-portal-demo at least
      cluster_project_manila_share: true
    envVars:
      # Normally set through environment's activate script:
      ANSIBLE_INVENTORY: environments/common/inventory,environments/.caas/inventory # NB: Relative to runner project dir