lsst-uk / somerville-operations

User issue reporting and tracking for the Somerville Cloud

Ephemeral mounts for NVMe data drives #147

Open Zarquan opened 5 months ago

Zarquan commented 5 months ago

To test the performance of the NVMe data drives, how can we import them into our OpenStack VMs?

Ideally it would be good to mount them as separate discs in the VMs, but I'm not sure that would be possible.

To start with, can we create some VM flavors that have large (~900 GB) ephemeral discs mapped onto the NVMe data drives?

  {
    "ID": "....",
    "Name": "gaia.vm.26vcpu.916nvme",
    "RAM": 44032,
    "Disk": 20,
    "Ephemeral": 916,
    "VCPUs": 26
  }

GregBlow commented 5 months ago

Good morning,

These were installed on the Supermicro hypervisors. We considered a few mechanisms for segregating them properly, including availability zones. However, it looks likely that the better way is to deploy them as ephemeral volumes on feature-restricted flavors.

Can you let us know the flavors of the VMs you intend to test these on please?

Zarquan commented 5 months ago

I agree that special flavors are probably the easiest way to manage them.

Colleagues from STFC cloud at RAL recommended we should try out Longhorn to aggregate ephemeral storage from a set of VMs to create a large storage volume that can be mounted in Kubernetes as a persistent volume.

I'd like to test two different scenarios:

1) Adding the NVMe discs to a special version of the 26 core flavor that is used for the Spark workers. This handles the shared data store on the same VMs as the Spark workers.

  {
    "ID": "....",
    "Name": "gaia.vm.26vcpu.916nvme",
    "RAM": 44032,
    "Disk": 20,
    "Ephemeral": 916,
    "VCPUs": 26
  }

2) Adding the ephemeral discs to a special version of the 4 core flavor that we can use to create a separate set of VMs specifically for handling the shared data.

  {
    "ID": "....",
    "Name": "gaia.vm.4vcpu.916nvme",
    "RAM": 6144,
    "Disk": 22,
    "Ephemeral": 916,
    "VCPUs": 4
  }

GregBlow commented 4 months ago

Good afternoon,

I have reconfigured the system's hypervisors with a new labelling scheme that allows flavours to be locked to specific hypervisors, and I have provided the two new flavours you specified.
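The thread does not say which labelling mechanism was used. As a minimal sketch only, one common OpenStack way to lock flavours to specific hypervisors is a host aggregate whose metadata matches an `aggregate_instance_extra_specs` property on the flavour (this relies on the scheduler's `AggregateInstanceExtraSpecsFilter` being enabled). The aggregate name, host names, and the `nvme=true` key below are all hypothetical.

```python
def aggregate_pinning_commands(aggregate, hosts, flavors, key="nvme", value="true"):
    """Build openstack CLI commands (as argv lists) that tie flavors to an aggregate.

    Nothing is executed here; the caller decides whether to run the commands.
    The key/value pair is an illustrative assumption, not the label used on Somerville.
    """
    prop = f"{key}={value}"
    commands = [
        ["openstack", "aggregate", "create", aggregate],
        ["openstack", "aggregate", "set", "--property", prop, aggregate],
    ]
    # Add each NVMe-equipped hypervisor to the aggregate.
    for host in hosts:
        commands.append(["openstack", "aggregate", "add", "host", aggregate, host])
    # Require the matching aggregate metadata on each restricted flavor.
    for flavor in flavors:
        commands.append([
            "openstack", "flavor", "set",
            "--property", f"aggregate_instance_extra_specs:{prop}",
            flavor,
        ])
    return commands


if __name__ == "__main__":
    for argv in aggregate_pinning_commands(
        "nvme-hosts",                       # hypothetical aggregate name
        ["supermicro-01", "supermicro-02"],  # hypothetical host names
        ["gaia.vm.4vcpu.916nvme"],
    ):
        print(" ".join(argv))
```

Building the commands as lists rather than running them keeps the sketch safe to try and easy to review before anything touches the cloud.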

It's an experimental system, so might not behave precisely as it should and is subject to change, though preliminary tests look good. Could you please test and verify?

Regards,

Greg

GregBlow commented 4 months ago

(Scratch that; flavours very like the ones you asked for do work. Trying to differentiate.)

GregBlow commented 4 months ago

Oh, ephemeral volumes of 916GB will not work. The hypervisors have 786GB SSDs.

GregBlow commented 4 months ago

I've added a new set of flavours with 768GB ephemeral volumes configured. Easy to create more if you'd like different configurations.
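For reference, a flavour of this shape can be expressed as an `openstack flavor create` command. The sketch below builds the command from a spec in the same JSON layout used earlier in this thread. The flavour name and the project name `gaia` are hypothetical, since the names of the 768GB flavours are not given here.

```python
import shlex


def flavor_create_command(spec, project=None):
    """Return the argv for `openstack flavor create` matching a flavor spec dict.

    RAM is in MB; Disk and Ephemeral are in GB, as the CLI expects.
    If a project is given, the flavor is created private to that project.
    """
    argv = [
        "openstack", "flavor", "create",
        "--vcpus", str(spec["VCPUs"]),
        "--ram", str(spec["RAM"]),
        "--disk", str(spec["Disk"]),
        "--ephemeral", str(spec["Ephemeral"]),
    ]
    if project is not None:
        argv += ["--private", "--project", project]
    argv.append(spec["Name"])
    return argv


if __name__ == "__main__":
    spec = {
        "Name": "example.vm.26vcpu.768nvme",  # hypothetical name
        "RAM": 44032,
        "Disk": 20,
        "Ephemeral": 768,
        "VCPUs": 26,
    }
    print(shlex.join(flavor_create_command(spec, project="gaia")))
```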

Zarquan commented 4 months ago

Thanks for setting it up. I have some work to finish on the Cambridge Arcus system, but I hope to get a chance to experiment with Longhorn.

GregBlow commented 4 months ago

Note: there are 4 hypervisors with these SSDs mounted, presently for your exclusive use. If your experiments require a larger number of volumes, we'll need smaller flavours provisioned.
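As a rough illustration of that trade-off, the sketch below counts how many ephemeral volumes fit, using the figures quoted in this thread (786GB SSDs, 4 hypervisors). It assumes the SSD is used only for ephemeral discs and ignores filesystem overhead, GB versus GiB differences, and vCPU or RAM limits, so treat the results as upper bounds.

```python
def max_instances(ssd_gb, ephemeral_gb, hypervisors):
    """Upper bound on instances whose ephemeral discs fit on the NVMe SSDs.

    Each hypervisor can hold floor(ssd_gb / ephemeral_gb) ephemeral discs.
    """
    if ephemeral_gb <= 0:
        raise ValueError("ephemeral_gb must be positive")
    per_host = ssd_gb // ephemeral_gb
    return per_host * hypervisors


if __name__ == "__main__":
    for size in (916, 768, 384, 256):
        print(size, max_instances(786, size, 4))
```

Under these assumptions a 916GB volume does not fit at all, a 768GB volume gives one instance per hypervisor, and halving or thirding the size gives two or three per hypervisor.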

Zarquan commented 4 months ago

Unable to experiment with the new flavors due to issues with the platform. See #144

GregBlow commented 1 week ago

@DP-B21 Can you try creating instances with each of the two new flavours (they're locked to the Gaia project, so you'll need to create them under that project) and see whether they are correctly placed on the Supermicro hypervisors, please?
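One way to check placement is to inspect the JSON printed by `openstack server show <server> -f json`. The sketch below parses that output and compares the hypervisor against an expected set. The field name `OS-EXT-SRV-ATTR:hypervisor_hostname` is an assumption: it is normally visible only to admins and may differ between client versions, so it is left as a parameter. The host names are hypothetical.

```python
import json

# Assumed field name; adjust if your client version reports it differently.
HOST_FIELD = "OS-EXT-SRV-ATTR:hypervisor_hostname"


def check_placement(server_json, expected_hosts, host_field=HOST_FIELD):
    """Classify a server from its `openstack server show -f json` output.

    Returns "ok" if the server is ACTIVE on one of the expected hosts,
    otherwise a short description of what looks wrong.
    """
    server = json.loads(server_json)
    status = server.get("status")
    host = server.get(host_field)
    if status != "ACTIVE":
        return f"not active (status={status})"
    if host not in expected_hosts:
        return f"unexpected host: {host}"
    return "ok"


if __name__ == "__main__":
    sample = json.dumps({"status": "ACTIVE", HOST_FIELD: "supermicro-01"})
    print(check_placement(sample, {"supermicro-01", "supermicro-02"}))
```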

GregBlow commented 22 hours ago

This has ceased to work (confirmed by myself and @DP-B21; flavour gaia-4vcpu-916nvme goes to error).