microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License

feature: block-device mounts #2168

Closed: anmaxvl closed this 2 months ago

anmaxvl commented 2 months ago

This PR adds the capability to mount virtual and passthrough disks as block devices inside containers.

We add a new `blockdev://` prefix for the OCI `Mount.ContainerPath`, which indicates that the source should be mounted as a block device.
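As a minimal sketch of how the prefix can be recognized, the hypothetical helper `parseContainerPath` below (not part of hcsshim) checks a container path for `blockdev://` and strips it to recover the destination inside the container. With the three-slash form used in the examples, `blockdev:///my/block/mount` yields `/my/block/mount`.

```go
package main

import (
	"fmt"
	"strings"
)

// blockDevPrefix is the container-path prefix introduced by this PR.
const blockDevPrefix = "blockdev://"

// parseContainerPath is a hypothetical helper. It reports whether a mount's
// container path requests a block-device mount and returns the destination
// inside the container with the prefix stripped.
func parseContainerPath(containerPath string) (dest string, isBlockDev bool) {
	if strings.HasPrefix(containerPath, blockDevPrefix) {
		return strings.TrimPrefix(containerPath, blockDevPrefix), true
	}
	return containerPath, false
}

func main() {
	for _, p := range []string{"blockdev:///my/block/mount", "/data"} {
		dest, blockDev := parseContainerPath(p)
		fmt.Printf("%s -> dest=%s blockDev=%t\n", p, dest, blockDev)
	}
}
```

Paths without the prefix pass through unchanged, so ordinary filesystem mounts keep their existing behavior.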

A new `BlockDev` field has been added to the `mountConfig` used by `mountManager`; it indicates that the SCSI attachment should be mounted as a block device.
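The sketch below shows how such a flag could drive the guest-side decision. The `mountConfig` here is a simplified, hypothetical stand-in: only the `BlockDev` field comes from this PR, while `ReadOnly`, `mountConfigFor` and `guestAction` are illustrative names, not the shim's real types or functions.

```go
package main

import (
	"fmt"
	"strings"
)

// mountConfig is a simplified, hypothetical stand-in for the shim's mount
// configuration. Only the BlockDev field is taken from this PR.
type mountConfig struct {
	ReadOnly bool
	// BlockDev asks the guest to expose the SCSI attachment as a raw block
	// device instead of mounting the filesystem on it.
	BlockDev bool
}

// mountConfigFor derives a config from a CRI-style mount (hypothetical helper).
func mountConfigFor(containerPath string, readOnly bool) mountConfig {
	return mountConfig{
		ReadOnly: readOnly,
		BlockDev: strings.HasPrefix(containerPath, "blockdev://"),
	}
}

// guestAction names what the guest should do with the attached disk.
func guestAction(cfg mountConfig) string {
	if cfg.BlockDev {
		return "symlink to block device"
	}
	return "mount filesystem"
}

func main() {
	fmt.Println(guestAction(mountConfigFor("blockdev:///my/block/mount", false)))
	fmt.Println(guestAction(mountConfigFor("/data", true)))
}
```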

The GCS has also been updated to handle `BlockDev`. Instead of mounting the filesystem, the GCS creates a symlink to the block device that corresponds to the SCSI attachment. The shim sets this symlink path as the source of a bind mount in the OCI container spec. The GCS then resolves the symlink and adds a device cgroup rule for the corresponding device; without that rule, the container cannot access the block device.

We chose a symlink instead of bind mounting the device directly because the shim doesn't know the path at which the device will appear inside the UVM. A direct bind mount would require either encoding the SCSI controller/LUN in the OCI mount's `HostPath`, or changing the communication protocol between the shim and the GCS so that the GCS returns the device path or lets the shim query for it.
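The symlink works because the shim can choose the guest path before the device exists. The sketch below assumes the `/run/mounts/scsi/m<N>` naming seen in the UVM listing in this PR. `ociMount` is a local struct mirroring the Destination/Type/Source/Options fields of an OCI runtime-spec mount, and the `bind` plus `rw`/`ro` options are assumptions, not the shim's exact output.

```go
package main

import (
	"fmt"
	"strings"
)

// ociMount mirrors the Destination, Type, Source and Options fields of an
// OCI runtime-spec mount; it is defined locally to keep the sketch standalone.
type ociMount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// guestPathFmt follows the naming visible in the UVM listing (m0, m1, ...).
const guestPathFmt = "/run/mounts/scsi/m%d"

// blockDevMount builds the container bind mount for a blockdev:// request.
// The shim picks the symlink path from its own mount index, so it does not
// need to know which /dev/sdX the disk becomes inside the UVM; the GCS later
// creates the symlink at exactly this path.
func blockDevMount(mountIndex int, containerPath string, readOnly bool) (ociMount, error) {
	const prefix = "blockdev://"
	if !strings.HasPrefix(containerPath, prefix) {
		return ociMount{}, fmt.Errorf("%q is not a blockdev:// path", containerPath)
	}
	access := "rw"
	if readOnly {
		access = "ro"
	}
	return ociMount{
		Destination: strings.TrimPrefix(containerPath, prefix),
		Type:        "bind",
		Source:      fmt.Sprintf(guestPathFmt, mountIndex),
		Options:     []string{"bind", access},
	}, nil
}

func main() {
	m, err := blockDevMount(4, "blockdev:///my/block/mount", false)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", m)
}
```

Nothing in this mount names a `/dev` node, which is the point of the design: the controller/LUN-to-device mapping stays entirely inside the GCS.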

Below are some CRI container config examples for physical and virtual disks:

Passthrough physical disk:

```json
{
    ...
    "mounts": [
        {
            "host_path": "\\\\.\\PHYSICALDRIVE1",
            "container_path": "blockdev:///my/block/mount",
            "readonly": false
        }
    ]
    ...
}
```

Virtual VHD disk:

```json
{
    ...
    "mounts": [
        {
            "host_path": "C:\\path\\to\\my\\disk.vhdx",
            "container_path": "blockdev:///my/block/mount",
            "readonly": false
        }
    ]
    ...
}
```

The mount manager differentiates between block-device and filesystem mounts, so two containers can use the same managed disk inside the UVM at the same time, one as a block device and the other as a filesystem. For a block-device mount a symlink is created; for a filesystem mount the block device is mounted in the UVM. For example, in the listing below, `m4` is a symlink to `/dev/sde`, while the same device is mounted as ext4 at `m3`.
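One way such coexistence could be tracked is sketched below, assuming a reference-counted manager keyed by SCSI controller, LUN and the `BlockDev` flag. `mountKey`, `mountEntry` and this `mountManager` are hypothetical simplifications, not hcsshim's types, and the guest-path indices are illustrative (the listing shows `m3`/`m4` because other mounts already exist).

```go
package main

import "fmt"

// mountKey identifies one guest mount. Putting BlockDev in the key is an
// assumption: it lets the same SCSI disk have a filesystem mount and a
// block-device symlink at the same time, each with its own guest path.
type mountKey struct {
	Controller uint
	LUN        uint
	BlockDev   bool
}

type mountEntry struct {
	GuestPath string
	RefCount  int
}

// mountManager is a hypothetical, reference-counted mount tracker.
type mountManager struct {
	next    int
	entries map[mountKey]*mountEntry
}

func newMountManager() *mountManager {
	return &mountManager{entries: make(map[mountKey]*mountEntry)}
}

// Add returns the guest path for key, reusing the entry if the same disk was
// already requested in the same mode.
func (m *mountManager) Add(key mountKey) string {
	if e, ok := m.entries[key]; ok {
		e.RefCount++
		return e.GuestPath
	}
	e := &mountEntry{GuestPath: fmt.Sprintf("/run/mounts/scsi/m%d", m.next), RefCount: 1}
	m.next++
	m.entries[key] = e
	return e.GuestPath
}

// Release drops one reference and forgets the entry when none remain.
func (m *mountManager) Release(key mountKey) {
	e, ok := m.entries[key]
	if !ok {
		return
	}
	e.RefCount--
	if e.RefCount == 0 {
		delete(m.entries, key)
	}
}

func main() {
	mm := newMountManager()
	fsKey := mountKey{Controller: 0, LUN: 3}
	bdKey := mountKey{Controller: 0, LUN: 3, BlockDev: true}
	fmt.Println(mm.Add(fsKey)) // /run/mounts/scsi/m0: container A, filesystem
	fmt.Println(mm.Add(bdKey)) // /run/mounts/scsi/m1: container B, block device
	fmt.Println(mm.Add(fsKey)) // /run/mounts/scsi/m0: container C reuses A's mount
}
```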

```
bash-5.0# ls -l /run/mounts/scsi/
total 16
drwxr-xr-x    3 root     root          4096 Jan  1  1970 m0
drwxr-xr-x    4 root     root          4096 Jun 20 23:20 m1
drwxr-xr-x   18 root     root          4096 Jan  1  1970 m2
drwxr-xr-x    3 root     root          4096 Jun 20 23:20 m3
lrwxrwxrwx    1 root     root             8 Jun 20 23:22 m4 -> /dev/sde
bash-5.0# mount | grep sde
/dev/sde on /run/mounts/scsi/m3 type ext4 (rw,relatime)
```