microsoft / hcsshim

Windows - Host Compute Service Shim
MIT License

Regression: Unable to mount filesystem volumes in containerd 1.7.0 due to https://github.com/microsoft/hcsshim/pull/1344 #1699

Open hach-que opened 1 year ago

hach-que commented 1 year ago

https://github.com/microsoft/hcsshim/pull/1344 causes containerd to upgrade HostProcess jobs to silos, which in turn breaks host process containers that use filesystem drivers like WinFsp. HostProcess containers now fail to start with:

Error: failed to create containerd task: failed to create shim task: failed to bind target "\\\\?\\Volume{473e53c8-c55d-11ed-9601-ba2be9735a86}\\" to root "C:\\UnrealEngine" for job object: Do not attach the filter to the volume at this time.: unknown

in the pod events when containerd 1.7.0 is used. Previously the HostProcess containers would start successfully.
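
For anyone triaging a fleet for this failure, a minimal sketch of matching the signature in pod event text might look like the following. The function name `isBindAttachFailure` is hypothetical and is not part of containerd or hcsshim. It only checks for the two substrings quoted in the error above, so treat it as a heuristic rather than a definitive classifier.

```go
package main

import (
	"fmt"
	"strings"
)

// bindFilterAttachMsg is the Windows error text quoted in the report above.
const bindFilterAttachMsg = "Do not attach the filter to the volume at this time."

// isBindAttachFailure (hypothetical helper) reports whether an event message
// looks like the failure quoted above: the shim could not bind a volume for
// the job object, and Windows refused to attach the filter.
func isBindAttachFailure(event string) bool {
	return strings.Contains(event, "failed to bind target") &&
		strings.Contains(event, bindFilterAttachMsg)
}

func main() {
	// The event text is copied from the report; the raw string keeps the
	// backslashes exactly as they appear in the pod events.
	event := `Error: failed to create containerd task: failed to create shim task: failed to bind target "\\\\?\\Volume{473e53c8-c55d-11ed-9601-ba2be9735a86}\\" to root "C:\\UnrealEngine" for job object: Do not attach the filter to the volume at this time.: unknown`

	fmt.Println(isBindAttachFailure(event))
	fmt.Println(isBindAttachFailure("Error: image pull failed"))
}
```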

TBBle commented 1 year ago

As I understand, #1344 means https://github.com/microsoft/Windows-Containers/issues/335 affects host-process containers too now.

I saw a suggestion to make it opt-out with an annotation but I guess that didn't go anywhere.

hach-que commented 1 year ago

Correct. As of 1.7.0 it is now impossible to schedule any work that relies on filesystem drivers under containerd/Kubernetes, and HostProcess can no longer be used to "escape" the limitations of normal containers for this type of workload.

kiashok commented 6 months ago

The behavior being pointed out is not a regression. Starting with containerd/1.7, bind volume mounts are used instead of symlinks (as in containerd/1.6). More details can be found here: https://github.com/kubernetes/enhancements/tree/master/keps/sig-windows/1981-windows-privileged-container-support#compatibility and https://github.com/kubernetes/enhancements/tree/master/keps/sig-windows/1981-windows-privileged-container-support#container-mounts .

PR https://github.com/microsoft/hcsshim/pull/1344 achieves the above behavior in containerd/1.7 by elevating the job objects to a partial silo so that we can make use of the silo-local file bindings that the bind filter supports. This was a conscious decision. Having said that, we are aware of some known issues with the approach taken for containerd/1.7+, and they are being tracked here: https://github.com/microsoft/Windows-Containers/issues/366 .

I believe that what you are facing could be due to a similar reason: something is executing in host context and is probably being passed a path that is accessible only from the guest and not the host. One temporary workaround for this issue is to copy the files onto the host (that is, outside of C:\hpc) and then run the container. Could you please try this workaround and let us know if it works for you? We do understand that this is not ideal and that one would lose the benefits of filesystem isolation etc., but this is the best workaround for right now while we look into a more permanent fix for this approach. We will share more details about the fix once we have it.

We do not want to support an annotation in hcsshim to roll back to the old behavior as a way to work around this issue, as in this PR: https://github.com/microsoft/hcsshim/pull/2022

cc @msscotb @fady-azmy-msft

kiashok commented 6 months ago

cc @fjs4

hach-que commented 6 months ago

The issue isn't caused by paths. 1.6 had the same issue, in that you cannot use or access virtual filesystem drivers inside normal containers. Even when you mirror the filesystem layout on the host it doesn't work, because the problem has to do with the way that silos and filesystem drivers interact in the kernel. The same root cause prevents ProjFS from running inside a silo even when it's installed on the host.

When this regression originally occurred in 1.7, I did extensively search for a workaround and one simply does not exist. Only turning off the silo itself to behave like 1.6 allows virtual filesystems to run again.

I believe there's a related issue that prevents virtual filesystems from working at runtime (instead of at container creation, which is what the error in the original post shows). If you mount a virtual filesystem into a host process container after the container is created, it still doesn't work because - as far as I can tell - bindflt (or the filter that handles file access inside silos) doesn't support re-entering the filter pipeline for its own file access and just does file access directly. This means containers can't see the files inside a mounted path even if you do get the mount folder itself to appear.

hach-que commented 6 months ago

Sidenote: There's an argument that the fix should be to make silos work with virtual filesystems / filter drivers, but the Windows kernel team does not plan on implementing support for this, so we need an option at the hcsshim/containerd level instead.

kiashok commented 6 months ago

@hach-que we are continuing to take a look at this at our end. Will share an update as soon as we have something to share.

hach-que commented 5 months ago

@kiashok @ntrappe-msft Is there any update here? I would really like to not be stuck on 1.6 forever.

kiashok commented 4 months ago

@hach-que sorry, there were some higher priority items I had to focus on and I got sidetracked. I'll get back to you on this. Do you have a repro that you can share? Btw, an unrelated question: this whole scenario works on 1.7 if your app is run directly on the host, correct? Is there any motivation for containerizing the application? So you have a virtual filesystem on the host and you are trying to access some files from there inside of your container, right? I have never looked into virtual filesystems with containers. Will get back to you soon.

hach-que commented 4 months ago

@kiashok So we can use daemonsets to deploy updates across a fleet of servers without having to roll our own centralized updating mechanism.

I would also like to properly containerize eventually, but this relies on the Windows Kernel team fixing filesystem filters in silos (the same issue that prevents HostProcess jobs from being silos without breaking stuff).

kiashok commented 4 months ago

@hach-que I was trying to repro this locally and I am not able to repro the issue you are reporting. Let me know if I am missing anything. It would be great if you could share a simple repro if you have one.

I ran ProjFS locally on my machine using https://github.com/Microsoft/Windows-classic-samples/tree/main/Samples/ProjectedFileSystem . I then created an HPC and tried accessing the files in the folder that the ProjFS app projects into, and it works just fine!

Have you tried with projFS previously? Is the issue you are hitting only with WinFsp?

kiashok commented 4 months ago

(the same issue that prevents HostProcess jobs from being silos without breaking stuff).

Could you elaborate on what you mean by "(the same issue that prevents HostProcess jobs from being silos without breaking stuff)"? I don't think I am aware of any issue that prevents elevating HostProcess jobs to silos. containerd/1.7+ does this for HPC.

kiashok commented 4 months ago

I was also able to run the same ProjFS application as an HPC container and project into a folder on the C: drive. I was able to mount this folder into another container and access files from there.