kata-containers / kata-containers

Kata Containers is an open source project and community working to build a standard implementation of lightweight Virtual Machines (VMs) that feel and perform like containers, but provide the workload isolation and security advantages of VMs. https://katacontainers.io/
Apache License 2.0

Pod creation fails with CRI-O on kata-qemu runtime #9878

Open visheshtanksale opened 1 week ago

visheshtanksale commented 1 week ago

Description of problem

I set up Kata using kata-deploy on CRI-O. When I create a pod using:

$ kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/examples/test-deploy-kata-qemu.yaml

The pod does not come up; it is stuck in CreateContainerError status.

The logs show this error:

Jun 19 08:21:48 ipp1-1848 kata[30819]: time="2024-06-19T08:21:48.778103019Z" level=error msg="createContainer failed" error="rpc error: code = Internal desc = the file apache2-foreground was not found" name=containerd-shim-v2 pid=30819 sandbox=db4fca6b206a7c224496485f2222483281d2e369f6edb90a15473cea24d954db source=virtcontainers subsystem=kata_agent

Attached complete logs here

If I try to bring up any other container using the kata-qemu runtime, I get a similar error saying that the command which is the container's entrypoint was not found.

Expected result

The pod should come up without error.

Actual result

The pod does not come up; it is stuck in CreateContainerError status.

Further information

Attached kata-collect-data.sh output here

zvonkok commented 1 week ago

@littlejawa We're baffled as to why we cannot start any container with CRI-O and Kata. Any command: that we've tried ends in "command not found". PTAL.

littlejawa commented 1 week ago

Not sure if it's related, but it sounds similar to what we fixed in the CI with https://github.com/kata-containers/kata-containers/pull/9206

Can you double-check the crio config (in /etc/crio/crio.conf, and any files under /etc/crio/crio.conf.d/), and make sure that you have something like:

[crio]
  storage_option = [
    "overlay.skip_mount_home=true",
  ]
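
One way to run the check suggested above is sketched below. The helper name find_skip_mount_home is hypothetical (it is not part of CRI-O or Kata), and it assumes the configuration lives under /etc/crio as described in this comment. It only reports which files contain an uncommented overlay.skip_mount_home entry; it does not prove that CRI-O actually applied the setting.

```shell
#!/bin/sh
# Sketch only: find_skip_mount_home is a hypothetical helper, not a CRI-O or Kata tool.
# It lists every uncommented line under a CRI-O config directory that mentions
# overlay.skip_mount_home, so you can see which file (if any) provides the option.
find_skip_mount_home() {
    conf_root="${1:-/etc/crio}"   # assumed default; covers crio.conf and crio.conf.d/
    # -r recurses into crio.conf.d, -n prints line numbers, -E enables extended regex.
    # '^[^#]*' rejects lines where a '#' appears before the option (commented out).
    grep -rnE '^[^#]*overlay\.skip_mount_home' "$conf_root" 2>/dev/null
}
```

Calling `find_skip_mount_home` with no argument searches /etc/crio. If it prints nothing, the option is not set in any file under that directory.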

visheshtanksale commented 1 week ago

@littlejawa I didn't have the storage_option param, but adding it doesn't help. I'm still hitting the same error.

littlejawa commented 1 week ago

Sounds weird because this is the exact same symptom and situation. Did you reload crio after adding it to the conf?

littlejawa commented 1 week ago

I have a tentative fix for this in kata-deploy - it needs to set that flag as part of crio config. Waiting on your feedback before pushing it, in case something else needs to be fixed.

visheshtanksale commented 1 week ago

@littlejawa I did reload the crio service. This is the config change:

# cat /etc/crio/crio.conf.d/99-kata-deploy | grep -A 3 -B 1 storage_option
[crio]
storage_option = [
    "overlay.skip_mount_home=true",
]

I am still seeing the same error. Do you think there might be any other reason for this issue?

littlejawa commented 1 week ago

I don't remember seeing this kind of error, except with this config issue :-( Can you get an updated crio.log and kata-collect-data.sh output, to see if crio complains about the new setting somehow? Maybe it can tell us why it's not taking it into account.

visheshtanksale commented 1 week ago

I tried completely wiping the CRI-O storage and testing again, but with no success. I don't see anything obvious in the CRI-O logs.

Attached: kata-collect-data.sh output here, CRI-O logs here, Kata logs here.

littlejawa commented 1 week ago

I don't see anything obvious either :-(

Comparing with my working setup, there is one thing that is different: the default for "storage_driver" is empty in your crio config, and it is "overlay" for me. This should not change anything, as the entry is commented out anyway. But I'm wondering if you're actually using the overlay driver? If not, the option we modified might have no effect.

Can you verify the content of your /etc/containers/storage.conf file, and check which driver is used by default? (should be at the very beginning of the file).

Alternatively, can you uncomment the line storage_driver = "" in your crio.conf file, and change it to storage_driver = "overlay"?
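
A minimal sketch of the storage.conf check described above. The function name storage_driver is hypothetical, and it assumes the file uses the usual `driver = "..."` form with the value in double quotes. It prints the first uncommented driver value it finds, and prints nothing if there is none.

```shell
#!/bin/sh
# Sketch only: storage_driver is a hypothetical helper, not part of any containers tool.
# Prints the value of the first uncommented `driver = "..."` line in a storage.conf file.
storage_driver() {
    conf="${1:-/etc/containers/storage.conf}"   # path taken from the comment above
    # Lines starting with '#' do not match because only whitespace may precede "driver".
    sed -n 's/^[[:space:]]*driver[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" | head -n 1
}
```

Running `storage_driver` with no argument reads /etc/containers/storage.conf. An empty result would suggest the driver is not set explicitly in that file.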

visheshtanksale commented 1 week ago

I did try that. It's coming up with the appropriate configs:

Current CRI-O configuration:
[crio]
  root = "/var/lib/containers/storage"
  runroot = "/run/containers/storage"
  imagestore = ""
  storage_driver = "overlay"
  storage_option = ["overlay.skip_mount_home=true"]

It's not helping with the issue.

Also, I am running Ubuntu 22.04:

# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Are you on RHEL? Maybe that is causing the different behavior?

littlejawa commented 5 days ago

> Are you on RHEL? Maybe that is causing the different behavior?

Most of my testing is done on Ubuntu 22.04.4, but without kubernetes - just crio + kata. My version of cri-o is probably a bit more recent because I'm building it from main, but that shouldn't matter, because I've been testing with this kind of setup for a very long time, with all previous versions of crio.

From what I can tell, the only difference between my setup and yours is that I didn't use kata-deploy. I can see that kata-deploy doesn't set that flag on crio, and I confirmed that without that flag I get the exact same problem that you have... I just tested it again this morning to be sure.

At this point, I really don't know :-(

littlejawa commented 5 days ago

@wainersm @fidencio, Sorry for the ping, but I'm lost here, and you have some more experience than me with kubernetes testing :-(

@visheshtanksale has a Kubernetes cluster using crio, set up with kata-deploy. They get an error on every pod creation saying the entrypoint for the pod is "not found". kata-deploy doesn't set the "skip_mount_home" flag for crio, so I had them change that setting... and it doesn't solve the problem :-(

Any idea what else could cause the same symptom?

visheshtanksale commented 4 days ago

Can you share what your Kata runtime config looks like?

littlejawa commented 3 days ago

Here it is : configuration-qemu.toml

I don't think I modified it from the default, except the debug log level.

littlejawa commented 3 days ago

I realize that you may have been asking for the cri-o config for kata? Here it is, just in case. Again, nothing different.

[crio.runtime.runtimes.kata]
  runtime_path = "/opt/kata/bin/containerd-shim-kata-v2"
  runtime_root = "/run/vc"
  runtime_type = "vm"
  privileged_without_host_devices = true
  runtime_config_path = "/opt/kata/share/defaults/kata-containers/configuration-qemu.toml"
  runtime_pull_image = false

[crio.runtime.runtimes.kata-remote]
  runtime_path = "/opt/kata/bin/containerd-shim-kata-v2"
  runtime_root = "/run/vc"
  runtime_type = "vm"
  privileged_without_host_devices = true
  runtime_config_path = "/opt/kata/share/defaults/kata-containers/configuration-remote.toml"
  runtime_pull_image = true

visheshtanksale commented 2 days ago

@littlejawa How do you install kata on your setup? I am trying to figure out what's different between your setup and the kata-deploy setup.

littlejawa commented 2 days ago

I'm retrieving the release archive from https://github.com/kata-containers/kata-containers/releases. Specifically, I'm currently testing 3.4.0.

I'm just unpacking it to /, so the folder structure is what the tarball contains.

Then I configure crio manually, by adding the entries that I posted above. I'm also adding the following in crio's conf:

# Set a flag in crio settings to avoid private bind mount
[crio]
storage_option = [
  "overlay.skip_mount_home=true",
]

# Set debug logs in crio
[crio.runtime]
log_level = "debug"

I'm using separate conf files under /etc/crio/crio.conf.d/ for my config: one for the runtime definition, and one for the storage and debug flags.
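
The storage and debug drop-in described above could be written with something like the sketch below. The function name write_kata_storage_dropin and the file name 10-kata-storage.conf are hypothetical, chosen only for illustration; the file contents are the settings quoted in this comment. The runtime definition would go in a second file, which this sketch does not create.

```shell
#!/bin/sh
# Sketch only: function name and file name are hypothetical, not kata-deploy's behavior.
# Writes the storage and debug settings quoted above into one CRI-O drop-in file.
write_kata_storage_dropin() {
    dropin_dir="${1:-/etc/crio/crio.conf.d}"   # drop-in directory mentioned above
    mkdir -p "$dropin_dir"
    # Quoted 'EOF' keeps the heredoc literal (no variable expansion inside).
    cat > "$dropin_dir/10-kata-storage.conf" <<'EOF'
# Set a flag in crio settings to avoid private bind mount
[crio]
storage_option = [
  "overlay.skip_mount_home=true",
]

# Set debug logs in crio
[crio.runtime]
log_level = "debug"
EOF
}
```

After writing the file (as root, for the default path), crio has to be reloaded or restarted for the change to be picked up, as discussed earlier in the thread.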

And finally, I edit the .toml files for kata config, to enable debug logs.