intel / cri-resource-manager

Kubernetes Container Runtime Interface proxy service with hardware resource aware workload placement policies
Apache License 2.0

Support for Ubuntu 24.04 #1106

Closed: marquiz closed this 2 months ago

askervin commented 2 months ago

It seems the e2e tests fail on Ubuntu 24.04 due to a container stop failure. From cri-resmgr.output.txt in the VM:

E: [resource-manager] StopContainer: failed to stop container pod0:pod0c1: rpc error: code = Unknown desc = failed to stop container "97e4401acd43cba20543030364eb9986d579a886d5c9af5973cda5f2a3a24ae8": unknown error after kill: runc did not terminate successfully: exit status 1: unable to signal init: permission denied

I don't know whether it has anything to do with cri-resmgr. Both the topology-aware and balloons tests fail at the first kubectl delete pod ... command.

askervin commented 2 months ago

Approved, as the failure has nothing to do with this patch either. I'm fine with merging this patch and debugging/fixing the issue later.

There is no problem with the latest Ubuntu 22.04.

klihub commented 2 months ago


I'm guessing it's something SELinux-related, maybe... Probably a failure to disable SELinux or set it to permissive mode during node spin-up?

marquiz commented 2 months ago


Does the latest Ubuntu have SELinux enabled?

klihub commented 2 months ago

Argh... No, sorry, of course it does not. I was still in the context of the previous Fedora-related PR I reviewed.

klihub commented 2 months ago

But this might be the cause: https://github.com/containerd/containerd/issues/10542

marquiz commented 2 months ago

Rebased. PTAL @klihub