k8snetworkplumbingwg / sriov-network-device-plugin

SRIOV network device plugin for Kubernetes
Apache License 2.0

Make SRIOV network device plugin independent of CNI #549

Open aojea opened 7 months ago

aojea commented 7 months ago

What would you like to be added?

Remove the dependency on CNI plugins by using NRI plugins instead: https://github.com/containerd/nri/pull/119

What is the use case for this feature / enhancement?

Depending on CNI adds complexity. Since the SRIOV plugin already controls the whole lifecycle of the devices, it can use NRI plugins and its own APIs to handle attaching the additional devices to the Pods.

I've presented and demoed how to use NRI to add network interfaces to Pods using DRA in https://docs.google.com/presentation/d/1Vdr7BhbYXeWjwmLjGmqnUkvJr_eOUdU0x-JxfXWxUT8/edit#slide=id.p

https://github.com/aojea/dra-network-driver-template https://github.com/aojea/kubernetes-network-driver

SchSeba commented 7 months ago

Hi @aojea, thanks for this really interesting topic!

I checked the presentation, the video, and the code.

There is only one point I was not able to find: when the Pod is deleted, are we able to run some work? For example, changing the internal name back, resetting the MAC address, removing the IP address, and so on.

aojea commented 7 months ago

Good question, let me figure that out

aojea commented 7 months ago

@SchSeba https://github.com/opencontainers/runtime-spec/blob/main/config.md#poststop

The demo and code are just for PoC and feasibility purposes. As I said, I'm happy to collaborate on this; feel free to reach out on the Kubernetes Slack (@aojea) or via email.

SchSeba commented 7 months ago

This is really nice!

Will you be able to talk about it at our next community meeting? You can find it in the k8smeet@gmail.com calendar; it's the "K8s Network & Resource Management WG Tech discussion" meeting.

zshi-redhat commented 7 months ago

@aojea Is the goal here to eliminate the use of sriov-cni or just handle the movement of sriov interface to container namespace in sriov device plugin (still use sriov-cni for interface configurations, such as IP/MAC/spoofCheck/linkAuto etc)?

aojea commented 7 months ago

I'm looking to improve the user experience and reduce the complexity of the existing solutions. I'm not familiar with all the SRIOV use cases, but my feeling is that you can define these configurations in one place and avoid spreading the responsibilities over different components. That should result in a better UX and make troubleshooting and support easier.

zshi-redhat commented 7 months ago

I assume this is aligned with the network device proposal here: https://github.com/opencontainers/runtime-spec/issues/1239, which provides the ability to add a network device to the namespace without explicit or implicit interaction with CNI, but not to configure its networking parameters (e.g. IP allocation and management).

aojea commented 7 months ago

Yep, exactly. Decoupling interface management from interface configuration allows, I think, much cleaner implementations ... right now CNI is "abused" as the place where "everything networking goes".
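One way to picture that split is a rough sketch with two separate interfaces: one that only moves a device in and out of a Pod namespace, and one that only applies parameters such as MAC and addresses. All type and function names here (`InterfaceManager`, `InterfaceConfigurer`, `Setup`, `recorder`) are made up for illustration and do not come from the device plugin, NRI, or the runtime-spec proposal. The `recorder` type is a fake that only logs calls, so nothing touches real devices.

```go
package main

import (
	"errors"
	"fmt"
)

// Device identifies a network device and the namespace it should live in.
type Device struct {
	Name  string // host interface name, e.g. a VF netdev
	Netns string // path of the target network namespace
}

// Config carries the parameters that are configuration, not attachment.
type Config struct {
	MAC       string
	Addresses []string
}

// InterfaceManager only moves devices in and out of a namespace.
type InterfaceManager interface {
	Attach(dev Device) error
	Detach(dev Device) error
}

// InterfaceConfigurer only applies networking parameters to a device.
type InterfaceConfigurer interface {
	Configure(dev Device, cfg Config) error
}

// Setup attaches first, then configures, and detaches again if the
// configuration step fails so no half-configured device is left behind.
func Setup(m InterfaceManager, c InterfaceConfigurer, dev Device, cfg Config) error {
	if err := m.Attach(dev); err != nil {
		return fmt.Errorf("attach %s: %w", dev.Name, err)
	}
	if err := c.Configure(dev, cfg); err != nil {
		if derr := m.Detach(dev); derr != nil {
			return fmt.Errorf("configure %s: %v; detach: %v", dev.Name, err, derr)
		}
		return fmt.Errorf("configure %s: %w", dev.Name, err)
	}
	return nil
}

// recorder is a fake implementation of both interfaces that logs calls.
type recorder struct {
	calls         []string
	failConfigure bool
}

func (r *recorder) Attach(dev Device) error {
	r.calls = append(r.calls, "attach "+dev.Name)
	return nil
}

func (r *recorder) Detach(dev Device) error {
	r.calls = append(r.calls, "detach "+dev.Name)
	return nil
}

func (r *recorder) Configure(dev Device, cfg Config) error {
	r.calls = append(r.calls, "configure "+dev.Name)
	if r.failConfigure {
		return errors.New("configure failed")
	}
	return nil
}

func main() {
	r := &recorder{}
	dev := Device{Name: "vf0", Netns: "/run/netns/example"}
	err := Setup(r, r, dev, Config{MAC: "02:00:00:00:00:01"})
	fmt.Println(r.calls, err)
}
```

With this shape, the component that owns the device lifecycle could implement `InterfaceManager` while IP and MAC handling stays behind `InterfaceConfigurer`, whoever ends up providing it.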

aojea commented 1 month ago

Updating the issue: https://github.com/containerd/nri/pull/119

CDI devices introduce some problems that NRI plugins completely resolve, especially the ability to reconcile the Pods when the plugin restarts and the ability to keep state by running as a daemon.
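As a rough illustration of the reconcile-on-restart point, the sketch below assumes the plugin has some record of which device was handed to which Pod, and that on (re)start it can learn which Pods are still running. Both inputs, the `Reconcile` function, and the device and Pod identifiers are hypothetical; the actual NRI plugin API is not used here.

```go
package main

import (
	"fmt"
	"sort"
)

// Reconcile compares recorded allocations (device ID -> Pod UID) with the
// Pods that are running when the plugin starts. It returns the allocations
// that are still valid and a sorted list of devices whose Pod is gone and
// that therefore need cleanup.
func Reconcile(recorded map[string]string, runningPods []string) (map[string]string, []string) {
	running := make(map[string]bool, len(runningPods))
	for _, uid := range runningPods {
		running[uid] = true
	}

	kept := make(map[string]string)
	var stale []string
	for dev, uid := range recorded {
		if running[uid] {
			kept[dev] = uid
		} else {
			stale = append(stale, dev)
		}
	}
	sort.Strings(stale) // deterministic order for logging and cleanup
	return kept, stale
}

func main() {
	// Made-up state: three devices recorded, only pod-a survived the restart.
	recorded := map[string]string{"vf0": "pod-a", "vf1": "pod-b", "vf2": "pod-a"}
	kept, stale := Reconcile(recorded, []string{"pod-a"})
	fmt.Println("kept:", kept)
	fmt.Println("needs cleanup:", stale)
}
```

A long-running daemon could run this once at startup and then keep the map up to date as Pods come and go.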