Open klueska opened 3 months ago
/sig node
FYI, here is the CNI result type.
A typical result looks something like:
{
"ips": [
{
"address": "10.1.0.5/16",
"gateway": "10.1.0.1",
"interface": 2
},
{
"address": "2001:db8::42/64",
"gateway": "2001:db8::1",
"interface": 2
}
],
"routes": [
{
"dst": "0.0.0.0/0"
},
{
"dst": "::/0"
}
],
"interfaces": [
{
"name": "cni0",
"mac": "00:11:22:33:44:55"
},
{
"name": "veth3243",
"mac": "55:44:33:22:11:11"
},
{
"name": "eth0",
"mac": "99:88:77:66:55:44",
"sandbox": "/var/run/netns/blue"
}
]
}
The CNI result type is mostly not exposed to k8s end-users. That said, IP addresses of the PodSandbox would be the bare minimum.
Note that IP addresses are, of course, mutable; if a node reboots, the PodSandbox will be recreated and new IPs allocated.
I was more leaning towards standardize by convention and not via Kubernetes API fields, think that it will be very complex on reach an agreement and support all the use cases, and also very difficult to iterate if everytime we need to go through API review. Adding a new field to routes or interfaces per example, will require to go through all the process, and there will be always niche use cases for some protocols
I was thinking using the CNI result types as foundation, but there are fields we don't want on the API for users, sandbox
in the example.
I'm proposing to add a new field unstructured to Claim Status (right now the Configuration is also unstructured) where DRA drivers can write the results of each allocation, and use the CNI types to create a convention object we can share across networking DRA drivers
// DeviceConfiguration must have exactly one field set. It gets embedded
// inline in some other structs which have other fields, so field names must
// not conflict with those.
type DeviceConfiguration struct {
// Opaque provides driver-specific configuration parameters.
//
// +optional
// +oneOf=ConfigurationType
Opaque *OpaqueDeviceConfiguration `json:"opaque,omitempty" protobuf:"bytes,1,opt,name=opaque"`
}
cc: @danwinship
I made a PoC about the ResourceClaim status for networking: https://gist.github.com/LionelJouin/5cfc11eecf73663b5657ed3be1eb6c00
The AllocatedDeviceStatus
has been extended to contain two new fields:
DeviceInfo
: A field accepting any kind of data like the opaque parameters (.spec.devices.config.opaque.parameters).NetworkDeviceInfo
: A field only for the network devices.For reference for the NetworkDeviceInfo
field, Multus works with annotations and reports the status in a special annotation. Here is the API it uses for it: https://github.com/k8snetworkplumbingwg/network-attachment-definition-client/blob/v1.7.1/pkg/apis/k8s.cni.cncf.io/v1/types.go#L103
This PoC uses CNI, the result is parsed and the IPs, Interface Name and Mac address is added to the NetworkDeviceInfo
field and the full CNI result is added to the DeviceInfo
field.
Here are the changes to the API:
// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
...
// DeviceInfo contains Arbitrary driver-specific data.
//
// +optional
DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`
// NetworkDeviceInfo contains network-related information specific to the device.
//
// +optional
NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}
// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
// Interface specifies the name of the network interface associated with
// the allocated device. This might be the name of a physical or virtual
// network interface.
//
// +optional
Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`
// IPs lists the IP addresses assigned to the device's network interface.
// This can include both IPv4 and IPv6 addresses.
//
// +optional
IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`
// Mac represents the MAC address of the device's network interface.
//
// +optional
Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}
Here is an example of the final ResourceClaim after running the demo of this PoC:
apiVersion: resource.k8s.io/v1alpha3
kind: ResourceClaim
metadata:
name: macvlan-eth0-attachment
spec:
devices:
config:
- opaque:
driver: poc.dra.networking
parameters:
config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [ {
"type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
"host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
interface: net1
requests:
- macvlan-eth0
requests:
- allocationMode: ExactCount
count: 1
deviceClassName: cni-v1
name: macvlan-eth0
status:
allocation:
devices:
config:
- opaque:
driver: poc.dra.networking
parameters:
config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [
{ "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
"host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
interface: net1
requests:
- macvlan-eth0
source: FromClaim
results:
- device: cni
driver: poc.dra.networking
pool: kind-worker
request: macvlan-eth0
nodeSelector:
nodeSelectorTerms:
- matchFields:
- key: metadata.name
operator: In
values:
- kind-worker
deviceStatuses:
- conditions: null
device: cni
deviceInfo:
cniVersion: 1.0.0
interfaces:
- mac: 1e:32:6c:b7:c9:66
name: net1
sandbox: /var/run/netns/cni-5b7c0846-7995-9450-f441-a177399d08d5
ips:
- address: 10.10.1.2/24
gateway: 10.10.1.1
interface: 0
driver: poc.dra.networking
networkDeviceInfo:
interface: net1
ips:
- 10.10.1.2/24
mac: 1e:32:6c:b7:c9:66
pool: kind-worker
request: macvlan-eth0
reservedFor:
- name: demo-a
resource: pods
uid: 2bd46adf-b478-4e25-9e37-828539799169
The AllocatedDeviceStatus has been extended to contain two new fields:
Not just that, the entire claim.status.deviceStatuses
field is new. Who writes that field? I see in https://github.com/LionelJouin/kubernetes/commits/dra-device-status/ that you grant kubelet permission to write the claim status, but not the actual code.
The direction from SIG Auth is to move away from having kubelet publish information on behalf of node agents. It's better to authorize those agents and let them do the writes themselves. That enables version skew (kubelet from 1.31, control plane from 1.32 with a different API) and keeps kubelet out of the authorization path.
I don't have a strong opinion about what information should be published. API reviewers will want a strong justification for using a RawExtension. But a strongly typed NetworkDeviceInfo
struct isn't an easy sell either unless there's consensus about which information should be standardized like that.
- opaque:
driver: poc.dra.networking
parameters:
config: '{ "cniVersion": "1.0.0", "name": "macvlan-eth0", "plugins": [
{ "type": "macvlan", "master": "eth0", "mode": "bridge", "ipam": { "type":
"host-local", "ranges": [ [ { "subnet": "10.10.1.0/24" } ] ] } } ] }'
interface: net1
Why is config
a string? Wouldn't this be better:
- opaque:
driver: poc.dra.networking
parameters:
config:
cniVersion: "1.0.0"
name: "macvlan-eth0
...
interface: net1
There's no kind and version in the parameters
content. You will have to think about versioning those parameters when going forward with standardizing them.
Oops, that's correct, my bad, sorry, here it the API change:
// ResourceClaimStatus tracks whether the resource has been allocated and what
// the result of that was.
type ResourceClaimStatus struct {
...
// DeviceStatuses contains the status of each device allocated for this
// claim, as reported by the driver. This can include driver-specific
// information. Entries are owned by their respective drivers.
//
// +optional
// +listType=map
// +listMapKey=devicePoolName
// +listMapKey=deviceName
DeviceStatuses []AllocatedDeviceStatus `json:"deviceStatuses,omitempty" protobuf:"bytes,4,opt,name=deviceStatuses"`
}
// AllocatedDeviceStatus contains the status of an allocated device, if the
// driver chooses to report it. This may include driver-specific information.
type AllocatedDeviceStatus struct {
// Request is the name of the request in the claim which caused this
// device to be allocated. Multiple devices may have been allocated
// per request.
//
// +required
Request string `json:"request" protobuf:"bytes,1,rep,name=request"`
// Driver specifies the name of the DRA driver whose kubelet
// plugin should be invoked to process the allocation once the claim is
// needed on a node.
//
// Must be a DNS subdomain and should end with a DNS domain owned by the
// vendor of the driver.
//
// +required
Driver string `json:"driver" protobuf:"bytes,2,rep,name=driver"`
// This name together with the driver name and the device name field
// identify which device was allocated (`<driver name>/<pool name>/<device name>`).
//
// Must not be longer than 253 characters and may contain one or more
// DNS sub-domains separated by slashes.
//
// +required
Pool string `json:"pool" protobuf:"bytes,3,rep,name=pool"`
// Device references one device instance via its name in the driver's
// resource pool. It must be a DNS label.
//
// +required
Device string `json:"device" protobuf:"bytes,4,rep,name=device"`
// Conditions contains the latest observation of the device's state.
// If the device has been configured according to the class and claim
// config references, the `Ready` condition should be True.
//
// +optional
// +listType=atomic
Conditions []metav1.Condition `json:"conditions" protobuf:"bytes,5,rep,name=conditions"`
// DeviceInfo contains Arbitrary driver-specific data.
//
// +optional
DeviceInfo runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`
// NetworkDeviceInfo contains network-related information specific to the device.
//
// +optional
NetworkDeviceInfo NetworkDeviceInfo `json:"networkDeviceInfo,omitempty" protobuf:"bytes,7,rep,name=networkDeviceInfo"`
}
// NetworkDeviceInfo provides network-related details for the allocated device.
// This information may be filled by drivers or other components to configure
// or identify the device within a network context.
type NetworkDeviceInfo struct {
// Interface specifies the name of the network interface associated with
// the allocated device. This might be the name of a physical or virtual
// network interface.
//
// +optional
Interface string `json:"interface,omitempty" protobuf:"bytes,1,rep,name=interface"`
// IPs lists the IP addresses assigned to the device's network interface.
// This can include both IPv4 and IPv6 addresses.
//
// +optional
IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`
// Mac represents the MAC address of the device's network interface.
//
// +optional
Mac string `json:"mac,omitempty" protobuf:"bytes,3,rep,name=mac"`
}
The ResourceClaim Status is set via the Kubernetes API by the component calling the CNI add. The flow is the following:
In this PoC, The container runtime acts as DRA-Driver (I mixed several PoCs together), so it calls the CNIs and sets the status via the Kubernetes API. I used the kubelet kubeconfig to access the Kubernetes API (as for simplicity since Containerd was not using the Kubernetes API before).
Yes, that's right, the config could be in better format, for this PoC I mainly focused on the ResourceClaim Status and DRA-Driver in Containerd.
I used the kubelet kubeconfig to access the Kubernetes API (as for simplicity since Containerd was not using the Kubernetes API before).
So it wasn't kubelet itself, thus avoiding the version skew problem there at the cost moving it into containerd. How to secure that write will be tricky.
@LionelJouin this is great to see. Antonio, Tim, and I put together some thoughts on the API options (open to dev@kubernetes.io):
https://docs.google.com/document/d/1smjEm7WxFm8btQwywdr_F5BZkiifatNqDvrQUaesu1M/edit?usp=sharing
Personally, I like Option 5 - real, built-in API.
If code snippets help, I can add those. Once we decide on the rough API, I can start the KEP. We'll need to figure out the authorization bit: this needs to be writeable by the drivers, but only for claims on the same node as the driver instance. @pohly I know you have looked a lot at the node authorizer, do you think we can use that here?
/assign @aojea
@pohly I know you have looked a lot at the node authorizer, do you think we can use that here?
Better ask SIG Auth. The way I remember it, the answer is currently "no, node authorizer is only for kubelet, not agents running on a node". But I think there were plans to change that, which might be sufficient for the this issue (start as alpha without locking down access, make it a requirement for beta that it gets locked down through some SIG Auth KEP).
/milestone v1.32 /label lead-opted-in
/assign @LionelJouin
Lionel is getting a PR ready.
Hello @klueska @pohly @johnbelamaric @thockin @aojea 👋, Enhancements team here.
Just checking in as we approach enhancements freeze on 02:00 UTC Friday 11th October 2024 / 19:00 PDT Thursday 10th October 2024.
This enhancement is targeting for stage alpha
for v1.32 (correct me, if otherwise)
Here's where this enhancement currently stands:
implementable
for latest-milestone: v1.32
.For this KEP, we would just need to update the following:
status
must be set to implementable
and latest-milestone
must be set to v1.32
The status of this enhancement is marked as at risk for enhancement freeze
. Please keep the issue description up-to-date with appropriate stages as well.
If you anticipate missing enhancements freeze, you can file an exception request in advance. Thank you!
1.32 Enhancements team here. I see that the KEP and PRR have been merged, so I've updated the status to tracked for enhancements freeze
!
Hi @klueska @pohly @johnbelamaric @thockin @aojea 👋, 1.32 Release Docs Lead here.
Does this enhancement work planned for 1.32 require any new docs or modification to existing docs?
If so, please follows the steps here to open a PR against dev-1.32 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday October 24th 2024 18:00 PDT.
Also, take a look at Documenting for a release to get yourself familiarize with the docs requirement for the release.
Thank you!
Hey hey @LionelJouin @aojea @johnbelamaric 👋 from the v1.32 Communications Team!
We'd love for you to consider writing a feature blog about your enhancement. Some reasons why you might want to write a blog for this feature include (but are not limited to) if this introduces breaking changes, is important to our users, or has been in progress for a long time and it is graduating.
To opt-in, let us know by opening a Feature Blog placeholder PR against the website repository by 30th Oct 2024. For more information about writing a blog see the blog contribution guidelines.
Note: In your placeholder PR, use XX
characters for the blog date
in the front matter and file name. We will work with you on updating the PR with the publication date once we finalize the blog schedule.
Hi @klueska @pohly @johnbelamaric @thockin @aojea 👋, 1.32 Release Docs Lead here.
Does this enhancement work planned for 1.32 require any new docs or modification to existing docs?
If so, please follows the steps here to open a PR against dev-1.32 branch in the k/website repo. This PR can be just a placeholder at this time and must be created before Thursday October 24th 2024 18:00 PDT.
Also, take a look at Documenting for a release to get yourself familiarize with the docs requirement for the release.
Thank you!
Hi @klueska @pohly @johnbelamaric @thockin @aojea 👋,
Just a reminder to open a placeholder PR against dev-1.32 branch in the k/website repo for this (steps available here). The deadline for this is a week away at Thursday October 24, 2024 18:00 PDT.
Thanks,
Daniel
Hey again @klueska @pohly @johnbelamaric @thockin @aojea 👋, v1.32 Enhancements team here.
Just checking in as we approach code freeze at 02:00 UTC Friday 8th November 2024 / 19:00 PDT Thursday 7th November 2024.
Here's where this enhancement currently stands:
approved
and lgtm
labels applied) by the code freeze deadline. This includes tests.For this enhancement, it looks like the following PRs need to be merged before code freeze (and we need to update the Issue description to include all the related PRs of this KEP):
If you anticipate missing code freeze, you can file an exception request in advance.
Also, please let me know if there are other PRs in k/k we should be tracking for this KEP. As always, we are here to help if any questions come up. Thanks!
Hello @klueska @pohly @johnbelamaric @thockin @aojea 👋, v1.32 Enhancements team here.
With all the implementation (code related) PRs merged as per the issue description:
This enhancement is now marked as tracked for code freeze
for the v1.32 Code Freeze!
/wg device-management
Enhancement Description
One-line enhancement description (can be used as a release note): Add driver-owned fields in ResourceClaim.Status with possible standardized network interface data
Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/4817-resource-claim-device-status/README.md
Discussion Link:
Primary contact (assignee): @klueska, @pohly, @johnbelamaric, @thockin, @aojea
Responsible SIGs: /sig node
Enhancement target (which target equals to which milestone):
[ ] Alpha
k/enhancements
) update PR(s):k/k
) update PR(s):k/website
) update PR(s):