k8snetworkplumbingwg / sriov-cni

DPDK & SR-IOV CNI plugin
Apache License 2.0
307 stars 146 forks source link

Fix the cache save #298

Closed SchSeba closed 4 months ago

SchSeba commented 4 months ago

before we send a pointer to the netConf object so the marshal function didn't parse all the information needed

SchSeba commented 4 months ago

Hi @zeeke @Eoghan1232 please review CRITICAL bug in the cni

SchSeba commented 4 months ago

without this PR this is the info saved in the cache folder

cat /var/lib/cni/sriov/ae5482e8bc9691863727dbae943daebf6c0f91263f9e35905cf5b55d4b11c1ea-net1 
{"capabilities":{"ips":true,"mac":true},"cniVersion":"1.0.0","ipam":{"type":"static"},"name":"test-sriov-static-usual","type":"sriov"}

we don't have DeviceID so the cni failed on cmdDel function

coveralls commented 4 months ago

Pull Request Test Coverage Report for Build 9212724446

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/utils.go 2 3 66.67%
pkg/types/types.go 15 25 60.0%
<!-- Total: 17 28 60.71% -->
Totals Coverage Status
Change from base Build 9019301759: 1.4%
Covered Lines: 662
Relevant Lines: 1300

💛 - Coveralls
mlguerrero12 commented 4 months ago

what changed? The marshal of an interface that holds a pointer is the value that the pointer points to. This has been here for a while. What am I missing?

SchSeba commented 4 months ago

Hi @mlguerrero12 thanks for the comment!

I don't know exactly. I am trying to understand I was able to find that the MTU PR was the one that introduced the change

SchSeba commented 4 months ago

/hold

SchSeba commented 4 months ago

Hi @mlguerrero12 I found the issue. the bump of the CNI package.

the CNI package overrides the default marshal function

https://github.com/containernetworking/cni/blob/main/pkg/types/types.go#L86

/hold cancel

mlguerrero12 commented 4 months ago

Good catch @SchSeba. I suggest to change the signature of SaveNetConf to func SaveNetConf(cid, dataDir, podIfName string, conf types.NetConf) to make it explicit that it should not be a pointer.

I was going to suggest, as a nit, to ignore the field MTU for marshalling/unmarshalling by adding json:"-" and then I realized that DPDKMode also have it. So, in cmdDel, this is always false even when it is dpdk. I ran a small test for LoadConfFromCache. I haven´t checked the actual logs, but it seems to me that during a delete for dpdk, the first time it fails in ReleaseVF (line 288) and then the next iteration succeeds since we delete the cache and return nil. Only the deletion of the allocator is missing to execute which is not a big problem. Anyway, this is outside of the scope of this pr.

zeeke commented 4 months ago

Good catch @SchSeba. I suggest to change the signature of SaveNetConf to func SaveNetConf(cid, dataDir, podIfName string, conf types.NetConf) to make it explicit that it should not be a pointer.

+1 on changing the signature of SaveNetConf: conf interface{} -> conf types.NetConf

other than that. LGTM

SchSeba commented 4 months ago

Hi @mlguerrero12 @zeeke thanks for the comment please take another look when you have time

SchSeba commented 4 months ago

I just squash the commits waiting for a clean CI and I will merge this one