suse-edge / charts

SUSE Edge engineering Helm charts
https://suse-edge.github.io/charts/
Apache License 2.0
5 stars 18 forks source link

Add SRIOV chart #104

Closed manuelbuil closed 5 months ago

manuelbuil commented 5 months ago

As agreed with several people, we are migrating the sriov chart from rancher/charts into this Github repo. This PR executes that migration.

This PR adds the SRIOV chart (embedding the node-feature-discovery chart too), which is basically an adapted copy/paste of what we currently have under rancher/charts.

The agreed way forward is to have both repos aligned in versions for a bit, that's why I did not updated the versions of this repo. I think it is best if we do it in a following PR.

Once this PR is merged, I'll add a deprecation warning to the sriov chart in rancher/charts.

I did some basic testing and it seems to work but if anyone could also do it, that'd be great! Thanks

e-minguez commented 5 months ago

Hi Manuel, thanks for the PR. I don't have the details on moving the chart here but just in case, we have a Makefile at root level where one of the tasks is to generate a small html to show the different charts we host here in a friendly way, you can see it live here https://suse-edge.github.io/charts/. Would you mind running that and adding those changes to this PR as well? Thanks!

manuelbuil commented 5 months ago

Hi Manuel, thanks for the PR. I don't have the details on moving the chart here but just in case, we have a Makefile at root level where one of the tasks is to generate a small html to show the different charts we host here in a friendly way, you can see it live here https://suse-edge.github.io/charts/. Would you mind running that and adding those changes to this PR as well? Thanks!

I ran make chrats and then make validate, what target should I run? make index?

manuelbuil commented 5 months ago

Ok, I ran make charts but forgot to push the output :facepalm:

e-minguez commented 5 months ago

Hi Manuel, thanks for the PR. I don't have the details on moving the chart here but just in case, we have a Makefile at root level where one of the tasks is to generate a small html to show the different charts we host here in a friendly way, you can see it live here https://suse-edge.github.io/charts/. Would you mind running that and adding those changes to this PR as well? Thanks!

I ran make chrats and then make validate, what target should I run? make index?

make html

See https://github.com/suse-edge/charts/blob/main/Makefile#L13

manuelbuil commented 5 months ago

Done! Any idea why the generated chart adds the suffix +up0.1.0 to the version (also to the asset's tarball)?

manuelbuil commented 5 months ago

Found it: https://github.com/rancher/charts-build-scripts/blob/b88735499d4045cf58e53a0f13db043154b889e5/templates/template/docs/developing.md?plain=1#L116, the problem is that sriov upstream is not moving the version of the chart at all, so we would always have that annoying +up0.1.0, if it's fine by you guys, then it's fine by me

hardys commented 5 months ago

Found it: https://github.com/rancher/charts-build-scripts/blob/b88735499d4045cf58e53a0f13db043154b889e5/templates/template/docs/developing.md?plain=1#L116, the problem is that sriov upstream is not moving the version of the chart at all, so we would always have that annoying +up0.1.0, if it's fine by you guys, then it's fine by me

I think this is expected when using the rancher charts-build-scripts and referencing an upstream version, however in packages/sriov-operator/package.yaml the comment says it's upstream 1.2.0 so the +up0.1.0 is wrong?

See the rancher/charts docs on this topic which I think say it should be <downstream version>+up<upstream version>?

alknopfler commented 5 months ago

Thanks @manuelbuil for your contribution!!! I will test the chart in the lab to ensure it's working fine

manuelbuil commented 5 months ago

Found it: https://github.com/rancher/charts-build-scripts/blob/b88735499d4045cf58e53a0f13db043154b889e5/templates/template/docs/developing.md?plain=1#L116, the problem is that sriov upstream is not moving the version of the chart at all, so we would always have that annoying +up0.1.0, if it's fine by you guys, then it's fine by me

I think this is expected when using the rancher charts-build-scripts and referencing an upstream version, however in packages/sriov-operator/package.yaml the comment says it's upstream 1.2.0 so the +up0.1.0 is wrong?

See the rancher/charts docs on this topic which I think say it should be <downstream version>+up<upstream version>?

The thing is that upstream is using the variable appversion while the script checks version. That's the reason.

I actually saw a variable in the code that allows you to ignore that and remove the +up0.1.0 but I did not see how to operate that variable from any API haha

manuelbuil commented 5 months ago

Thanks @manuelbuil for your contribution!!! I will test the chart in the lab to ensure it's working fine

Thanks!

This is my plan: 1 - Get this PR merged 2 - Add your CM patch for extra NICs + update the images (reducing CVEs and updating code) 3 - Consume it for a while to find non-obvious errors (let's give it a couple of weeks) 4 - Deprecate SRIOV in charts/rancher and make this one the official

alknopfler commented 5 months ago

Tested on the lab:

localhost:~ # dpdk-devbind.py -s

Network devices using DPDK-compatible driver
============================================
0000:51:01.0 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf
0000:51:01.1 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf
0000:51:11.0 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf
0000:51:11.1 'Ethernet Adaptive Virtual Function 1889' drv=vfio-pci unused=iavf

and pods successfully deployed:

kube-system       sriov-device-plugin-8j8k6                                  1/1     Running   0               103s
kube-system       sriov-network-config-daemon-hlqrb                          3/3     Running   0               7m41s
kube-system       sriov-network-operator-5d6699c84d-jklgt                    1/1     Running   0               103s
kube-system       sriov-network-operator-sriov-nfd-gc-99664d86f-kvt8x        1/1     Running   0               103s
kube-system       sriov-network-operator-sriov-nfd-master-649fd66f78-8rhp5   1/1     Running   0               103s
kube-system       sriov-network-operator-sriov-nfd-worker-6bph5              1/1     Running   0               7m50s
hardys commented 5 months ago

Found it: https://github.com/rancher/charts-build-scripts/blob/b88735499d4045cf58e53a0f13db043154b889e5/templates/template/docs/developing.md?plain=1#L116, the problem is that sriov upstream is not moving the version of the chart at all, so we would always have that annoying +up0.1.0, if it's fine by you guys, then it's fine by me

I think this is expected when using the rancher charts-build-scripts and referencing an upstream version, however in packages/sriov-operator/package.yaml the comment says it's upstream 1.2.0 so the +up0.1.0 is wrong? See the rancher/charts docs on this topic which I think say it should be <downstream version>+up<upstream version>?

The thing is that upstream is using the variable appversion while the script checks version. That's the reason.

I actually saw a variable in the code that allows you to ignore that and remove the +up0.1.0 but I did not see how to operate that variable from any API haha

Ack yes I see now the upstream chart version is 0.1.0 although the upstream appVersion is 1.2.0 - I also see this is consistent with the current rancher/charts version