@vielmetti Thanks for the announcement! Will this somehow affect n2.xlarge? Because as far as I know we use this type.
Thanks @glazychev-art .
The intent is as follows:
NSM will port the software to our new n3 systems. The server config is here: https://metal.equinix.com/product/servers/n3-xlarge/, with the notable difference that the NIC is the Intel E810 (using the 'ice' driver).
When you're happy with the port, you'll switch your CI to use n3 systems "on demand" for testing. There's no need to reconfigure the systems for SR-IOV because it is already turned on.
When everything is working to specification, you'll release the old n2 systems and no longer use them.
Hope this makes it more clear!
@vielmetti @edwarnicke Do I understand correctly that NSM will no longer have Reserved servers? And only On Demand servers will be used?
I ran our tests on n3-xlarge today - in general they are working fine; some minor updates to our scripts may be needed. Still testing.
@glazychev-art The hope is to be able to switch to on demand n3 servers yes :) If that works (and I expect it will) we can then relinquish our reserve instances :)
@edwarnicke There may be some difficulty:
clusterctl does not see that the server has already been created and continues to wait for it, even though the server shows up successfully in the Equinix Metal console. I used version v0.5.0. Do we need to move to cluster-api instead of cloudtest?
I believe this is due to a known issue I've got a fix for in the upcoming 0.6.0 version.
You can try manually editing your cluster yaml to use cloud-provider-equinix-metal version 0.4.3, as well as adding systemctl restart networking back in on the line above if [ -f "/run/kubeadm/kubeadm.yaml" ]; then.
To see an example, check the commits in this PR: https://github.com/kubernetes-sigs/cluster-api-provider-packet/pull/365/files
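For reference, a rough sketch of what that edit could look like in the generated cluster yaml. This is an assumption-laden illustration, not the actual template: the field names follow standard Cluster API KubeadmControlPlane manifests, and the surrounding commands are placeholders for whatever the generated template already contains.

```yaml
# Illustrative excerpt only - surrounding commands are placeholders, not the real template.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
spec:
  kubeadmConfigSpec:
    preKubeadmCommands:
      - |
        # ... earlier bootstrap steps from the generated template ...
        systemctl restart networking   # re-added per the workaround above
        if [ -f "/run/kubeadm/kubeadm.yaml" ]; then
          # ... kubeadm setup from the generated template ...
        fi
```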
@glazychev-art - I checked with our team and they referred me to this patch to cluster-api that addresses the clusterctl problem you mention:
https://github.com/kubernetes-sigs/cluster-api-provider-packet/pull/365
The relevant bit is this:
Without this PR clusters never come up.
You can manually apply this to a 0.5.0 generated template as a workaround until 0.6.0 comes out.
@cprivitere @vielmetti Thanks guys, I really appreciate your help!
I noticed a strange thing - the last couple of runs are deploying fine even on version 0.5.0. But, of course, if I see it again - I will try what you suggested!
I have a couple more questions to discuss:
1. clusterctl - we have to install the CNI after the control plane node is deployed. Is there a clusterctl command (something like kubectl wait ...) that will help determine that the control plane is ready for the CNI installation? This is for the script (a possible approach is sketched below).
2. docker as a container runtime. As far as I understand, this was necessary for setting default limits (we need this for the test) - https://github.com/networkservicemesh/integration-k8s-packet/blob/main/scripts/k8s/config-docker.sh. Installation via clusterctl uses containerd, which does not allow such settings. Do I understand correctly that there is no way to use docker instead of containerd?
I would be grateful if you have any thoughts on this!
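On the first question, one possible approach is sketched below. This is only an assumption: the cluster name nsm-ci and the calico.yaml manifest path are made up, and it waits on the Cluster API Cluster object's ControlPlaneReady condition rather than using a dedicated clusterctl command.

```sh
# Run from the management cluster: wait until the workload cluster's control plane is
# reported ready, then fetch its kubeconfig and apply the CNI manifest.
kubectl wait --for=condition=ControlPlaneReady --timeout=20m clusters.cluster.x-k8s.io/nsm-ci
clusterctl get kubeconfig nsm-ci > nsm-ci.kubeconfig
kubectl --kubeconfig=nsm-ci.kubeconfig apply -f calico.yaml   # calico.yaml is a placeholder path
```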
Regarding containerd and ulimits, these look like the relevant issues:
https://github.com/containerd/nerdctl/issues/344 - ulimits in compose
https://github.com/containerd/nerdctl/pull/370 - ulimits from the nerdctl command line
I don't know the exact syntax to pull these into your configuration, but that should be a good start for seeing what support is already there.
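As a rough idea of the command-line side, assuming the --ulimit flag from the nerdctl PR above is available in the installed version and mirrors the docker run syntax (and noting this only covers containers started through nerdctl, not containers created by the kubelet):

```sh
# Start a throwaway container with an explicit memlock limit and print the limit it sees.
nerdctl run --rm --ulimit memlock=-1:-1 alpine sh -c 'ulimit -l'
```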
Yeah, all the templates we maintain use containerd, you'd need to edit/override the cloud-init portions of the generated cluster yaml if you wanted to use docker instead of containerd.
For the CNI installation, cluster api has a resource called ClusterResourceSet that can be used to automatically install the CNI in a new cluster. Here's the proposal doc: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20200220-cluster-resource-set.md. We actually have a template that uses a CRS to install Calico that you can check out here: https://github.com/kubernetes-sigs/cluster-api-provider-packet/blob/v0.5.0/templates/cluster-template-crs-cni.yaml.
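As a rough illustration of the shape of that resource (the names, label, and ConfigMap here are invented for the sketch; the actual CNI manifest would live in the referenced ConfigMap):

```yaml
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: calico-cni
  namespace: default
spec:
  # Apply the referenced resources to every workload cluster carrying this label.
  clusterSelector:
    matchLabels:
      cni: calico
  resources:
    - kind: ConfigMap
      name: calico-manifest   # ConfigMap containing the CNI YAML to apply
```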
If you want to try using it, pass --flavor=crs-cni when you do the clusterctl generate command.
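For example, something along these lines (a sketch only - the cluster name, Kubernetes version, and machine counts are placeholders, and the usual provider environment variables still need to be set first):

```sh
# Generate a cluster manifest from the crs-cni flavor and apply it to the management cluster.
clusterctl generate cluster nsm-ci \
  --flavor=crs-cni \
  --kubernetes-version=v1.23.5 \
  --control-plane-machine-count=1 \
  --worker-machine-count=2 > nsm-ci.yaml
kubectl apply -f nsm-ci.yaml
```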
@edwarnicke thanks and good to hear that the new systems are up and running.
Can you confirm that the two old n2 systems can be removed? They are currently in red (failed) status in my dashboard, so I know you're not actively using them right now, but I'd like to clean them up.
This is a heads up to let the project know that Equinix Metal is planning to make SR-IOV a default configuration on our new n3 class of servers.
The net effect of this should be that rather than testing against a fixed pool of dedicated servers, the project will be able to draw from our server pool. In addition, any specific configuration for turning SR-IOV on will be simplified.
The release is forthcoming and there are still a few small details to work out. When the formal announcement drops I'll use this item to update implementation details for the project.
(I know that you all are in the middle of a release right now so no expectations for any immediate changes until that's taken care of!)
cc @Bolodya1997 @edwarnicke