@yboaron did you have a chance to look into this?
Not yet, it's on my to-do list, hope to get to it in the next 1-2 weeks.
Hi @Prophetick, I tried to reproduce this issue (MetalLB speaker pods restarting) on a Kind environment but was unable to.
To deploy Submariner (latest devel) + MetalLB (version 0.13.5) on Kind, I downloaded the latest https://github.com/submariner-io/submariner-operator and ran: make deploy using=load-balancer,lighthouse
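Spelled out, that setup is roughly the following (assuming a local environment where Kind and the repository's make targets work; the target and flags are taken verbatim from the comment above):
$ git clone https://github.com/submariner-io/submariner-operator.git
$ cd submariner-operator
$ make deploy using=load-balancer,lighthouse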
I ran both the Submariner e2e tests (subctl verify) and a manual verification. Even though the EndpointSlice resources generated by Submariner Lighthouse do not include the kubernetes.io/service-name label but do include the multicluster.kubernetes.io/service-name label [1], the MetalLB speaker pods didn't crash/restart.
Is the problem still relevant in the latest stable version of MetalLB? If so, could you please describe how to reproduce it?
[1]
$ kubectl --kubeconfig output/kubeconfigs/kind-config-cluster2 get endpointslice nginx-cluster2 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1
endpoints:
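To check only the labels rather than the full YAML, a jsonpath query along these lines also works (reusing the kubeconfig and EndpointSlice name from the output above):
$ kubectl --kubeconfig output/kubeconfigs/kind-config-cluster2 get endpointslice nginx-cluster2 -o jsonpath='{.metadata.labels}{"\n"}'
On a Lighthouse-generated slice this should show multicluster.kubernetes.io/service-name but no kubernetes.io/service-name.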
Hi all, thanks for the reply.
When I opened this issue I was using Submariner 0.12.2 and MetalLB 0.12.1.
I can confirm that upgrading Submariner to version 0.13.1 and MetalLB to version 0.13.5 solved the problem for me; I can no longer reproduce this.
Closing as per the last comment.
This does not seem to have been solved on the Submariner side; rather, the EndpointSlices are silently ignored by MetalLB, so the underlying issue does not seem to be solved. I have Submariner 0.15.2 and OCP MetalLB 0.10; the latest OCP MetalLB is 0.11.
@Woytek-Polnik EndpointSlices created by Lighthouse are required to have the label multicluster.kubernetes.io/service-name, as per KEP-1645. kubernetes.io/service-name is for EndpointSlices created by Kubernetes for local services. The different labels are there to distinguish local EndpointSlices from an exported service's EndpointSlices.
This looks like a bug in MetalLB: it needs to be multicluster aware. The assumption that all EndpointSlices must have kubernetes.io/service-name is incorrect. If MetalLB is only interested in local services and their EndpointSlices, it should ignore EndpointSlices with the multicluster label. If it is interested in multicluster EndpointSlices, it should honor the multicluster label and use that.
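As a quick illustration of that distinction on a live cluster, label existence selectors separate the two kinds of slices; a consumer that only cares about local services would effectively restrict itself to the first query (namespace and other flags omitted for brevity):
$ kubectl get endpointslices -l 'kubernetes.io/service-name'                # local slices managed by Kubernetes
$ kubectl get endpointslices -l 'multicluster.kubernetes.io/service-name'   # slices created by Lighthouse for exported services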
@vthapar Thank you for the extra context. It totally makes sense to me to just ignore them. In that case, I have to push for the MetalLB operator in OCP to be upgraded ⬆️
I landed on this thread by chance, following the comment on https://github.com/metallb/metallb/issues/1175
MetalLB is not multicluster aware, and crashing in that scenario is certainly not the solution. We can make it more robust (which, incidentally, may already have been done). I am not sure what the behaviour should be, because I need to read the KEP and the related CNI implications (i.e. what happens if traffic directed to a LB lands on a node that belongs to a different cluster than the one the service is defined on, whether the service is mirrored to all the clusters, etc.). So I'd split the fix in two: avoid crashing (if we still do) and ignore those EndpointSlices as a bug fix, and manage the multicluster scenario as an enhancement (if it makes sense).
This is from the community MetalLB point of view. @Woytek-Polnik, if you are eligible for support on OpenShift, please reach RH through the proper channels.
What happened: The EndpointSlice resources created by Submariner are making the MetalLB speaker pods crash during runtime. This behaviour is due to the EndpointSlices missing a service name label, and it can be manually fixed by adding the label to the EndpointSlice resource in question. All the verifications mentioned in the documentation succeeded.
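For reference, the manual fix mentioned above amounts to adding the missing label by hand; something along these lines should work (the EndpointSlice and service names are placeholders, use the ones from the affected cluster):
$ kubectl label endpointslice <endpointslice-name> kubernetes.io/service-name=<service-name>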
What you expected to happen: Running the subctl verify command, for example, leads to the MetalLB speaker pods crashing during the performed tests. I expect this behaviour, caused by the created EndpointSlice resources missing a kubernetes.io/service-name label, not to happen. The following logs are from the MetalLB speaker pod after this crash.
How to reproduce it (as minimally and precisely as possible): Run subctl verify in the kubeadm cluster, which leads to the MetalLB speaker pods crashing.
Anything else we need to know?: This same issue has been mentioned in https://github.com/metallb/metallb/issues/1175 and in https://github.com/submariner-io/submariner/issues/1869.
Environment: We're using Submariner to connect three clusters, two K3s single-node clusters and one kubeadm single-node cluster. We consider the K3s clusters as cluster-a and cluster-b, and the kubeadm cluster as cluster-c. cluster-a is where the Submariner broker is deployed. All the problems mentioned in this post are happening on cluster-c.
Diagnose information (use subctl diagnose all):
Gather information (use subctl gather):
Skipping inter-cluster firewall check as it requires two kubeconfigs. Please run "subctl diagnose firewall inter-cluster" command manually.
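The skipped check from the diagnose output above can be run manually once both kubeconfigs are available; the general shape is something like the following (the exact argument order and flags may differ between subctl versions, so check subctl diagnose firewall inter-cluster --help):
$ subctl diagnose firewall inter-cluster <local-cluster-kubeconfig> <remote-cluster-kubeconfig>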