Closed nerdalert closed 2 years ago
The aforementioned epic is around uninstalling submariner. By "switch to the ipsec driver" I assume you re-ran `subctl join`? If so then that will delete and recreate the `Submariner` resource, which will invoke the uninstall process in the gateway engine, which should include cable driver cleanup.
Hi @tpantelis yeah, switching meant rejoining with a new driver type; tearing down the entire k8s stack would also leave ghost interface(s) behind. Looks like you're all over it. Feel free to close the bug, or keep it open to track with your PR. Thanks.
I feel we should keep this open so that we can ensure we test this use case (after adding the necessary support). Once we find that it's working, we can close it.
@tpantelis We normally want the `subctl join ...` commands to be idempotent. If the user is re-running the command with the same flags/configuration, we should not trigger the cleanup. Hope we are on the same page?
Ideally we shouldn't. `subctl join` does `CreateAnew` with a foreground delete, and I don't see anywhere that it checks whether the old and new `Submariner` differ, so it looks like `subctl join` will always recreate the daemonsets (@skitt). This is regardless of the dataplane cleanup being added.
That’s right, we always re-deploy the operator. The other deployments shouldn’t be affected though, and the new operator will re-parent them.
I'm not sure I follow. By "other deployments", I assume you mean the gateway and route-agent `DaemonSets` et al? `subctl join` always ensures the `Submariner` CR exists, which does `CreateAnew` instead of the usual `CreateOrUpdate`, so every time it will delete and recreate the `Submariner` CR, which in turn will delete and recreate the gateway and route-agent `DaemonSets` et al, regardless of whether or not anything actually changed in the `Submariner` CR. Now with the dataplane cleanup coming, it will also run those `DaemonSets`. Or am I missing something?
Yes, when `subctl join ...` is executed, all the DaemonSets get restarted. So the question is: do we want the uninstallation to be triggered as part of this operation, or should it be triggered ONLY when explicitly specified by the user?
Right, I got things mixed up, and my previous comment was incorrect. The operator doesn’t get redeployed, but everything else does.
I'm not sure there's any way the operator can tell the difference. Besides, I think the bigger issue is that we're re-installing the regular DaemonSets and possibly causing disruption. I think we should have `CreateAnew` compare the new and previous and only recreate if they changed.
In some of the components (like gateway, globalnet etc) we don't perform cleanup when the Pod gets restarted as the Pod might be restarted for various reasons and as much as possible, we want to avoid disrupting the data-path. So, we should find a way to instruct the pod explicitly that a cleanup is necessary. Can we do this?
The plan is to create a `DaemonSet` that passes an `UNINSTALL` env var to the pods.
Cool.
@tpantelis @sridhargaddam so are we thinking this issue will get resolved as part of the uninstall work?
I think it should be, as part of the dataplane cleanup - I'll let @sridhargaddam verify.
@sridhargaddam can you comment on this issue? Are you going to address it as part of the cleanup work?
Yes @nyechiel, I will look into it as part of dataplane cleanup.
Should be covered by the recent datapath cleanup work, but this needs to be tested.
There seems to be a valid bug here. Related Slack thread: https://kubernetes.slack.com/archives/C010RJV694M/p1646061293066379
When switching between the cable-drivers, submariner-operator is scheduling the uninstall and the new Pods in parallel. Steps to reproduce: In Submariner repo
The operator waits for the uninstall `DaemonSet` to be ready, then deletes it, so the pods may linger for a bit after the `Submariner` deletion completes, depending on how long it takes K8s to GC them. I assume this is what you're seeing, or are you actually seeing an issue?
The uninstall pods are running forever. If you look at the top-right side of the image, you can see an age of 4m18s for the uninstall pods and 4m17s for the normal pod. So the operator scheduled the regular pods immediately after scheduling the uninstall pods, without waiting (or honoring the 2-minute max threshold) for them to complete their execution.
Does the uninstall daemonset still exist? If not, then the pods should be cleaned up by K8s. The operator only removes the finalizer after the uninstall daemonsets are deleted. Until then, a new `Submariner` instance cannot be created in order to create new regular daemonsets. I suggest looking at the operator log. This process is working correctly and verified in the E2E test, so I don't know what `make deploy` is doing to cause this strange behavior.
BTW, the uninstall pods don't do anything in the `Running` state, ie the cleanup is done in the initializing phase, which has to complete before the regular `pause` container runs and transitions to the `Running` state. But even so, the pods should eventually terminate and get cleaned up.
> Does the uninstall daemonset still exist?

Yes. The uninstall DaemonSets are never deleted.

> If not, then the pods should be cleaned up by K8s.

Agree, but that's not the case here.

> The operator only removes the finalizer after the uninstall daemonsets are deleted.

I tried to spend some time today looking at the flow, and some observations are:

- removeFinalizer returns success, but in the meantime the Reconcile method is invoked again and the deletion timestamp is still present (`instance.GetDeletionTimestamp().IsZero()` returns false), so the cleanup code gets triggered once again, which spawns the uninstall DaemonSets. The next invocation of Reconcile (which, as I mentioned, happens very frequently) does not see the DeletionTimestamp on the Submariner CR, so it goes ahead and spawns the regular pods.
- I think after we removed the finalizer, it took a few milliseconds for K8s to delete the object, but in the meantime, when the Reconcile method queried the Submariner CR, it found it along with the DeletionTimestamp on it, so it invoked the uninstall once again.

I'm frequently hitting the Docker rate limit while testing this on my host, so I thought of sharing my observations so far.

> Until then a new `Submariner` instance cannot be created in order to create new regular daemonsets. I suggest looking at the operator log. This process is working correctly and verified in the E2E test so I don't know what `make deploy` is doing to cause this strange behavior.

The behavior I described is seen even when we manually run `subctl join ...` on an existing deployment.

> BTW, the uninstall pods don't do anything in the `Running` state, ie the cleanup is done in the initializing phase which has to complete before the regular `pause` container runs and transitions to the `Running` state.

True, but this is causing some race conditions, as the pods are spawned within a few milliseconds of each other.

> But, even so, the pods should eventually terminate and get cleaned up.

This is not happening even if I leave the setup for an hour ;-)
> - removeFinalizer returns success, but in the meantime the Reconcile method is invoked again and the deletion timestamp is still present, so the cleanup code gets triggered once again, spawning the uninstall DaemonSets. The next invocation of Reconcile does not see the DeletionTimestamp on the Submariner CR, so it goes ahead and spawns the regular pods.
> - I think after we removed the finalizer, it took a few milliseconds for K8s to delete the object, but in the meantime, when the Reconcile method queried the Submariner CR, it found it along with the DeletionTimestamp on it, so it invoked the uninstall once again.
OK thanks - that points to the issue. I think we should not invoke `runComponentCleanup` if the finalizer doesn't exist - that should close that timing window. I do have things going on the next couple of days on PTO, but I can try to push a PR.
Submitted https://github.com/submariner-io/submariner-operator/pull/1899.
I verified a couple of scenarios, switching the cable-drivers and also re-running `make deploy` with the same drivers, and can confirm that the issue is fixed. Thanks @tpantelis
What happened:
Looks like the vxlan cable driver is binding the `vxlan-tunnel` interface to the well-known port `4500`, which is used for NAT-Traversal and IKE negotiation when two peers are NATted in an IPsec deployment. The problem this creates is that if a user switches from the vxlan cable driver to the ipsec cable driver without rebooting, the `vxlan-tunnel` interface is still bound by the kernel to 4500, and libreswan is not able to bind to the default 4500/udp port.

`vxlan-tunnel` configuration:

How to reproduce it (as minimally and precisely as possible):
Deploy submariner using the vxlan driver, then switch to the ipsec driver without cleaning up `vxlan-tunnel`.

A workaround is to specify `--nattport=xxx` (e.g. `--nattport=4700`) in the `subctl join`, or to delete the `vxlan-tunnel` interface with `sudo ip link del vxlan-tunnel` to unbind 4500/udp.

@sridhargaddam mentioned there is an epic around a cleanup from deployments, which is awesome. Not sure if this is in the planned cleanup but it's worth flagging. Happy to help with that work. cc/ @anfredette Thanks!