Open kevinpollet opened 4 years ago
In the past there was a poor mans "audit" that ran on a timer to verify all services were matched up, and that if there was a missing shadow service, trigger it as a new create event.
Did we want to go this route?
I think doing this check at startup is enough. Maybe that's not necessary if the shadow service manager is idempotent (#562). This will ensure that orphan services will not disrupt the system.
One problem with the orphan shadow services is that they hold entry points for nothing.
Just a thought, maybe we could use ownerReferences to make sure that shadow service are properly deleted. wdyt?
oooo That is a neat idea. Letting kubernetes handle it automatically? I like it.
Yep, that's the idea. If a shadow service has a user service as ownerReferences
, it should be garbage collected when the user service
is removed but, we still have to keep the TCP and UDP tables in sync to avoid port leaks. Maybe it will help us when the TCP and UDP tables will be computed at startup.
Btw, I think that we should try to handle orphan shadow services gracefully to avoid port leaks which will disrupt the system.
@kevinpollet I did a test with the ownerReferences
property. It works great but there's a side effect. We are relying on service deletion events to update state tables but when we receive this event the shadow service can already be deleted. And so, we don't know which mapping should be removed. There are solutions though:
RemoveAll
method which takes a name and a namespace as parameter. It will then remove all port mapping for the given service. And so, we no longer need to inspect the shadow service to know which port should be unmapped.The first solution is the easiest to implement and the less disruptive. The second adds one more moving piece in the system and will introduce some delay. The third solution simplify the port mapping management but has a cost. We would have to build this table before each build of the topology.
I tend to prefer the 3rd option or eventually the 1st. @kevinpollet, @dtomcej Any opinion on this?
@jspdown I think that if we use references, we lose a bit of control over the deletion events. We have to just "trust" that things will be handled properly by k8s. We would have to run an "audit" loop to verify that k8s has in fact, removed the objects it was supposed to.
However, isn't the goal to remove the persistent state table entirely, and use the shadow services as the source of truth?
Using ownerReferences
is not an option for us. Shadow services and user services are in different namespaces. The kubernetes documentation on garbage collection says:
Cross-namespace owner references are disallowed by design. This means:
- Namespace-scoped dependents can only specify owners in the same namespace, and owners that are cluster-scoped.
- Cluster-scoped dependents can only specify cluster-scoped owners, but not namespace-scoped owners.
https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
Feature Request
Proposal
To make sure that shadow services exist for
services
created before Maesh installation or when thecontroller
is down, at startup, thecontroller
list the existing services and create the missing shadow services.A nice addition could be to check that all shadow services have a corresponding service to check that there are no orphans. e.g. when a service is deleted during
controller
downtime.