kumahq / kuma

šŸ» The multi-zone service mesh for containers, Kubernetes and VMs. Built with Envoy. CNCF Sandbox Project.
https://kuma.io/install
Apache License 2.0

Add ability to wait for sidecar container #2483

Closed michaelkoro closed 1 year ago

michaelkoro commented 3 years ago

Summary

When deploying Kong on EKS into a Kuma-managed namespace, we noticed that the DB migration job, which runs before the pods are started, sometimes gets stuck with the following error:

Error: [PostgreSQL error] failed to retrieve server_version_num: host or service not provided, or not known

After trying a few things (changing the DB address, upgrading the Docker version), we noticed that the Envoy proxy sometimes starts after the application container (in this case, the migrations container), which I guess causes the network errors.

Once we deployed the chart in a namespace that wasn't managed by Kuma, everything worked fine. Is there a way to tell Kuma to start the Envoy proxy first, and only then the application itself?

Thanks

Kuma Chart version - 0.6.0
EKS - 1.19
Kong - 2.1.4

Additional Details & Logs

Link to a related ticket in Kong: https://github.com/Kong/kong/issues/4363

michaelkoro commented 3 years ago

Has anyone encountered this kind of issue?

jpeach commented 3 years ago

xref https://github.com/kubernetes/kubernetes/issues/65502

skaravad commented 3 years ago

The problem is that until the Kuma DP is up and running, the pod has no network. On the K8s side, it appears that the sidecar lifecycle is more complicated than originally thought, so it is a waiting game.

On the other hand, if you can wrap the main application command or entrypoint, you can use this logic (install netcat on Ubuntu or Debian; Alpine has the nc command installed by default):

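## Check network when service mesh is enabled
# Loop until an outbound TCP connection succeeds, i.e. the sidecar is passing traffic.
while true
do
  nc -vz www.google.com 443
  ret_code=$?
  if [ "$ret_code" -ne 0 ]; then
    echo "Network Not ready"
    sleep 3
  else
    echo "Network Ready"
    break
  fi
done

echo "starting {{.Chart.Name}} service"
# Placeholder: replace the next line with the application's real start command.
MAIN COMMAND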

Btw, if you have Vault integration and an init container that runs, it will not init. To overcome this, just add the annotation vault.hashicorp.com/agent-init-first: "true"
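
The loop above retries forever and probes a public host. A bounded variant is sketched below, assuming a POSIX shell; `wait_until`, `DB_HOST` and `DB_PORT` are hypothetical names used only for illustration, and probing the service the job actually needs (rather than an external site) is a design choice, not something prescribed in this thread.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, giving up once TIMEOUT_SECONDS
# of sleeping have accumulated. Both numbers must be whole seconds.
wait_until() {
  timeout_s=$1
  interval_s=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$timeout_s" ]; then
      echo "gave up after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  return 0
}

# Hypothetical usage in an entrypoint wrapper (DB_HOST and DB_PORT are
# placeholders for whatever the job must reach):
#   wait_until 60 3 nc -z "$DB_HOST" "$DB_PORT" || exit 1
#   exec "$@"
```

Failing after a timeout lets Kubernetes restart or fail the job visibly instead of leaving it hanging when the sidecar never comes up.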

michaelkoro commented 3 years ago

@skaravad I remember that when working with Istio, they managed to solve the issue. I think when deploying Istio you had to add a flag which basically tells the app container to wait for the proxy. Are you familiar with that? Is there a way to implement this solution in Kuma as well?

jpeach commented 3 years ago

xref #2571

skaravad commented 3 years ago

@michaelkoro with Istio it was an annotation: https://github.com/istio/istio/issues/11130

annotations:
  proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

But I don't think there was closure. I think that unless K8s has a way to order container startup within a pod, these are just workarounds.

In the case of Kuma, it appears that the issue is only with DNS, which starts with the DP. You can disable DNS on the DP and use DNS via the CP (@jpeach please correct me if I'm wrong), but I think that was not best practice.

michaelkoro commented 3 years ago

@skaravad I actually noticed now that when deploying kong to a kuma-managed namespace, we are getting the following error from the kuma sidecar container, which fails the kong deployment:

Error: could not read file /var/run/secrets/kubernetes.io/serviceaccount/token: stat /var/run/secrets/kubernetes.io/serviceaccount/token: no such file or directory

Which service account is it looking for?

bartsmykla commented 3 years ago

OK, I have my guesses. When Kuma is injected, the kuma-init init container runs first and installs transparent proxying, which (by default) also redirects all DNS traffic to the kuma-dp DNS server. Since that server starts together with Envoy in the kuma-sidecar container, DNS traffic won't work between the moment kuma-init finishes and the moment the kuma-dp DNS server starts. I'm not sure how to fix this yet without disabling the kuma-dp DNS server.

jakubdyszkiewicz commented 3 years ago

@michaelkoro we use the service account token as the authentication mechanism between kuma-dp and kuma-cp.
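
Since kuma-dp reads that token from disk, a quick check from inside the affected pod can confirm whether the file was mounted at all. This is a minimal sketch: `check_sa_token` is a hypothetical helper, and the path is the one from the error message above. One possible cause worth ruling out (an assumption, not confirmed in this thread) is a pod or service account with `automountServiceAccountToken: false`.

```shell
#!/bin/sh
# Path taken from the kuma sidecar error message quoted above.
TOKEN_PATH="${TOKEN_PATH:-/var/run/secrets/kubernetes.io/serviceaccount/token}"

# check_sa_token PATH
# Succeeds only if PATH is a readable, non-empty file.
check_sa_token() {
  if [ -r "$1" ] && [ -s "$1" ]; then
    echo "service account token present: $1"
    return 0
  fi
  echo "service account token missing or empty: $1" >&2
  return 1
}

# Hypothetical usage inside the container:
#   check_sa_token "$TOKEN_PATH"
```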

bartsmykla commented 3 years ago

Actually, we discussed it with @jakubdyszkiewicz and it's not even a DNS thing: all traffic is redirected at that point, so kuma-dp has to be fully running.

michaelkoro commented 3 years ago

@bartsmykla Yea, what I ended up doing to avoid the problem was disabling the Kuma injection on the Kong pre- and post-migration jobs, just so it could work properly. Not sure why, but the Kong pod itself managed to connect to the DB (meaning the network was set up), but the pre-migration job (which uses the same Kong image) couldn't.

lahabana commented 2 years ago

Someone also mentioned: https://medium.com/@marko.luksa/delaying-application-start-until-sidecar-is-ready-2ec2d21a7b74

github-actions[bot] commented 2 years ago

This issue was inactive for 30 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it promptly or attend the next triage meeting.

lahabana commented 2 years ago

There's some research required here, as it might not be straightforward.


alt-dima commented 2 years ago

The same problem occurs on pod shutdown: the sidecar dies first and the main container loses its network connection.

michaelkoro commented 2 years ago

@alt-dima We also started experiencing this issue. From time to time when a pod dies, kuma receives the SIGTERM and closes all connections, which causes many "network error" logs from our application, until the application pod is terminated.


lahabana commented 2 years ago

I believe we've fixed the shutdown issue you are mentioning in the upcoming release of Kuma; @jakubdyszkiewicz can confirm.

michaelkoro commented 2 years ago

Release 1.7.0?

lahabana commented 2 years ago

Yes, releasing early next week.


lahabana commented 2 years ago

Seems like we need:

  1) Make the sidecar first in the list of containers.
  2) Add a PostStart hook on the sidecar that waits for the sidecar to be ready (this could be an HTTP call).

  1. We're always good to make the sidecar the first container (atm it's last and there's no determinism, so switching will be fine).
  2. I don't think calling the Envoy admin is right; we probably want this to be combined with the actual DP process.

lahabana commented 2 years ago

@johnharris85 thinks that maybe the order of containers doesn't matter.
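
For illustration, the PostStart idea proposed above could run a small polling script like the sketch below. Everything specific here is an assumption: `wait_for_sidecar` and `sidecar_ready` are hypothetical helpers, `READY_URL` defaults to Envoy's admin `/ready` endpoint on port 9901 (which, as noted above, may not be the right thing to query), and `curl` is assumed to be available in the sidecar image.

```shell
#!/bin/sh
# Assumed readiness URL; override READY_URL to point at whatever
# endpoint actually reflects data plane readiness.
READY_URL="${READY_URL:-http://127.0.0.1:9901/ready}"

# Succeeds when the readiness endpoint answers with a non-error status.
# -f: treat HTTP errors as failure, -s: silent, --max-time: per-try cap.
sidecar_ready() {
  curl -fs -o /dev/null --max-time 2 "$READY_URL"
}

# wait_for_sidecar [ATTEMPTS]
# Polls once per second, up to ATTEMPTS times (default 30).
wait_for_sidecar() {
  attempts=${1:-30}
  while [ "$attempts" -gt 0 ]; do
    if sidecar_ready; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Hypothetical hook body:
#   wait_for_sidecar 30 || exit 1
```

Note that Kubernetes kills a container whose postStart handler fails, so the attempt limit also bounds how long a broken sidecar can hold up the pod.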

github-actions[bot] commented 1 year ago

This issue was inactive for 90 days. It will be reviewed in the next triage meeting and might be closed. If you think this issue is still relevant, please comment on it or attend the next triage meeting.


lahabana commented 1 year ago

xref: https://github.com/kumahq/kuma/issues/6082
