vmware-archive / kubeless

Kubernetes Native Serverless Framework
https://kubeless.io
Apache License 2.0
6.86k stars 755 forks source link

Separate Monitoring and health endpoints from function endpoint #760

Open joek opened 6 years ago

joek commented 6 years ago

FEATURE REQUEST: Currently /healthz and /metrics are served on the same port as the function code. This is not optimal for different reasons.

  1. This might create conflicts with function code, if the user doing some path based operations inside the function.
  2. Istio with mutualTLS enabled will not work, as livenessprobe is failing and promethues can not get pod data.
  3. Security People don't like it, as it is exposing data, which might provide insights

What you expected to happen: Let's put /metrics and /healthz on a dedicated port. We can exclude this port from istio and do not have to expose.

Environment:

If this is fine for you, we can start working on it at the 28th. I am sure, we can make the change compatible to the current implementation, so it will not stop the 1.0 Release.

andresmgot commented 6 years ago

Thanks @joek. I agree with the security concern about the /metrics endpoint but I don't fully understand the issue with the /healthz endpoint: Could you elaborate the issue? Why the mutual TLS works for the function but not for the health check endpoint?

Maybe an alternative solution for that would be to add the option to disable the health check instead.

joek commented 6 years ago

The liveness probe is done outside of the container. We could change the probe to curl and do it on localhost. This should work, but we still got the problem with potential collisions with function logic.

https://istio.io/help/faq/security.html#k8s-health-checks

andresmgot commented 6 years ago

what I am not sure I understand is why the health check would work if served in a different port. Does Istio mutual TLS only work in a specific port? Also it seems that they working on fixing the issue. My concern about serving the health check in a different port is that some runtimes may serve different ports (servers) in different threads. Reporting one of them as healthy might not mean that the one serving the real requests is.

Also we cannot do the workaround of using curl since some of the runtimes doesn't have curl installed.

My suggestion is to move the /metrics to a different port like you suggest and make the liveness probe optional so you can disable it when using Istio. What do you think?

Also let's bring @murali-reddy to the discussion, I want to know his opinion as well.

murali-reddy commented 6 years ago

https://github.com/istio/istio/issues/2628 has more details

We should stick to best practice or prescribed way of Kubernetes. I feel Istio folks should figure a way forward for interop with Kubernetes deployments. IMO we should wait and see the reolution on Istio for this issue before doing such a drastic change to Kubeless.

This might create conflicts with function code, if the user doing some path based operations inside the function.

This is relevent only in case of http triggers right? Should not user be supposed to do proper path to function at http triggers level.

andresmgot commented 6 years ago

From that issue:

It is likely to be ready in 0.8.1 release, which I expect to happen in a month.

So let's wait for them to show their solution.

alexellis commented 6 years ago

I work on OpenFaaS and have an interest in this issue/topic too. What is the benefit in adding a second HTTP port for binding health/metrics endpoints?

This might create conflicts with function code, if the user doing some path based operations inside the function.

I was talking to Joe Beda about checkpoint/metric endpoints - in part of the project we've implemented a prefix for these endpoints i.e. /_/metrics and /_/health - have you considered that as a solution to resolve this point?

joek commented 6 years ago

@murali-reddy Actually the problem occurs on all deployments, as mutualTLS is securing the service to service communication too (including communication between triggers and functions).

@andresmgot we can make the livenessprobe configurable via configmap to have a workaround in place as long as istio is not delivering the feature. (I don' trust them, istio 0.8 is still not in alpha). We should consider changing the path like @alexellis suggested.

For the metrics endpoint, we still have to have it on a dedicated port as we have to exclude this particular port to be intercepted by the istio sidecar.

andresmgot commented 6 years ago

I agree that the quickest solution that will unblock you @joek is having a field in the kubeless-config to disable healthchecks.

If I understood correctly, the /metrics endpoint is not blocking you since you can still do the request there using a proper certificate, isn't it?. Note that you should be able to also limit the access to that resource from outside the cluster with the Ingress controller. We can discuss a proper solution for that in a different issue but since it require changes in all the runtimes it will take longer.

joek commented 6 years ago

I don't think we should switch off the healthchecks, we should have a option, to overwrite them in the deployment template. This will add more flexibility.

Regarding the metrics: I will check with the team one more option, which got into my mind. Fixing it in all runtimes should not be a problem, we can do it. I will open a ticket for the metrics, as soon as we checked out this option.