Closed alexellis closed 4 years ago
Using annotations for this purpose sounds pretty reasonable. Annotations are already supported within the OpenFaaS API, so this should only require a modification to faas-netes to work with Kubernetes, i.e. looking for a tolerations key in the annotations map and adding its value to the deployment spec. This would ensure the OpenFaaS API doesn't have to be aware of Kubernetes-specific tolerations, while still allowing support for them via faas-netes.
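A minimal Go sketch of that idea, assuming the annotation value is a JSON array of tolerations. The annotation key `tolerations` and the helper name are hypothetical, and the `Toleration` struct only mirrors the fields of the Kubernetes core/v1 type so the example compiles without the Kubernetes client libraries; this is not how faas-netes is actually implemented.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Toleration mirrors the fields of the Kubernetes core/v1 Toleration type.
// It is redefined here only to keep the sketch free of external dependencies.
type Toleration struct {
	Key               string `json:"key,omitempty"`
	Operator          string `json:"operator,omitempty"`
	Value             string `json:"value,omitempty"`
	Effect            string `json:"effect,omitempty"`
	TolerationSeconds *int64 `json:"tolerationSeconds,omitempty"`
}

// tolerationsAnnotation is a hypothetical key, not one defined by OpenFaaS.
const tolerationsAnnotation = "tolerations"

// tolerationsFromAnnotations decodes the JSON array stored under
// tolerationsAnnotation. It returns nil when the annotation is absent or empty.
func tolerationsFromAnnotations(annotations map[string]string) ([]Toleration, error) {
	raw, ok := annotations[tolerationsAnnotation]
	if !ok || raw == "" {
		return nil, nil
	}
	var out []Toleration
	if err := json.Unmarshal([]byte(raw), &out); err != nil {
		return nil, fmt.Errorf("invalid %q annotation: %w", tolerationsAnnotation, err)
	}
	return out, nil
}

func main() {
	annotations := map[string]string{
		"tolerations": `[{"key":"dedicated","operator":"Equal","value":"functions","effect":"NoSchedule"}]`,
	}
	tolerations, err := tolerationsFromAnnotations(annotations)
	if err != nil {
		panic(err)
	}
	// In a real controller the result would be copied into the Pod template
	// of the function's Deployment before it is created or updated.
	fmt.Printf("%+v\n", tolerations)
}
```

Keeping the value as opaque JSON means the OpenFaaS API only ever sees a string, which is the point made above about not leaking Kubernetes types into it.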
I chatted with @stefanprodan and @embano1 - we had an idea for a way for users to extend OpenFaaS.
If you use a Mutating Webhook Admission Controller [1] then you can look for your annotation and act upon it as soon as the Pod is requesting creation. You get all the benefits of automation, extending the API in an open-closed way and don't create any engineering burden on the community. The slok [2] project can be used to put this together in a very short period of time.
Stefan suggests you'll need the CA from the cluster:
kubectl get configmap -n kube-system extension-apiserver-authentication -o=jsonpath='{.data.client-ca-file}' | base64 | tr -d '\n'
To sign the TLS key needed for the controller.
This may even be possible to deploy as an OpenFaaS function itself.
[2] https://github.com/slok/kubewebhook
This could be a great example of how to extend OpenFaaS on Kubernetes for others to follow, too.
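As a rough illustration of what such a webhook would compute, the Go sketch below builds the JSON Patch (RFC 6902) that a mutating admission webhook could return for a Pod carrying the annotation. It deliberately leaves out the HTTP server, TLS setup and AdmissionReview handling, and does not use kubewebhook. The annotation key `tolerations` and the function name are assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp is a single RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// buildTolerationPatch returns a JSON Patch that adds the tolerations found in
// the (hypothetical) "tolerations" annotation to a Pod. hasTolerations reports
// whether the incoming Pod already has a spec.tolerations array. A nil patch
// means there is nothing to change.
func buildTolerationPatch(annotations map[string]string, hasTolerations bool) ([]byte, error) {
	raw, ok := annotations["tolerations"]
	if !ok {
		return nil, nil
	}
	var tolerations []map[string]interface{}
	if err := json.Unmarshal([]byte(raw), &tolerations); err != nil {
		return nil, fmt.Errorf("invalid tolerations annotation: %w", err)
	}
	if len(tolerations) == 0 {
		return nil, nil
	}

	var ops []patchOp
	if !hasTolerations {
		// The array does not exist yet, so create it in a single operation.
		ops = append(ops, patchOp{Op: "add", Path: "/spec/tolerations", Value: tolerations})
	} else {
		// The array exists, so append each entry using the "-" end-of-array index.
		for _, t := range tolerations {
			ops = append(ops, patchOp{Op: "add", Path: "/spec/tolerations/-", Value: t})
		}
	}
	return json.Marshal(ops)
}

func main() {
	annotations := map[string]string{
		"tolerations": `[{"key":"dedicated","operator":"Exists","effect":"NoSchedule"}]`,
	}
	patch, err := buildTolerationPatch(annotations, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

In a real webhook this patch would be placed in the admission response with the patch type set to JSONPatch; the function Pod itself is never modified by OpenFaaS.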
@alexellis ok, let me give that a try - Admission Controllers are new to me, but kubewebhook looks pretty straightforward. I'll follow up with results here.
Keep us in the loop on it and feel free to ask questions on Kubernetes Slack or in our own #kubernetes channel on OF Slack.
@mercul3s any progress on this? cc @stefanprodan
@alexellis Sorry for the late reply, I've been afk and out of the country for the past week and a half. I did not have any luck getting an admission controller working using kubewebhook, as I ran into a bug using their example code (it panics on error handling, from what I can tell). I have some other examples to work from, so I'm going to give it another shot over the next couple of days and hopefully will get something working.
Good to know. I'm exploring how to extend the CLI to accept new arbitrary values and this will pave the way for additional Kubernetes deployment constructs. https://github.com/openfaas/faas-cli/issues/623
Depending on your timeline you could either carry on or help us implement the extension end to end.
@alexellis I like that idea - going to read through and will add comments to that issue. Would be awesome to see some kind of support for tolerations within openfaas, as I would consider mutating controllers to be a workaround rather than full solution for supporting tolerations.
I'd definitely be willing to help implement kubernetes extensions. I'm also going to continue to work on getting a mutating controller up and running for the time being as having some support for tolerations means that we'll actually be able to use Openfaas in production now.
> I would consider mutating controllers to be a workaround rather than full solution for supporting tolerations
I agree, it is on the backlog.
> I'd definitely be willing to help implement kubernetes extensions
💯
Are you on the Slack workspace yet? It might be worth joining.
-- Join Slack to connect with the community https://docs.openfaas.com
@alexellis yup, I'm on the slack and in the kubernetes channel.
We use dedicated nodes to run different functions. It will be great to see openfaas support Kubernetes Tolerations.
Hi everyone, this is an important feature on the roadmap. From what I understand so far, a generic or one-time toleration for all functions may be suitable. We could apply this to the OpenFaaS Kubernetes controllers at deploy time.
This is also simpler to add for those who feel this is a blocking issue.
Dedicated node pools can be used with Kubernetes constraints in the stack YAML where taints haven't been used.
See also: linked faas-cli issue above on how to map these very complex, leaky structures into the OpenFaaS YAML.
Cc @ibuildthecloud
We at Cognite are also interested in this functionality to run all functions on a node pool with specific settings (preemptible nodes, different resources etc).
A couple of people have mentioned this topic recently. For most people a node label and a constraint may be enough. This issue is specifically about scheduling to nodes which have a taint on them.
To add, at Redscan Cyber Security we were also looking for this feature. I'm now using constraints to make use of the nodeSelector directive in k8s. It works well; you essentially add:
constraints:
- <key>=<value>
This causes the function Pod to be deployed to a node with a matching label.
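For illustration, here is a small Go sketch of how `<key>=<value>` constraints could be turned into a Pod nodeSelector map. The function name and the lenient handling of `==` are assumptions made for the example; it is not taken from the faas-netes source.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeSelectorFromConstraints converts constraints of the form "key=value"
// into a map suitable for a Pod's nodeSelector. Entries without "=" or with
// an empty key are skipped. "key == value" is accepted as well.
func nodeSelectorFromConstraints(constraints []string) map[string]string {
	selector := make(map[string]string)
	for _, c := range constraints {
		key, value, found := strings.Cut(c, "=")
		if !found {
			continue
		}
		key = strings.TrimSpace(key)
		// Drop a second "=" so that "==" behaves like "=".
		value = strings.TrimSpace(strings.TrimPrefix(value, "="))
		if key == "" {
			continue
		}
		selector[key] = value
	}
	return selector
}

func main() {
	// "pool=functions" is a hypothetical node label.
	fmt.Println(nodeSelectorFromConstraints([]string{"pool=functions"}))
}
```

A nodeSelector only attracts Pods to labelled nodes. It does nothing about taints, which is why tolerations are still needed for tainted node pools.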
Closing in favour of https://github.com/openfaas/faas-netes/issues/586
/lock
Feature: Kubernetes tolerations
https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
Expected Behaviour
Tolerations allow a Pod (function) to be scheduled on a node where there is a "taint" preventing Pods from being scheduled there.
Current Behaviour
We have support for constraints and labelled node-pools for scheduling, but a user raised a scenario where a node-pool has a taint, so constraints alone are not enough: a "toleration" must also be added to the function Pod.
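To make the interaction between taints and tolerations concrete, the Go sketch below re-implements the matching rule described in the Kubernetes documentation using simplified local types. The type and function names are illustrative only and the real scheduler does considerably more; treat it as a sketch of the rule, not the scheduler's code.

```go
package main

import "fmt"

// taint and toleration are simplified stand-ins for the Kubernetes core/v1 types.
type taint struct {
	Key, Value, Effect string
}

type toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates reports whether a single toleration matches a single taint.
// An empty Effect or Key on the toleration acts as a wildcard, and an
// empty Operator is treated as "Equal".
func tolerates(tol toleration, tn taint) bool {
	if tol.Effect != "" && tol.Effect != tn.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != tn.Key {
		return false
	}
	switch tol.Operator {
	case "", "Equal":
		return tol.Value == tn.Value
	case "Exists":
		return true
	default:
		return false
	}
}

// canSchedule reports whether every hard taint (NoSchedule or NoExecute)
// is matched by at least one toleration. PreferNoSchedule is only a
// preference, so it never blocks scheduling here.
func canSchedule(taints []taint, tolerations []toleration) bool {
	for _, tn := range taints {
		if tn.Effect != "NoSchedule" && tn.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tolerations {
			if tolerates(tol, tn) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	// A hypothetical dedicated node pool.
	nodeTaints := []taint{{Key: "dedicated", Value: "functions", Effect: "NoSchedule"}}

	withoutToleration := canSchedule(nodeTaints, nil)
	withToleration := canSchedule(nodeTaints, []toleration{
		{Key: "dedicated", Operator: "Equal", Value: "functions", Effect: "NoSchedule"},
	})
	fmt.Println(withoutToleration, withToleration)
}
```

A constraint (nodeSelector) and a toleration solve different halves of the problem: the selector steers the Pod towards the pool, and the toleration allows it to land there despite the taint.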
Possible Solution
We could extend the function schema to allow the Kubernetes tolerations spec to be specified.
This is not available in Swarm and possibly not available in the other back-ends, so it's the first hard requirement for orchestrator-specific knowledge in the OpenFaaS API.
Suggestions are welcome.
Steps to Reproduce (for bugs)
Context
This affects advanced scheduling scenarios on Kubernetes.
Your Environment
FaaS-CLI version (full output from: faas-cli version):
Docker version: docker version (e.g. Docker 17.0.05):
Are you using Docker Swarm or Kubernetes (FaaS-netes)?
Operating System and version (e.g. Linux, Windows, MacOS):
Link to your project or a code example to reproduce issue:
Please also follow the troubleshooting guide and paste in any other diagnostic information you have: