lilic opened this issue 6 years ago
We could split the above into two separate tasks: first, creating just a CustomResource used to easily configure the agent; and second, creating a controller from the agent, e.g. replacing the kubectl apply part with creating each needed resource based on the passed configuration. Besides the configuration, this would also solve the problem of just applying the manifest file every X amount of time.
Don't forget we also want to be able to push agent changes out to people's cluster :)
Example manifest file for the WeaveCloud CRD:
apiVersion: cloud.weave.works/v1beta1
kind: WeaveCloud
metadata:
  name: weave-cloud-operator
  namespace: weave
spec:
  general:
    environment: gke
    auto-update: true
    wc-hostname: cloud.weave.works
    wc-report-errors: false
    wc-poll-url: foo-bar.works
  flux:
    disable: false
    config:
      git-label: label
      git-url: url
      git-path: path
      git-branch: branch
  scope:
    disable: false
    read-only: true
    container-runtime-endpoint: /var/run/crio.sock
    kubernetes-probe: true
  prometheus:
    disable: false
    pod-scrape-policy: optIn
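For reference, a rough sketch of what the Go types backing such a CR might look like; the type and field names here are only illustrative, mirroring the manifest above, and are not the actual launcher code:

```go
// Illustrative Go types for the WeaveCloud custom resource above; names and
// fields are assumptions for the sketch, not the launcher's code.
package v1beta1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// WeaveCloud mirrors the example manifest: one resource configuring all agents.
type WeaveCloud struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              WeaveCloudSpec `json:"spec"`
}

type WeaveCloudSpec struct {
	General    GeneralSpec    `json:"general"`
	Flux       FluxSpec       `json:"flux,omitempty"`
	Scope      ScopeSpec      `json:"scope,omitempty"`
	Prometheus PrometheusSpec `json:"prometheus,omitempty"`
}

type GeneralSpec struct {
	Environment    string `json:"environment,omitempty"`
	AutoUpdate     bool   `json:"auto-update,omitempty"`
	WCHostname     string `json:"wc-hostname,omitempty"`
	WCReportErrors bool   `json:"wc-report-errors,omitempty"`
	WCPollURL      string `json:"wc-poll-url,omitempty"`
}

type FluxSpec struct {
	Disable bool       `json:"disable,omitempty"`
	Config  FluxConfig `json:"config,omitempty"`
}

type FluxConfig struct {
	GitLabel  string `json:"git-label,omitempty"`
	GitURL    string `json:"git-url,omitempty"`
	GitPath   string `json:"git-path,omitempty"`
	GitBranch string `json:"git-branch,omitempty"`
}

type ScopeSpec struct {
	Disable                  bool   `json:"disable,omitempty"`
	ReadOnly                 bool   `json:"read-only,omitempty"`
	ContainerRuntimeEndpoint string `json:"container-runtime-endpoint,omitempty"`
	KubernetesProbe          bool   `json:"kubernetes-probe,omitempty"`
}

type PrometheusSpec struct {
	Disable         bool   `json:"disable,omitempty"`
	PodScrapePolicy string `json:"pod-scrape-policy,omitempty"`
}
```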
There are some suggestions from Brice in another issue: https://github.com/weaveworks/launcher/issues/145
I would add kind: Alert to the launcher controller when Weave Cloud switches to the Alertmanager YAML format.
The reasoning behind the alerts as custom resources is here https://docs.google.com/document/d/1t_Ai_tXh7HlJxaxULuyev6KQZ4eWBt7twUw59sTYh1I/edit?usp=sharing
cc @squaremo @errordeveloper
I would add kind: Alert to the launcher controller when Weave Cloud switches to the Alertmanager YAML format.
The reasoning behind the alerts as custom resources is here https://docs.google.com/document/d/1t_Ai_tXh7HlJxaxULuyev6KQZ4eWBt7twUw59sTYh1I/edit?usp=sharing
I think managing alerting config via CRDs is a great idea but we should not plan to include this right now; it's something that can be added to the code later, as a separate piece of work.
Thanks for the example manifest @LiliC!
So the idea is that every option we want to provide is explicitly coded, i.e. we have to know how to change the flux agent manifest to support git-url?
As a kludge, can we also include something like an "append-args" option to allow people to inject extra flags into an agent's command? My thinking is that it might be very useful in a pinch, when debugging something with a customer, or when trying something out.
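A minimal sketch of how that kludge could work on the operator side, assuming a hypothetical append-args list in the CR spec (none of these names exist in the launcher today):

```go
package main

import corev1 "k8s.io/api/core/v1"

// buildAgentContainer is a hypothetical helper: the operator builds an agent's
// container from the explicitly coded options, then tacks on any extra flags
// taken from an assumed "append-args" field in the custom resource, so a flag
// can be injected in a pinch without a new operator release.
func buildAgentContainer(name, image string, codedArgs, appendArgs []string) corev1.Container {
	args := append([]string{}, codedArgs...)
	args = append(args, appendArgs...)
	return corev1.Container{
		Name:  name,
		Image: image,
		Args:  args,
	}
}
```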
We've talked with @LiliC and @squaremo at lunch about flux and the CRD integration. For the first CRD version we will not include anything flux-related except the disable toggle.
What is the timing of "at some point."?
If this will ship first we still need to support some flux args so that we have an automatic migration path for people who have configured a git repo, and continue to not trample on people's running config when they're trying to set it up for the first time.
We cannot ship a version which breaks everyone's flux config, or makes it impossible to set up flux with a git repo.
What is the timing of "at some point."?
Unknown, but it doesn't matter too much, because there will necessarily be an indefinite period during which fluxd supports both the command line arguments (if supplied) and the configmap (when mounted).
For the time being, launcher will still be doing what it does today: kubectl apply while preserving the flux arguments.
For the time being, launcher will still be doing what it does today: kubectl apply while preserving the flux arguments.
Yes agreed, maybe in the future we can convert it to do something more complex, like actually creating each individual resource, watching for any changes on those, and reconciling. But for now that is a bit too much to change all at once.
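For later reference, a rough sketch of what one such per-resource reconcile step could look like instead of a single kubectl apply (the helper is an assumption, not existing launcher code):

```go
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	appsclient "k8s.io/client-go/kubernetes/typed/apps/v1"
)

// reconcileDeployment creates the Deployment if it is missing, otherwise
// updates it in place to match the desired state. One such step per managed
// resource would replace the single "kubectl apply" of today.
func reconcileDeployment(ctx context.Context, client appsclient.DeploymentInterface, desired *appsv1.Deployment) error {
	existing, err := client.Get(ctx, desired.Name, metav1.GetOptions{})
	if errors.IsNotFound(err) {
		_, err = client.Create(ctx, desired, metav1.CreateOptions{})
		return err
	}
	if err != nil {
		return err
	}
	// Carry over the resource version so the update is not rejected as a conflict.
	desired.ResourceVersion = existing.ResourceVersion
	_, err = client.Update(ctx, desired, metav1.UpdateOptions{})
	return err
}
```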
cc @stefanprodan
Perhaps this is a dim question, but is there a reason to prefer defining a custom resource over just supplying a config map? E.g., is the intention that a single operator will handle multiple installations of the agents in a cluster?
We do have cases where we send data to two WC instances (weave cloud, socks-shop). I was envisioning two CRs for those clusters.
We do have cases where we send data to two WC instances (weave cloud, socks-shop). I was envisioning two CRs for those clusters.
We're the only people that do that; two agents each with its own config would cover that case I think.
To really answer that question we'd need to collect the +/- of the two :) e.g.
Operators are all going through CRDs, not configmaps. There must be some reasoning here :)
The idea of operators is that they manage arbitrary numbers of resources; e.g., you get one set of agents per weave cloud custom resource. This is where we probably want to end up, but I don't think all our agents are capable of behaving correctly in this scenario (flux isn't, quite), so it's generality we can't use at present.
What will the operator do in response to multiple weave cloud resources?
One plus that I've been looking at is the CRD validation layer so we can get k8s to reject invalid configuration before it reaches the agent
Yep, earlier validation is a point in favour of using a custom resource. It'd be trickier trying to validate a ConfigMap on creation.
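To make the validation point concrete, a sketch of the kind of OpenAPI schema such a CRD could carry so the API server rejects bad values up front; the schema content is an assumption following the example manifest, and only pod-scrape-policy is shown:

```go
package main

import apiextv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"

// Sketch of CRD validation: reject e.g. an unknown pod-scrape-policy value
// before it ever reaches the agent. Field names follow the example manifest.
var weaveCloudValidation = &apiextv1beta1.CustomResourceValidation{
	OpenAPIV3Schema: &apiextv1beta1.JSONSchemaProps{
		Properties: map[string]apiextv1beta1.JSONSchemaProps{
			"spec": {
				Properties: map[string]apiextv1beta1.JSONSchemaProps{
					"prometheus": {
						Properties: map[string]apiextv1beta1.JSONSchemaProps{
							"pod-scrape-policy": {
								Type: "string",
								Enum: []apiextv1beta1.JSON{
									{Raw: []byte(`"optIn"`)},
									{Raw: []byte(`"optOut"`)},
								},
							},
						},
					},
				},
			},
		},
	},
}
```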
We don't need the file mounted on the fs, which is one of the configmap features.
I don't know if this is a point in favour of a CRD -- it's mechanically simpler to watch a file than to watch the kubernetes API.
@squaremo can you explain how the ConfigMap flow would work? e.g. how would we generate things, how would use deploy, update etc.
For example for CRDs user just needs one deployment manifest to deploy the "operator" and one manifest to specify configurations, which they can update later on and the operator would pick up the updates. Like @dlespiau mentioned above, we get things like validations but also versioning of CRDs, meaning we know how to handle different versions of deployed agent operators.
can you explain how the ConfigMap flow would work?
Pretty much the same thing you'd do with a custom resource, except it's in the filesystem rather than something you get through the Kubernetes API. Have I misunderstood something fundamental about what you want to do with the CRD?
For example for CRDs user just needs one deployment manifest to deploy the "operator" and one manifest to specify configurations, which they can update later on and the operator would pick up the updates.
And with a config file, one deployment and one configmap.
we get things like validations but also versioning of CRDs, meaning we know how to handle different versions of deployed agent operators.
I don't think there's anything inherent to versioning of CRDs that makes it better than versioning a config file -- presumably you still have to write code to interpret old versions, for instance. But I haven't tried it, so I'm ignorant of any library support for versioning.
I can think of another +/- point:
I don't think using a custom resource is a bad design; it's more forward-looking than using a config file, in its favour. I was curious why you hadn't just gone for a config file though, so I wanted to see whether that was an adequate alternative.
I don't know if this is a point in favour of a CRD -- it's mechanically simpler to watch a file than to watch the kubernetes API.
Forgot to mention: IIRC, and nothing has changed since the last time I tried this, watching the mounted config map file for changes is actually not that simple, as it's a symlink. It would be easier to just watch the ConfigMap resource with client-go, but then it doesn't make any difference from watching a CRD resource.
@LiliC watching ConfigMaps or Secrets is easy, check this: https://github.com/stefanprodan/k8s-podinfo/blob/master/pkg/fscache/fscache.go#L59
The only downside is the delay. It takes at least 2 minutes for kubelet to update the symlink while the CRD event is almost instantaneous.
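For completeness, a minimal sketch of the file-based approach in the spirit of the linked fscache code, assuming fsnotify; it watches the mount directory because kubelet swaps the ..data symlink atomically rather than rewriting the file in place (the mount path passed in is whatever the ConfigMap is mounted at):

```go
package main

import (
	"log"
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// watchMountedConfig watches a mounted ConfigMap volume directory and calls
// onChange whenever kubelet refreshes the "..data" symlink, i.e. whenever the
// ConfigMap content has been propagated to the pod's filesystem.
func watchMountedConfig(dir string, onChange func()) error {
	w, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	if err := w.Add(dir); err != nil {
		return err
	}
	go func() {
		for {
			select {
			case ev := <-w.Events:
				// A new "..data" entry means kubelet swapped the symlink,
				// which is how ConfigMap updates land in the volume.
				if filepath.Base(ev.Name) == "..data" {
					onChange()
				}
			case err := <-w.Errors:
				log.Println("watch error:", err)
			}
		}
	}()
	return nil
}
```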
By converting the agent part of the launcher to be deployed as a CustomResource we can pass arguments and options via the CustomResourceDefinition, and with that also solve https://github.com/weaveworks/launcher/issues/145.
Besides the configurations we would also solve the problem of just applying the manifest file every X amount of time; instead we would watch and react on events, e.g. when a Secret gets deleted we would update the resources, similar to how we already do for the cloudwatch in the agents.
We could still keep the bootstrap part of the launcher and in it generate the CRD manifest file needed to deploy the agent CR (weave-cloud-operator :)). So the install part would be the same as now, if we want to, e.g. via the curl command, and the helm option could remain as well. But the weave-cloud-operator could also be installed and configured with a standalone manifest file in the user's cluster.
The https://github.com/operator-framework might be useful IMO in this case to generate the things needed to convert the launcher/agent into an "operator".
cc @bricef @marccarre @dlespiau @leth
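A minimal sketch of the event-driven part described above, assuming a client-go informer; the namespace and Secret name used here are illustrative:

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchTokenSecret reacts to events instead of re-applying manifests on a
// timer: when the Weave Cloud token Secret is deleted, it triggers a
// re-create of the managed resources via the supplied callback.
func watchTokenSecret(client kubernetes.Interface, recreate func(), stop <-chan struct{}) {
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 30*time.Second, informers.WithNamespace("weave"))
	informer := factory.Core().V1().Secrets().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: func(obj interface{}) {
			if s, ok := obj.(*corev1.Secret); ok && s.Name == "weave-cloud" { // illustrative name
				fmt.Println("token secret deleted, recreating agent resources")
				recreate()
			}
		},
	})
	factory.Start(stop)
}
```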