@satnam6502 - I have the same concern about the cache, but it's required to reduce unnecessary calls to the API server. Options are putting a TTL on cached info (easy, but how long a TTL?) or using watches on pod info to update the cache (slightly trickier but should be doable).
Bearer tokens would also have to handle refreshes, certs seem easier for now, but we can change if needed.
I'm keen for us to keep building the base images as I do -- I try to use common blocks (ubuntu:14.04, jre-7 etc.) amongst the Elasticsearch, Logging and other images to try and factor out layers to reduce load time etc.
Once we have per node controllers we can deliver fluentd as a privileged pod with a service account secret which means rotation would be handled automatically.
@jimmidyson I've been told by @roberthbailey that currently bearer tokens have an indefinite lifetime.
@smarterclayton - Would the service account secret be a file containing a bearer token, or something else? Would we need to re-read the file at some point to pick up secret changes?
@satnam6502 @roberthbailey but I assume that will change to require refreshes at some point?
@satnam6502 - other than the base OS image I don't see the fluentd (Ruby) image sharing any layers with, say, an Elasticsearch image (Java).
@jimmidyson how about using the token-admin secret for now (as is done in #7988 to speak to the API server), and I can add a gem-install step and update the Fluentd config in the two Fluentd images?
Once we have a better solution for accessing the apiserver from the master we can switch to that. @erictune
I am very keen to propagate labels so what you've written is very useful (and I had intended to write something similar -- except I had to learn Ruby first!).
@jimmidyson You are correct that at some point we will need a credential refresh story.
It would contain a signed JWT assertion that can be invalidated on the server and rotated. A restart or redeploy would pick up the newest one. Eventually we would expand the reach of it if necessary. That will probably land in the next week.
@jimmidyson -- why don't you go ahead and make the PR to update the images? I tried to assign the issue to you but failed.
@satnam6502 - let me add in the option to use a bearer token too when I get a chance.
@satnam6502 - Happy to! I'll look at updating the images & plugin config early next week.
I've added in an option to use bearer_token_file in addition to client certs, released in 0.2.0.
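For anyone wiring this up, here is a minimal sketch of the filter config with the new option (parameter names other than bearer_token_file, and the token/CA paths shown, are assumptions -- check the plugin README and adjust to wherever your secret is mounted):

<filter kubernetes.**>
  type kubernetes_metadata
  # apiserver endpoint (assumed reachable from the node)
  kubernetes_url https://kubernetes.default.svc.cluster.local
  # bearer token instead of client certs; example mount path only
  bearer_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
  # CA used to verify the apiserver's TLS certificate
  ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
</filter>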
@smarterclayton - can you expand on what is required to use the JWT assertion? Exchange the JWT for an access token? I can't see details at https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/design/service_accounts.md where I was expecting to see how to use it.
It would be passed as a bearer token; the authorizer would simply handle it differently than "normal" tokens.
@smarterclayton That's easy then! Does a service account bearer token expire? If so, how should this be handled? Exit process & let container be recreated with new service account token allocated?
@satnam6502 Adding in a cache TTL option. What do you think is a sane default? I'll add in a pod watch option soon too.
Right now it doesn't expire but can be rotated at any time. In the event of a rotation, you'd want to trigger a rolling update or rollout (we don't have a good word for rolling out changes to a per node controller yet, although I would hope the deployment object could handle it eventually).
Our secret redistribution guarantee today is "when a new pod is created" which has some nice properties. @erictune and I had discussed on several occasions whether we needed a stronger guarantee, but in the short term both volumes and secrets specifically feel more aligned with the lifespan of the pod. If we allow pod updates in the future we could potentially have that align with updates.
@jimmidyson if the purpose of the TTL is to limit how much data we store in the plug-in plus limit the rate of API calls to the apiserver, then how about 1 hour?
@satnam6502 an hour seems really high if the point of refreshing is to pick up changes to the pod metadata. Once every 30 seconds or a minute seems like a more reasonable response time to changes.
@a-robinson @satnam6502 once I get pod watching working properly it will make the TTL less (un?) important.
@a-robinson OK.
@satnam6502 @a-robinson Pod watching is now in the fluentd kubernetes plugin. Just working through the Docker images & testing.
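For reference, a rough sketch of what the watch/cache settings could look like in the filter config once this ships (the option names watch, cache_ttl and cache_size and the values shown are assumptions based on the discussion above, not confirmed defaults):

<filter kubernetes.**>
  type kubernetes_metadata
  kubernetes_url https://kubernetes.default.svc.cluster.local
  # keep a watch open on pods so label changes are picked up without polling
  watch true
  # fallback TTL for cached pod metadata, in seconds (30-60s per the discussion above)
  cache_ttl 60
  # bound how much pod metadata is kept in memory
  cache_size 1000
</filter>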
Great. Which Docker images?
@satnam6502 The fluentd images as we discussed on this issue last week.
I wonder if we need a branch for things going into V1.0 and things that are not?
So you don't want label metadata in v1.0?
Let's go for it and see if it can work reliably. The removal of secret access from mirror pods seems like an issue.
What do you mean by "removal of secret access from mirror pods"? What's a mirror pod?
The fluentd-elasticsearch logging pod and the fluentd-cloud-logging pods are special -- they are created from manifest files and one instance is laid down on each node.
Ah ok. I can't find the details on secrets on mirror/static pods - do you have any details?
Another option is to have a log aggregator running as a normal pod & the log collector static pods can forward to the log aggregator for enrichment before storing in ES. Would prefer to not have to do that, but it's another option.
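For the record, the aggregator variant would look roughly like this on the collector side, using fluentd's built-in forward output (the service name fluentd-aggregator is hypothetical); the kubernetes_metadata enrichment would then run only in the aggregator pod before events are written to Elasticsearch:

<match kubernetes.**>
  type forward
  <server>
    # hypothetical in-cluster aggregator service doing the enrichment
    host fluentd-aggregator
    port 24224
  </server>
</match>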
Agreed, it would be good to avoid the extra log aggregation and try to maintain a direct link from Fluentd to Elasticsearch. I did try to use your plugin -- see https://github.com/GoogleCloudPlatform/kubernetes/compare/master...satnam6502:labels -- but I could not get it to work. Any idea what I might be doing wrong modulo secrets?
I'm not at my computer right now, but it looks OK at a cursory glance. Do you get any errors?
I don't get any errors -- but I don't see any log lines either. It'd been a couple of days since I last looked at it -- I hope to get back to it after my current stack of tasks.
@satnam6502 OK, I've seen what the issue is. Bad documentation on my part :) It does work, but it requires mounting the Docker socket at the moment, which isn't desirable as it would require a privileged container (otherwise SELinux will prevent access to the Docker socket). We don't want to run a privileged container unless we have to. So hang tight with this piece of work - very nearly there :)
In the meantime, I've put in PR #8374 to make the Docker log symlink easier to parse. Currently there's no way to separate out the namespace & container name from the log name. Once this is in I'll update the functionality in the fluentd plugin to use it. I've been speaking about secrets/service accounts in static pods & it seems that this isn't really covered yet, but is essential for future use cases. Will raise a separate issue to discuss & see what comes of that.
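Once that PR is in, the collector side should only need the usual tail source pointed at the symlinks, something like the sketch below (the /var/log/containers path and the assumption that the file name will encode pod name, namespace and container name depend on how #8374 lands):

<source>
  type tail
  # symlinks created by the kubelet; file names are expected to encode
  # pod name, namespace and container name once #8374 is merged
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  # Docker json-file log format
  format json
  time_key time
  time_format %Y-%m-%dT%H:%M:%S.%NZ
</source>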
I will hang tight :-)
Un-assigning myself since I am leaving Google and Kubernetes.
Now that GKE has hit GA, I'll be carving off some time to get this working for Google Cloud Logging.
@a-robinson We should be able to leverage the fluentd PR you're reviewing - might be best to get all done in one for consistency, translating properties properly for Elasticsearch & GCL as required.
Yup, I'm hoping to be able to use that, but since GCL is going to require some extra work on fluent-plugin-google-cloud anyway I think it makes sense to get the ES improvements in first.
Just an update here: this is basically ready to go, primarily due to @jimmidyson's metadata filter plugin, with relevant integration PRs including #8632, https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/30, https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/pull/31, and https://github.com/fabric8io/fluent-plugin-kubernetes_metadata_filter/pull/6.
However, that plugin can't really do its thing until it can authenticate to the apiserver. We may be able to use a short term hack to get it access by using the kubelet's credentials, but the real solution will be to use the daemon controller once it's ready (#1518 / #13182).
In the meantime, we'll still be able to get most of the information attached, we'll just miss out on pod labels, pod ID, and namespace ID.
Where is this at as far as running it on a non-Google Cloud Kubernetes stack?
Just need to try it out with the daemon controller really. Without that you don't get pod labels in stored events as it can't auth to the API server. Would be good to get someone testing it with the daemon controller if possible.
Ok I'm still fairly new to this but I will give it a shot. I'm going to use the image in cluster/addons to build a fluentd image unless there is one already hosted on a public repo.
That would be awesome! There is an updated image that should work for you on Docker Hub at fabric8/fluentd-kubernetes:1.3. Will be good to hear how you get on.
Hrm, I'm stuck on getting kubectl to take the manifest -
F1118 17:16:35.986943 20636 helpers.go:96] error validating "fluentd-daemon.yaml": error validating data: couldn't find type: v1beta1.DaemonSet; if you choose to ignore these errors, turn validation off with --validate=false
I have this when starting my api server
--runtime-config=extensions/v1beta1=true,extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true
Version:
╰─○ kubectl version
Client Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
manifest -
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  labels:
    heritage: helm
    k8s-app: fluentd-elasticsearch
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  containers:
  - name: fluentd-elasticsearch
    image: fabric8/fluentd-kubernetes:1.3
    securityContext:
      privileged: true
    resources:
      limits:
        cpu: 100m
    volumeMounts:
    - name: varlog
      mountPath: /var/log
    - name: varlibdockercontainers
      mountPath: /var/lib/docker/containers
      readOnly: true
    env:
    - name: "ES_HOST"
      value: "elasticsearch"
    - name: "ES_PORT"
      value: "9200"
  volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers
@jchauncey I'm no expert (just trying to get this going like you), but I think the daemonset manifest you have needs a few tweaks to add .spec.template.metadata and .spec.template.spec.containers so it looks something like:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
  labels:
    heritage: helm
    k8s-app: fluentd-elasticsearch
    version: v1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      name: fluentd-elasticsearch
      labels:
        heritage: helm
        k8s-app: fluentd-elasticsearch
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: fabric8/fluentd-kubernetes:1.3
        securityContext:
          privileged: true
        resources:
          limits:
            cpu: 100m
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        env:
        - name: "ES_HOST"
          value: "elasticsearch"
        - name: "ES_PORT"
          value: "9200"
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
That and an apiserver with --runtime-config=extensions/v1beta1/daemonsets=true,extensions/v1beta1/deployments=true,extensions/v1beta1/jobs=true,extensions/v1beta1/ingress=true seemed to get the pods running on all nodes for me. I haven't verified actual log output yet, but can report back as well when I do. Good luck!
@jchauncey I ran the following on one of my apiservers to verify: curl -s http://${apiserver}:8080/apis/extensions/v1beta1
On my v1.1.1 cluster where this appears to be working I get:
{
  "kind": "APIResourceList",
  "groupVersion": "extensions/v1beta1",
  "resources": [
    {
      "name": "daemonsets",
      "namespaced": true
    },
    {
      "name": "daemonsets/status",
      "namespaced": true
    },
    ...
No matter what manifest I try I still get
error validating "fluentd-daemon.yaml": error validating data: couldn't find type: v1beta1.DaemonSet; if you choose to ignore these errors, turn validation off with --validate=false
Continuing discussion from #3764 @a-robinson @mr-salty CC: @dchen1107 @roberthbailey