Closed networkop closed 4 years ago
Just for the record, an Ingress's status object is the same as a Service's status object [1]. We can probably just reuse most of the code after some refactoring.
Yes, in fact, this status is copied from an underlying LB service by the ingress controller (e.g. [1]).
However, I think we still need to know (and pass to the client) the spec.rules[*].host to build the correct HTTP host header.
That's an interesting detail. Thanks @networkop.
There are two interesting problems:
We need to have a way to transfer the host information from ingress definition to the HTTP probe. We can probably use an internal/private label to achieve that, say something like _cloudprober_http_host.
How to handle multiple hosts in ingress definition. Is it common to have multiple hosts in an ingress definition?
We can probably use an internal/private label to achieve that, say something like _cloudprober_http_host.
Yeah, I think something like that should work and it's fairly easy to add.
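A minimal sketch of how such an internal label could feed the HTTP probe. All names here (`buildRequest`, the `_cloudprober_http_host` label key) follow the suggestion above and are illustrative assumptions, not cloudprober's actual internals:

```go
package main

import (
	"fmt"
	"net/http"
)

// buildRequest sketches how a hypothetical "_cloudprober_http_host"
// label could set the HTTP Host header while the connection still
// goes to the discovered target IP.
func buildRequest(targetIP, path string, labels map[string]string) (*http.Request, error) {
	req, err := http.NewRequest("GET", "http://"+targetIP+path, nil)
	if err != nil {
		return nil, err
	}
	if host, ok := labels["_cloudprober_http_host"]; ok {
		req.Host = host // overrides the Host header, not the dial address
	}
	return req, nil
}

func main() {
	labels := map[string]string{"_cloudprober_http_host": "foo.bar.com"}
	req, _ := buildRequest("203.0.113.7", "/health", labels)
	fmt.Println(req.URL.String(), req.Host)
}
```

Setting `req.Host` is the standard net/http way to send a Host header that differs from the URL's host, which is exactly what an ingress rule's host requires.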
How to handle multiple hosts in ingress definition. Is it common to have multiple hosts in an ingress definition?
This may not be common, but it's definitely possible and even mentioned in the official k8s ingress docs.
Another question is how to do DNS resolution for these hosts. The most straightforward way is to do it locally on the client, based on the information provided in Resource.IP as collected from Ingress.status.LoadBalancer. However, this may result in false negatives in certain situations. If a user relies on an external DNS integration and this integration fails for some reason (e.g. IAM permissions changed), then the Ingress will not be accessible, but cloudprober will continue reporting it as healthy.
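To illustrate the false-negative concern: a client that resolves the host via real DNS (instead of only trusting Resource.IP) can notice when public DNS and the LB status disagree. A hedged sketch; the function and names are mine, not cloudprober's:

```go
package main

import (
	"fmt"
	"net"
)

// dnsMatchesLB reports whether any address resolved for the ingress
// host matches the LB IP from Ingress.status.loadBalancer. In practice
// addrs would come from net.LookupHost(host); it is the real DNS lookup
// that would catch a broken external-DNS integration (e.g. no records
// published because IAM permissions changed).
func dnsMatchesLB(addrs []string, lbIP string) bool {
	want := net.ParseIP(lbIP)
	for _, a := range addrs {
		if ip := net.ParseIP(a); ip != nil && ip.Equal(want) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(dnsMatchesLB([]string{"203.0.113.7"}, "203.0.113.7"))
	fmt.Println(dnsMatchesLB(nil, "203.0.113.7")) // no records: external DNS broken
}
```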
It would be really good if cloudprober could discover ingresses and then provide the golden signals for them. This would be one of cloudprober's killer features if it can be implemented.
@infa-kparida thanks for the input. Yes, we are planning to add support for the "ingress" resource type.
I've not yet figured out how to name targets for multiple host rules within an ingress, for example, we can do:
I think <ingress_name>_<host_fqdn> or <ingress_name>_<index> would do, as you can have multiple unique targets defined in a single ingress resource.
Also @manugarg , I was thinking about implementing this locally but then I noticed you've added it to the next milestone. I'm happy to help with the implementation or testing to help speed things up, so please let me know if you need any help.
Thanks @networkop. I had started a change some time back. I'll try to finish it in a couple of days (it will mostly get submitted around Sep 08 though). I'll let you know how it goes.
I was over-optimistic to think that I'd get it done over the long weekend :) I have the change out for review (internally) now. It will take a few more days, I think.
Alright, #469 adds support for ingress resources.
The following example in the test shows how resources end up being named and how they can be filtered using labels: https://github.com/google/cloudprober/blob/928d3def650642b3fcf1ef845ffd19316a241d7d/rds/kubernetes/ingresses_test.go#L88
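Judging from the names in that test (and the metrics later in this thread, e.g. rds-ingress_foo.bar.com__health), the naming scheme appears to be roughly the following. This is a reverse-engineered sketch, not the actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// targetName reconstructs the apparent scheme: <ingress_name>_<host>,
// plus the rule path with "/" replaced by "_" when the path is
// non-trivial (hence the double underscore before "health").
func targetName(ingress, host, path string) string {
	name := ingress + "_" + host
	if path != "" && path != "/" {
		name += "_" + strings.ReplaceAll(path, "/", "_")
	}
	return name
}

func main() {
	fmt.Println(targetName("rds-ingress", "foo.bar.com", "/health"))
	fmt.Println(targetName("rds-ingress", "prometheus.bar.com", "/"))
}
```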
Ingress resources get the following two labels:
"fqdn" : is set to the "host" in the ingress rule.
"relative_url" : is set to the "path" in the ingress rule.
@networkop and @infa-kparida, please let me know if you have any thoughts on it.
Great, thanks @manugarg , I'll give it a spin in the next few days and let you know.
Hi @manugarg , here are a few issues I've picked up so far:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
  name: rds-reader
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["get", "list"]
- apiGroups:
  - extensions
  - "networking.k8s.io" # k8s 1.14+
  resources:
  - ingresses
  - ingresses/status
  verbs: ["get", "list"]
URL is malformed when the client uses a custom surfacer.prometheus_surfacer.metrics_prefix. For example, if the prefix is set to cloudprober_, the URL becomes: http://cloudprober_google.com
HTTP path gets appended to the URL incorrectly. E.g. if I have an ingress with path /download, the URL becomes: http://google.com__download
Another issue is that the URL that gets built, and that the client tries to resolve, is in the format ingressName_ingressHostname, e.g. my-ingress_google.com
Seems like the above DNS resolution issues are fixed with http_proxy.resolve_first: true.
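For reference, a sketch of where that option would go in the probe config, assuming the resolve_first field referenced above lives on the HTTP probe (the field placement here is my reading of the comment, not verified against the proto):

```textproto
probe {
  name: "pod-to-ingress"
  type: HTTP
  # targets { ... } as in the examples below
  http_probe {
    # Resolve the target name first and connect to the resulting IP,
    # per the workaround mentioned above.
    resolve_first: true
  }
}
```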
Thanks a lot for checking @networkop.
RDS now needs an updated set of RBAC rules:
Yeah, this is expected. I'll update the document.
URL is malformed when the client uses a custom surfacer.prometheus_surfacer.metrics_prefix. For example, if the prefix is set to cloudprober_, the URL becomes: http://cloudprober_google.com
I didn't get this one. This prefix is added to the metric names. Which URL is this?
HTTP path gets appended to the URL incorrectly. E.g. if I have an ingress with path /download, the URL becomes: http://google.com__download
another issue is that the URL that gets built and the client tries to resolve is in the format ingressName_ingressHostname, e.g. my-ingress_google.com
So both of these issues were supposed to be resolved by target labels. HTTP probe should use "fqdn" and "relative_url" labels, which are added by RDS automatically ("fqdn" label is used for URL host as well as host header).
Just to confirm, you tried with cloudprober head, right? Not just the ingress change.
I'll investigate more. Thanks a lot for giving it a try. That's very helpful.
@networkop I tried running an HTTP probe like this:
probe {
  name: "pod-to-ingress"
  type: HTTP
  targets {
    rds_targets {
      resource_path: "k8s://ingresses"
      filter {
        key: "namespace"
        value: "default"
      }
    }
  }
  additional_label {
    key: "fqdn"
    value: "@target.label.fqdn@"
  }
  additional_label {
    key: "probeurl"
    value: "@target.label.relative_url@"
  }
  http_probe {}
}
Where ingress is:
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: rds-ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /health
        backend:
          serviceName: cloudprober-test
          servicePort: 9313
  - host: prometheus.bar.com
    http:
      paths:
      - path: /
        backend:
          serviceName: cloudprober-test
          servicePort: 9313
With this I get logs like this:
2020-09-16 12:17:31.005 PDT
Target:foo.bar.com, URL:http://foo.bar.com/health, http.doHTTPRequest: Get "http://foo.bar.com/health": dial tcp: lookup foo.bar.com on 10.31.240.10:53: no such host
{
  insertId: "1sxfsuig1rbeq6l"
  logName: "projects/google.com:bbmc-stackdriver/logs/cloudprober.pod-to-ingress"
  receiveTimestamp: "2020-09-16T19:17:41.024229028Z"
  resource: {…}
  severity: "WARNING"
  textPayload: "Target:foo.bar.com, URL:http://foo.bar.com/health, http.doHTTPRequest: Get "http://foo.bar.com/health": dial tcp: lookup foo.bar.com on 10.31.240.10:53: no such host"
  timestamp: "2020-09-16T19:17:31.005359502Z"
}
Notice that the URL is correct here.
And metrics look like this (notice additional labels here):
total{ptype="http",probe="pod-to-ingress",dst="rds-ingress_foo.bar.com__health",fqdn="foo.bar.com",probeurl="/health"} 25 1600284207212
total{ptype="http",probe="pod-to-ingress",dst="rds-ingress_prometheus.bar.com",fqdn="prometheus.bar.com",probeurl="/"} 25 1600284207212
success{ptype="http",probe="pod-to-ingress",dst="rds-ingress_foo.bar.com__health",fqdn="foo.bar.com",probeurl="/health"} 0 1600284207212
success{ptype="http",probe="pod-to-ingress",dst="rds-ingress_prometheus.bar.com",fqdn="prometheus.bar.com",probeurl="/"} 0 1600284207212
latency{ptype="http",probe="pod-to-ingress",dst="rds-ingress_foo.bar.com__health",fqdn="foo.bar.com",probeurl="/health"} 0.000 1600284207212
latency{ptype="http",probe="pod-to-ingress",dst="rds-ingress_prometheus.bar.com",fqdn="prometheus.bar.com",probeurl="/"} 0.000 1600284207212
timeouts{ptype="http",probe="pod-to-ingress",dst="rds-ingress_foo.bar.com__health",fqdn="foo.bar.com",probeurl="/health"} 0 1600284207212
timeouts{ptype="http",probe="pod-to-ingress",dst="rds-ingress_prometheus.bar.com",fqdn="prometheus.bar.com",probeurl="/"} 0 1600284207212
hm.. maybe they were false positives. Let me retest tomorrow and report back.
It looks like they were false positives. I may have had DNS not fully functioning at one point, and I looked at dst and assumed that's what was getting resolved. So far I've reverted all of the workarounds I put in place, including the metrics_prefix, and everything seems to be working fine.
Thanks so much for confirming @networkop.
I'll go ahead and mark this issue closed. We can re-open it, or file another one, if we find a problem with ingress resource discovery.
Similar to how dynamic targets can be discovered from pods, endpoints, and services, it'd be nice to have RDS discover ingress resources as well. The result would be an FQDN with an optional set of IPs (if resolved by the RDS server).