knative / serving

Kubernetes-based, scale-to-zero, request-driven compute
https://knative.dev/docs/serving/
Apache License 2.0

Enhance knative/serving observability use case, to expose the application metadata #11673

Closed PiotrWodecki closed 3 years ago

PiotrWodecki commented 3 years ago

/area monitoring

Describe the feature

As part of this enhancement request, I'd like to cover the following use cases:

All of the above share the need for a way to uniquely identify the applications, which are covered either by the recommended Kubernetes labels or by vendor-specific labels or annotations on the Pods being served. The actual Pod annotations come from the template in the metadata section of the given Revision. Revisions can be modified directly, so it would be challenging to design a system that precisely tracks all the required metadata and also accounts for modifications happening on the fly.

Given that Prometheus does not collect the Pod's metadata, the observability use cases listed above fall short without an additional specialized data collector/forwarder that gathers the extra metadata, to be combined with other Knative data in order to trace back the source of the application's runtime definition. With the proposed solution, we extend Knative's monitoring use case so that one single metrics endpoint returns all necessary data.

Proposed solution

Let's deep dive into one specific example:

Prometheus' metrics before:

actual_pods{configuration_name="mysql-service", namespace_name="default", revision_name="mysql-service-00009", service_name="mysql-service", etc.}

Prometheus' metrics after the code change:

actual_pods{configuration_name="mysql-service", namespace_name="default", revision_name="mysql-service-00009", service_name="mysql-service", label_app_kubernetes_io_name="mysql", etc.}

As you can see, the modification picked up the label correctly. To prevent duplicate tags, each key is prefixed with the string "label" ("annotation" respectively for annotations) and sanitized for Prometheus. This modification makes the autoscaler return all labels and annotations attached to a given Pod.
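The prefix-and-sanitize step described above can be sketched as follows (a minimal illustration of the scheme, not the actual Knative code):

```python
import re

def sanitize_label_key(key: str, prefix: str = "label") -> str:
    """Turn a Kubernetes label/annotation key into a Prometheus-safe
    label name, prefixed to avoid clashing with existing metric labels.

    Prometheus label names must match [a-zA-Z_][a-zA-Z0-9_]*, so every
    other character (dots, slashes, dashes) is replaced with '_'.
    """
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", key)
    return f"{prefix}_{sanitized}"

print(sanitize_label_key("app.kubernetes.io/name"))
# -> label_app_kubernetes_io_name
```

For an annotation key, the same function would be called with `prefix="annotation"`.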

To make sure the test suite correctly handles monitoring metrics that report additional labels (from either annotations or labels), an additional change is required in the pkg repository to introduce a permissive assertion function, AssertMetricRequiredOnly.

markusthoemmes commented 3 years ago

Copying from the PR as the discussion might be more suited here:

Have you considered the impact that the additional labels will have with respect to exploding the Prometheus load? AFAIK, it creates a new time series for each individual label value, and we've had trouble with that in the past. Is that a concern here?

PiotrWodecki commented 3 years ago

It's true that every unique set of key-value label pairs represents a new time series in Prometheus. But I would expect the labels and annotations that appear on Pods to have low cardinality and to remain mostly stable throughout a Service's lifecycle. Could you please link to the issue you had with this in the past?
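The cardinality concern can be made concrete with a back-of-the-envelope bound: for a single metric name, the number of distinct time series is at most the product of the per-label value counts. A sketch (the label names and counts below are made up for illustration):

```python
from math import prod

def series_upper_bound(label_cardinalities: dict) -> int:
    """Worst-case number of time series for one metric name: the
    product of the number of distinct values seen per label. Only
    observed label combinations actually exist, so the real count
    is usually far lower than this bound."""
    return prod(label_cardinalities.values())

# Hypothetical cluster: 10 namespaces, 50 live revisions, and
# 20 distinct app.kubernetes.io/name values.
bound = series_upper_bound({
    "namespace_name": 10,
    "revision_name": 50,
    "label_app_kubernetes_io_name": 20,
})
print(bound)  # 10000
```

The question in the thread is essentially whether the new `label_*`/`annotation_*` dimensions stay low-cardinality (as argued above) or multiply this bound out of control.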

julz commented 3 years ago

I'm interested in why (and whether) knative/autoscaler is the best place to do this. What are the advantages of building this into Knative (where it would only apply to ksvcs that happen to use the KPA autoscaler) rather than using existing tools (would https://github.com/kubernetes/kube-state-metrics work for this?) to reflect Pod data from the k8s API server into Prometheus?
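For context, kube-state-metrics can already expose Pod labels as a joinable metric; in kube-state-metrics v2+ the exported label keys have to be opted in (via a flag such as `--metric-labels-allowlist`). A sketch of what that output looks like (the pod name and label value here are illustrative):

```
kube_pod_labels{namespace="default", pod="mysql-service-00009-deployment-658fd4f9cf-qcnr5", label_app_kubernetes_io_name="mysql"} 1
```

Prometheus queries can then join on the namespace/pod labels to attach this metadata to any other metric.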

pgodowski commented 3 years ago

Fair ask @julz, and thanks. Chiming in, as I was working with @Niflhem on this one. While kube-state-metrics is a valid place to make this extension (and we will work with the k8s community on that too), I have concerns about separation of duties. Specifically, some application monitoring solutions are isolated from the kube-state-metrics data, but are allowed to pull in data from the 'application layer', which Knative is still considered to be (given it is not (yet?) an integral part of the k8s platform).

Additionally, this enhancement is not about exposing the Pod's data via Knative observability, but about exposing the Knative Revision's metadata, allowing unique identification of the application served via the KPA autoscaler.

Your comment made me think, though, that there are a few options we might consider:

Would it make sense?

julz commented 3 years ago

Specifically, some application monitoring solutions are isolated from the kube-state-metrics data, but are allowed to pull in data from the 'application layer', which Knative is still considered to be (given it is not (yet?) an integral part of the k8s platform).

I'm not sure I totally follow this distinction, tbh. If the cluster administrator wants to expose Pod-level metrics to users, can't they set up something like kube-state-metrics in the same way, and at the same time, as they set up Knative? Fundamentally, someone is deploying Knative on the cluster; why doesn't that person set up kube-state-metrics as well? We could even document how to do this.

I think the fundamental question I'm trying to understand here is the value of implementing the use cases in the bullet points of the issue description only for Knative, and only for KSvcs that use the KPA autoscaler, when you could instead implement a solution that would also work for Deployments/StatefulSets/KSvcs with the HPA, etc. Perhaps my confusion is that the example uses actual_pod_count, which is really a k8s-level thing. Are there any use cases that are specific to metrics only Knative/the autoscaler would know about?

have this a configuration option AND/OR expose only specific set of metadata, rather than all metadata from Revision

I feel like the fact that we may want to get into this level of customisation suggests it may be better to use more generic solutions outside of Knative (but I might personally change my mind if there were specific use cases that couldn't work that way).

markusthoemmes commented 3 years ago

Agreeing with @julz's sentiment. The "instances of application (application represented by some unique value of annotation or label on the Pod)" portion also sets off a trip-wire alert in my head that makes me think we're breaking an abstraction layer here. Knative, imo, should only put the metadata on the metrics that it knows about.

pgodowski commented 3 years ago

Are there any use cases that are specific to metrics only Knative/autoscaler would know?

The extended use case is not only about the actual number of instances deployed, but also the number of possible instances to be deployed, which is something that only the autoscaler knows about.

Knative, imo, should only put the metadata on the metrics that it knows about.

This is a valid point; thus, again, we're speaking about Knative's own metadata being exposed (metadata of the Revision, not metadata of the Pod). Hope it makes more sense now.

julz commented 3 years ago

The extended use case is not only about the actual number of instances deployed, but also the number of possible instances to be deployed, which is something that only the autoscaler knows about.

Is there a specific example of a use case that could not be satisfied by pushing the desired and actual replicas (and labels) of the Deployment to Prometheus via something like kube-state-metrics or Sysdig? In particular, it would be useful to know whether there are reasons the use case is unique to Knative and would not apply to an app deployed directly as a Deployment (or StatefulSet).

metadata of Revision, not metadata of Pod

I think the annotations and labels from the Revision are present on the Pod; does this help?

skonto commented 3 years ago

Prometheus can do relabeling at the ServiceMonitor level and at its own configuration level. I'm wondering whether you tried that at all to inject static info that uniquely identifies your app.
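As a sketch of that approach, assuming the prometheus-operator CRDs are in use: relabeling can copy a Pod label onto every scraped series, or inject a static identifying label, without any Knative code change (the resource name, port name, and label values below are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-service            # hypothetical name
spec:
  selector:
    matchLabels:
      serving.knative.dev/service: mysql-service
  endpoints:
    - port: http-usermetrics     # hypothetical port name
      relabelings:
        # Copy the Pod's app.kubernetes.io/name label onto all series
        # scraped from this target.
        - sourceLabels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          targetLabel: label_app_kubernetes_io_name
        # Or inject a static value that uniquely identifies the app.
        - targetLabel: app_id
          replacement: mysql-prod
```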

I think the annotations and labels from the Revision are present on the Pod; does this help?

Here is the output of queue proxy metrics: revision_app_request_count{configuration_name="event-display",container_name="queue-proxy",namespace_name="apiserversource1",pod_name="event-display-00001-deployment-658fd4f9cf-qcnr5",response_code="200",response_code_class="2xx",revision_name="event-display-00001",service_name="event-display"}. Revision info is there.

FWIW, downstream we get K8s metadata via kube-state-metrics.
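Query-time joining is another option along those lines: if kube-state-metrics is scraped alongside the queue-proxy metrics, the Revision metrics shown above can be enriched in PromQL rather than in the autoscaler. A sketch (the label names are illustrative, and it assumes pod names are distinct enough to join on):

```promql
# Copy pod_name into a "pod" label so it matches kube_pod_labels,
# then join to pull in the app.kubernetes.io/name label at query time.
sum by (revision_name, label_app_kubernetes_io_name) (
    label_replace(revision_app_request_count, "pod", "$1", "pod_name", "(.+)")
  * on (pod) group_left (label_app_kubernetes_io_name)
    kube_pod_labels
)
```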

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 90 days with no activity. It will automatically close after 30 more days of inactivity. Reopen the issue with /reopen. Mark the issue as fresh by adding the comment /remove-lifecycle stale.

PiotrWodecki commented 3 years ago

Closing for now, may reopen in the future.