kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0
3.52k stars 1.3k forks source link

Improve logging of predicates #10818

Open sbueringer opened 3 months ago

sbueringer commented 3 months ago

The logger we use in predicates today only contain an absolute minimum amount of k/v pairs, e.g.

image

Some notes:

I think we have to improve our predicate logs, otherwise they are not really useful.

(I noticed this when trying to debug predicates of a controller, which is very very hard if there is no way to determine which log belongs to which controller)

In general we should compare the logs of our predicates with the logs of a regular controller and figure out which additional k/v pairs we need + we should make sure kind is logged correctly

Tasks:

sbueringer commented 3 months ago

cc @fabriziopandini @chrischdi (just fyi, I think it's not very important, but it would be good for debuggability)

fabriziopandini commented 3 months ago

Let's try to solve for one controller first, then we can try to get help in replicating the solution for the remaining controllers /triage accepted /help

k8s-ci-robot commented 3 months ago

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/10818): >Let's try to solve for one controller first, then we can try to get help in replicating the solution for the remaining controllers >/triage accepted >/help Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
Karthik-K-N commented 2 months ago

I was debugging something related to pausing resource for CAPIBM and thinking of adding additional values in predicate logging for debugging locally and in the meantime came across this issue, I would be happy to work on this if no one is currently working on it.

sbueringer commented 2 months ago

Feel free! I would recommend implementing it for one controller first, so we can get consensus on how to implement it (before implementing it for everything)

Karthik-K-N commented 2 months ago

/assign

Karthik-K-N commented 2 months ago

I just tried to debug on kind value being empty string, Seems like on update events on few occasion the value is empty

{"ts":1722261991766.5825,"caller":"predicates/generic_predicates.go:194","msg":"Resource is not paused, will attempt to map resource","controller":"cluster_controller","kind":"Cluster","predicate":"ResourceNotPaused","eventType":"update","namespace":"default","":"development-16409-md-0-pmcc8-9wjbg-kkx8f","v":6}

{"ts":1722261389067.7864,"caller":"predicates/generic_predicates.go:194","msg":"Resource is not paused, will attempt to map resource","controller":"cluster_controller","kind":"Cluster","predicate":"ResourceNotPaused","eventType":"update","namespace":"default","cluster":"development-16409","v":6}

Even this is not consistent, Seen empty most of the time.

sbueringer commented 3 weeks ago

@Karthik-K-N I just talked to Christian and Fabrizio and we have an idea on how to solve the issue with kind. The problem in general is that kind is usually never set for typed objects, only for unstructured. We have to use apiutil.GVKForObject to properly retrieve the kind

Basically we would modify all predicates to take an additional scheme as parameter, e.g.

ResourceHasFilterLabel(logger logr.Logger, labelValue string)

would become

ResourceHasFilterLabel(scheme *runtime.Scheme, logger logr.Logger, labelValue string)

We would call it like this in our setup funcs

predicates.ResourceHasFilterLabel(mgr.GetScheme(), predicateLog, r.WatchFilterValue)

Inside of the predicate we would pass the scheme to where we have access to the object and then do the following:

func processIfLabelMatch(scheme *runtime.Scheme, log logr.Logger, obj client.Object, labelValue string) bool {
    // Return early if no labelValue was set.
    if labelValue == "" {
        return true
    }

    if gvk, err := apiutil.GVKForObject(obj, scheme); err == nil {
        log = log.WithValues(gvk.Kind, klog.KObj(obj))
    }
    if labels.HasWatchLabel(obj, labelValue) {
        log.V(6).Info("Resource matches label, will attempt to map resource")
        return true
    }
    log.V(4).Info("Resource does not match label, will not attempt to map resource")
    return false
}

In general I would like all of our predicate logs to have at least the following k/v pairs:

I would like to at least drop the "namespace" k/v pair as it is redundant.

If you're interested in implementing this. I would suggest let's start with one predicate first (e.g. ResourceNotPaused + also affects at least ResourceNotPausedAndHasFilterLabel or maybe another one not sure. Ideally let's pick a predicate that doesn't affect a lot of other predicates :))

Karthik-K-N commented 3 weeks ago

Sure @sbueringer , Thanks for the detailed explanation, I will submit a PR with change for one predicate, My only earlier concern was predicate signature change.

sbueringer commented 3 weeks ago

My only https://github.com/kubernetes-sigs/cluster-api/pull/10959#issuecomment-2257777301 concern was predicate signature change.

Yup, same. Talked to Fabrizio and Christian. We would be fine with it. It's an easy change to adopt to, as folks can always just pass in mgr.GetScheme in the places where Predicates are used

(sorry I think I missed your comment on the PR somehow :))

Karthik-K-N commented 3 weeks ago

Sure, Thanks for confirmation. I will go ahead with it. Will submit PR soon.

Karthik-K-N commented 6 days ago

@sbueringer so should I make the similar changes for rest of the predicates or should we wait?

sbueringer commented 5 days ago

Feel free to go ahead.

sbueringer commented 5 days ago

@Karthik-K-N Please also follow-up on: https://github.com/kubernetes-sigs/cluster-api/pull/11188#discussion_r1766623815

Karthik-K-N commented 5 days ago

@Karthik-K-N Please also follow-up on: #11188 (comment)

Sure will do that as well. Thank you.