jetstack / version-checker

Kubernetes utility for exposing image versions in use, compared to latest available upstream, as metrics.
https://jetstack.io
Apache License 2.0

Issue with GKE, Grafana, CertManager, SonarQube latest version and SHA tag and old metric #147

Open AleksandarMis opened 5 months ago

AleksandarMis commented 5 months ago

Hi,

First of all, thank you very much for version-checker, it is a very helpful service :)

We have k8s running on GCP and use the latest version-checker version: quay.io/jetstack/version-checker:v0.3.3

In our config for the version-checker deployment we added:

a flag:
  --test-all-containers=true
annotations:
  enable.version-checker.io/version-checker: "true"
  use-sha.version-checker.io/version-checker: "false"

For most components and pods it works well and reports the exact versions, but for some of them we have issues:

  1. GKE image. Our current GKE version is v1.26.5-gke.2700, but the latest version found by version-checker is v1.18.6. We checked the logs and found the following error:

"textPayload": "time=\"2024-01-04T15:52:04Z\" level=error msg=\"error syncing 'kube-proxy-gke-staging-app-pool-a-xxxxx/kube-system': failed to sync pod kube-proxy-gke-staging-app-pool-a-xxxxxxx/kube-system: failed to check container image \\"kube-proxy\\": failed to get tags from remote registry for \\"gke.gcr.io/kube-proxy-amd64\\": failed to get docker image: Get \\"https://gcr.io/v2/google-containers/kube-proxy-amd64/tags/list\\": context canceled, requeuing\" module=controller",

Actually https://gcr.io/v2/google-containers/kube-proxy-amd64/tags/list hasn't been updated for a long time because they switched to https://gke.gcr.io/v2/kube-proxy-amd64/tags/list and new images are published there.

The latest version of gke: v1.29.0-gke.1324000

P.S. kube-proxy is deployed as a static Pod for nodes.

  2. Grafana. This is the same issue as described in https://github.com/jetstack/version-checker/issues/138

For the image grafana/grafana, version-checker says the latest version is 9799770991, while the actual latest tag is currently 10.3.1.


  3. CertManager

For the image quay.io/jetstack/cert-manager-controller, version-checker says the latest version is 608111629, while the actual latest tag is currently 1.13.3.

  4. SonarQube. For the image sonarqube, version-checker says the latest version is 7.10, while the actual latest tag is currently 10.3.

The reason behind that is that we pull the image from our JFrog registry, which requires authentication, and the lookup fails with the error UNAUTHORIZED.

I tried overriding the URL in the sonarqube deployment so that the latest image is looked up at https://hub.docker.com/_/sonarqube , but it doesn't help, and I can't find any error after overriding and redeploying.


  5. SHA tag. Many images are reported to Prometheus by version-checker with a SHA suffix and are then recognized as not being the latest, because the current version is the same tag but without a SHA, or with a different SHA. I wanted to disable the SHA check and test it on version-checker itself, with use-sha.version-checker.io/version-checker: "false", but the latest version-checker images are still shown with a SHA. They are recognized as "Is latest" on some stages and not on others, even though they are EXACTLY THE SAME VERSIONS.


Since we have a lot of components and plan maintenance based on the filter "Is latest = NO", these images are flagged incorrectly and we have to check them manually, so it would be great if that could be fixed too.

  6. Version-checker sends old metrics from old containers as current. We also have the issue that version-checker sends old metrics (from containers that no longer exist) to Prometheus.

One example (screenshot not preserved): we have only one runner, with version 16.7.0, and the old one was terminated 10 days ago.

Another example: there is also just one pod with version 3.9.0, yet the metrics sent to Prometheus are:

http://localhost:8080/metrics

version_checker_is_latest_version{container="grafana-renderer",container_type="container",current_version="3.8.4",image="grafana/grafana-image-renderer",latest_version="3.9.0",namespace="grafana",pod="grafana-renderer-859948fb9f-wrbzb"} 0

version_checker_is_latest_version{container="grafana-renderer",container_type="container",current_version="3.9.0@sha256:656ca4dddc020f067239428e2a15bc7100d8ce4918db1618b45d53d0c8c4d273",image="grafana/grafana-image-renderer",latest_version="3.9.0@sha256:a1e0c69aaa5c1fe106c89ba4c5569563d8b2ac0b04e0f121b12b5c2a5b4c3f94",namespace="grafana",pod="grafana-renderer-545676cb7d-hd8lm"} 1

The scrape_interval of our ServiceMonitor is 30s:

- job_name: serviceMonitor/version-checker/version-checker/0
  honor_timestamps: true
  scrape_interval: 30s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  follow_redirects: true
  enable_http2: true
  relabel_configs:

Is there any config we can add or change so that only the metrics from the latest scan are sent?

The only workaround at the moment is to redeploy version-checker after each component upgrade; it then sends only the latest version and not the old one.

Could you please help?

Thanks in advance!

BR Aleks

hawksight commented 3 months ago

@AleksandarMis thank you for raising such a detailed issue for us. I need a bit of time to digest all the issues presented here and determine if we already have similar or duplicated issues open, such as #138 as you already mentioned.

Just wanted to say we've seen your issue and I plan to take a look in the next few weeks.

davidcollom commented 1 month ago

Hi @AleksandarMis, I've taken an extensive look into the reported issues and found the following:

1. GKE

Thank you for this, I'd totally forgotten that images were now moved under the gke.gcr.io and have opened a PR to resolve this.

As GKE images are suffixed with metadata, you MUST ensure that the pods are annotated with use-metadata.version-checker.io/kube-proxy="true" for the versions to be successfully matched.

See PR https://github.com/jetstack/version-checker/pull/202 that should resolve this issue.

2. Grafana

This issue is a tricky one, primarily caused by Grafana pushing an image whose tag doesn't match semver and also has no metadata (with metadata, version-checker would have discarded the tag by default).

I spent a number of days trying to tweak version-checker's sorting and selection process, to no avail. However, I have resolved this by adding the following annotation to grafana:

~$ curl -s localhost:8080/metrics | grep 'grafana/grafana'
version_checker_is_latest_version{container="grafana",container_type="container",current_version="10.3.4",image="grafana/grafana",latest_version="9799770991",namespace="kubecost",pod="kubecost-grafana-585f95598c-mqq8m"} 0
~$ k annotate pods finops-grafana-5f67d6fc4-dqkfc match-regex.version-checker.io/grafana='(\d+)\.(\d+)\.(\d+)'
pod/finops-grafana-5f67d6fc4-dqkfc annotated
~$ curl -s localhost:8080/metrics | grep 'grafana/grafana'
version_checker_is_latest_version{container="grafana",container_type="container",current_version="10.3.4",image="docker.io/grafana/grafana",latest_version="11.0.0@sha256:a80bc3848cf5d4b2958ea25dbeb36fa9442ef4be8c73fe4bff11340307c32919",namespace="davidtest",pod="finops-grafana-5f67d6fc4-dqkfc"} 0

Which correctly registers the valid latest image as 11.0.0
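For anyone wondering why the annotation helps, here is a minimal sketch of the filtering idea. It uses Python's `re` purely for illustration and assumes the match is unanchored (the pattern may match anywhere in the tag); `filter_tags` and the tag list are hypothetical, and this is not version-checker's actual code.

```python
import re

# Pattern taken from the annotation above: three dot-separated numeric groups.
PATTERN = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def filter_tags(tags, pattern=PATTERN):
    """Keep only the tags in which the pattern matches somewhere (unanchored)."""
    return [tag for tag in tags if pattern.search(tag)]


# Illustrative tag list; "9799770991" is the build-number tag from this issue.
tags = ["9799770991", "10.3.1", "11.0.0", "main", "latest"]
print(filter_tags(tags))
```

The bare build number has no dots, so it never reaches the version comparison and can no longer win as "latest".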

3. Cert-Manager

This is similar to the Grafana case: the 608111629 tag is parsed as 608111629.0.0 and is therefore always registered as the latest version.

To resolve this, the easiest approach is to annotate these pods with match-regex.version-checker.io/cert-manager='v(\d+)\.(\d+)\.(\d+)'. This ensures that tags not starting with v are discarded.
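A rough sketch of why the numeric tag wins, and how the regex changes the outcome. The `coerce` and `latest` helpers are hypothetical and only mimic the behaviour described above (a bare number padded to major.0.0); they are not version-checker's real parser, and the tag list is illustrative.

```python
import re


def coerce(tag):
    """Loosely parse a tag into (major, minor, patch), padding missing parts with 0."""
    m = re.match(r"v?(\d+)(?:\.(\d+))?(?:\.(\d+))?", tag)
    if m is None:
        return None
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def latest(tags, pattern=None):
    """Return the highest tag, optionally keeping only tags that match `pattern`."""
    candidates = [t for t in tags if pattern is None or re.search(pattern, t)]
    candidates = [t for t in candidates if coerce(t) is not None]
    return max(candidates, key=coerce)


tags = ["608111629", "v1.13.2", "v1.13.3"]
print(latest(tags))                           # the bare build number sorts highest
print(latest(tags, r"v(\d+)\.(\d+)\.(\d+)"))  # with the regex, only v-prefixed tags remain
```

Without a filter, 608111629.0.0 compares greater than any real release; with the regex applied first, it is never a candidate.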

4. Sonarqube

To rewrite the URI, set the annotation value to docker.io/sonarqube, i.e. override-url.version-checker.io/sonarqube: docker.io/sonarqube. If debug logging is enabled (-v=debug), you should see the following log line:

overriding image lookup docker.io/grafana/grafana -> sonarqube  module=version_getter

Taking a look at Sonarqube's tagging strategy, it looks like you'll also need to enable the metadata usage:

use-metadata.version-checker.io/sonarqube="true"

(I used my Grafana instance as an example here)

Assuming you're only interested in say the community editions, you can still use the regex match (from above) like so:

k annotate pods kyverno-admission-controller-969cddd96-dh8ht match-regex.version-checker.io/kyverno='(\d+)\.(\d+)\.(\d+)-community'

(This didn't produce the correct is-latest result because there is metadata in the versions of the container I'm testing against, but you can see in the metric labels that version-checker detected the correct latest tag):

version_checker_is_latest_version{container="kyverno",container_type="container",current_version="v1.12.3",image="docker.io/sonarqube",latest_version="10.5.1-community",namespace="davidtest",pod="kyverno-admission-controller-969cddd96-dh8ht"} 1

I've raised https://github.com/jetstack/version-checker/pull/204 to make it a little clearer where the versions actually come from when using the overrideURI option/annotation.

5. SHA Tags

I have seen this locally and have been trying to address it more thoroughly, but it is more challenging than expected. I have noticed that it somewhat resolves itself for many images. The SHA256 digest is used to detect whether the image in use at the time version-checker runs its check has since had a newer version pushed under the same tag (i.e. re-tagged), for whatever purpose (patching/CVE resolution).

The annotation use-sha.version-checker.io/${container} is mainly used for version checking based on the SHA, and uses the timestamp at which the image was pushed to the registry. This is useful if you run a floating tag such as latest or v3 and want to ensure that you have that tag's current image.
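Until this is addressed, one possible workaround on the consumer side is to compare only the tag part of the labels. The sketch below uses the two grafana-image-renderer label values from the original report; `split_version` and `same_tag` are hypothetical helpers for post-processing metric labels, not something version-checker does internally.

```python
def split_version(label):
    """Split a 'tag@sha256:...' label into (tag, digest); digest is None if absent."""
    tag, _, digest = label.partition("@")
    return tag, digest or None


def same_tag(current, latest):
    """True when both labels carry the same tag, regardless of digest."""
    return split_version(current)[0] == split_version(latest)[0]


# Label values copied from the metrics in the original report.
current = "3.9.0@sha256:656ca4dddc020f067239428e2a15bc7100d8ce4918db1618b45d53d0c8c4d273"
latest = "3.9.0@sha256:a1e0c69aaa5c1fe106c89ba4c5569563d8b2ac0b04e0f121b12b5c2a5b4c3f94"
print(same_tag(current, latest))
```

Note that ignoring the digest also hides genuine re-pushes of the same tag, so this only suits dashboards where the tag alone is what matters.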

6. Send old metrics

I suspect this is related to the image-cache-timeout flag being left at its default of 30 minutes, meaning metrics can still be reported after the container's deletion. I need to investigate this one more thoroughly. I also suspect it could be related to annotations changing, and/or to version-checker missing container deletion events and therefore never deleting the corresponding metrics from the metrics registry.

I've raised the following PR to attempt to address this: https://github.com/jetstack/version-checker/pull/203
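In the meantime, a small script can help spot candidate stale series in the /metrics output. This is only a sketch: it parses the two sample lines from the original report with a simple regex (not a full Prometheus exposition parser), and `parse_series` and `pods_per_container` are hypothetical helper names.

```python
import re
from collections import defaultdict

# The two series from the original report, as exposed on /metrics.
SAMPLE = '''\
version_checker_is_latest_version{container="grafana-renderer",container_type="container",current_version="3.8.4",image="grafana/grafana-image-renderer",latest_version="3.9.0",namespace="grafana",pod="grafana-renderer-859948fb9f-wrbzb"} 0
version_checker_is_latest_version{container="grafana-renderer",container_type="container",current_version="3.9.0@sha256:656ca4dddc020f067239428e2a15bc7100d8ce4918db1618b45d53d0c8c4d273",image="grafana/grafana-image-renderer",latest_version="3.9.0@sha256:a1e0c69aaa5c1fe106c89ba4c5569563d8b2ac0b04e0f121b12b5c2a5b4c3f94",namespace="grafana",pod="grafana-renderer-545676cb7d-hd8lm"} 1
'''

LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_series(text, metric="version_checker_is_latest_version"):
    """Return one label dict per sample line of the given metric."""
    series = []
    for line in text.splitlines():
        if line.startswith(metric + "{"):
            series.append(dict(LABEL_RE.findall(line)))
    return series


def pods_per_container(series):
    """Group pods by (namespace, container, image); keep groups with more than one pod."""
    groups = defaultdict(set)
    for labels in series:
        key = (labels.get("namespace"), labels.get("container"), labels.get("image"))
        groups[key].add(labels.get("pod"))
    return {key: sorted(pods) for key, pods in groups.items() if len(pods) > 1}


for key, pods in pods_per_container(parse_series(SAMPLE)).items():
    print(key, pods)
```

A group with several pods is only a hint: workloads with multiple replicas legitimately produce one series per pod, so the listed pod names still need to be compared against the output of kubectl get pods.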

End Note:

Once again, thank you for such a detailed and long issue, with some really good example use cases. The information you provided has been incredibly interesting and highlights just how difficult checking versions can be.