kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0
109.88k stars 39.34k forks source link

Remove prometheus dependencies from k/k codebase #89267

Open serathius opened 4 years ago

serathius commented 4 years ago

As part of Metrics Stability KEP we migrated from Prometheus client to Kubernetes Metrics Framework. So we no longer need to depend on promethues client directly.

I configured Bazel to forbid importing prometheus directly. https://github.com/kubernetes/kubernetes/blob/master/hack/verify-prometheus-imports.sh (EDITED)

Example PR that removes reference https://github.com/kubernetes/kubernetes/pull/85289

There are still couple of directories that need to be cleaned up:

/sig instrumentation /kind cleanup /cc @logicalhan

Please volunteer if your interested in working on certain directory. /help

notpad commented 4 years ago

/assign I am interested in working on pkg/scheduler/framework/v1alpha1

RainbowMango commented 4 years ago

Thanks @serathius for picking this. Let's try done this by 1.19. /milestone v1.19

tanjunchen commented 4 years ago

/assign staging/src/k8s.io/apiserver/pkg/admission/metrics staging/src/k8s.io/apiserver/pkg/util/flowcontrol/metrics

zhouya0 commented 4 years ago

/assign pkg/master cluster/images/etcd-version-monitor

tanjunchen commented 4 years ago

@zhouya0 sorry , i open a pr just now

zhouya0 commented 4 years ago

@zhouya0 sorry , i open a pr just now

Ok, but all of it? Should I remove them all?

tanjunchen commented 4 years ago

@zhouya0 sorry , i open a pr just now

Ok, but all of it? Should I remove them all?

just for pkg/volume/util/operationexecutor , you can pick up others , thanks.

RainbowMango commented 4 years ago

Just a reminder. There are several methods in testutil.go, If you are removing the reference from a test, please try using these methods refactor the test case first.

seh commented 4 years ago

Does this cover the component-base library as well? It relies on several Prometheus modules.

Oh, I see it's in the whitelist here.

serathius commented 4 years ago

Hey @seh, component-base/metrics is the intended wrapper for Prometheus. So it's intended to be using it there.

zhouya0 commented 4 years ago

I have a question: cluster/images/etcd-version-monitor has deeply rely on github.com/prometheus/client_model/go with three structs :dto.MetricFamili, dto.LabelPair, dto.Metric. so it's not easy to move these fucntions to staging/src/k8s.io/component-base/metrics. What I'm going to do is simply wrap these structs like:

type MetricFamily struct {
    *dto.MetricFamily
}

Is that suitable to do this? If yes, where should I put this definition?

notpad commented 4 years ago

I am working on the pkg/scheduler/framework/v1alpha1 part, and got a problem here:

This test needs to get the label names/values, sampleCount of a Histogram, I didn't find function to get these two values in testutil.go.

Also, I am trying to use CollectAndCompare to pass the expected metric string and compare, but the problem is sampleSum is not a constant value (it is a duration and will change each time we run the test), so I guess we can't use CollectAndCompare.

So, shall I introduce new methods in testutil.go to get the labels and sampleCount?

@serathius @RainbowMango
Do you have any suggestion? Thanks

zhouya0 commented 4 years ago

pkg/master has been fixed by this PR, isn't it? https://github.com/kubernetes/kubernetes/pull/76741/files

serathius commented 4 years ago

@zhouya0 If so then please just remove it from prometheus_import_allow_list

serathius commented 4 years ago

Personally I would be for focusing on removing dependencies in this issue. I have doubts that prometheus "CollectAndCompare" matches on how we currently do tests in k8s. Before refactoring we should agree on how metrics should be tested.

@RainbowMango WDYT?

RainbowMango commented 4 years ago

Personally I would be for focusing on removing dependencies in this issue.

The reference we need to remove mainly is github.com/prometheus/client_golang/prometheus, that we can ensure all components turn into component-base instead of using Prometheus directly.

For other references, actually we don't need to move them in a rush as they won't hurt anything. The benefit when we move these references is we can provide a uniform style for metrics testing.

I have doubts that prometheus "CollectAndCompare" matches on how we currently do tests in k8s.

If current utils can't meet the testing requirement, then we can introduce COMMON util in testutil. I don't want testutil package to be a grocery store, that what I mean do not just move code into it.

serathius commented 4 years ago

Visibility rules for prometheus were moved to better location. Now you can find them here https://github.com/kubernetes/kubernetes/blob/master/build/visible_to/BUILD#L428

yiyang5055 commented 4 years ago

/cc

jtslear commented 4 years ago

Hello, @jtslear here from Bug Triage! This issue has not been updated for a long time, so I'd like to check on the status. The code freeze is starting June 25th and while there is still plenty of time, we want to ensure that each PR has a chance to be merged on time.

As the issue is tagged for 1.19, is it still planned for this release?

RainbowMango commented 4 years ago

As the issue is tagged for 1.19, is it still planned for this release?

Yes.

jtslear commented 4 years ago

Due to reaching the code freeze deadline, I'm going to bump this to the next release.

/milestone v1.20

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

MonzElmasry commented 3 years ago

Hey :wave: Bug Triage here. Wanted to follow up on the status of this issue as we're approaching code freeze on 12.11.2020. This issue is tagged for 1.20, is it still planned for this release?

sayanchowdhury commented 3 years ago

/milestone clear

fejta-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

gavinfish commented 3 years ago

/remove-lifecycle rotten

@serathius It seems all related PRs here are either merged or closed. But there is still some work left, maybe I can take this up.

serathius commented 3 years ago

Sure, some issues were not finished before they were closed due to time passed. Doesn't look like current owners will continue the work. Feel free to pick up directory for fixings. Let's make sure that we notify previous owner.

Just comment with information which directory you want to pick up and cc current owner.

gavinfish commented 3 years ago

Hi @notpad, I will pick up scheduler part. Ping me if you are still working on this.

gavinfish commented 3 years ago

@tanjunchen still working this? If not, I want to pick up the left ones and drive to close this issue.

tanjunchen commented 3 years ago

@gavinfish no,welcome to close this issue.thanks

gavinfish commented 3 years ago

/assign /remove-help

serathius commented 3 years ago

Thanks for picking this up I have updated issue description to reflect current state.

RainbowMango commented 3 years ago

Hi @serathius, I just noticed the bazel build system has been removed, does that means we can't forbid importing prometheus directly anymore?

serathius commented 3 years ago

Thanks @RainbowMango for pointing this out, I will follow up with author of PR to figure out on replacement.

BenTheElder commented 3 years ago

Thank you for flagging this and apologies for the oversight, https://github.com/kubernetes/kubernetes/issues/99876 to track migrating this.

BenTheElder commented 3 years ago

Restrictions are in place again with an alternative mechanism. Sorry about that.

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

gavinfish commented 3 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

serathius commented 2 years ago

/lifecycle frozen

MikeSpreitzer commented 2 years ago

I am writing some new code that should be allowed to import from github/prometheus, but is getting dinged by CI. Please see #109277 . How do I mark this as OK, so that I do not get the complaint in https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/109277/pull-kubernetes-verify/1510880783680671744 ?

dims commented 2 years ago

@MikeSpreitzer update https://github.com/kubernetes/kubernetes/blob/00bd91e558612de9b89b4c76c7a42121ac2835c6/hack/verify-prometheus-imports.sh#L38-L71 please

MikeSpreitzer commented 2 years ago

@dims: done, thanks for the pointer.

mengjiao-liu commented 2 years ago

Are you still working on the remaining tasks? If not, I want to try. @gavinfish