Expose default port for monitoring, in case of create PodMonitor or S…

wangxf1987 commented 2 weeks ago

…erviceMonitor

What type of PR is this? This is improve for installation, the port of karmada-apiserver is exposed, but other componment is not.

What this PR does / why we need it:

Which issue(s) this PR fixes: Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Expose the default port  for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

karmada-bot commented 2 weeks ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: To complete the pull request process, please assign poor12 after the PR has been reviewed. You can assign the PR to them by writing /assign @poor12 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files: - **[charts/OWNERS](https://github.com/karmada-io/karmada/blob/master/charts/OWNERS)** Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment

codecov-commenter commented 2 weeks ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 53.07%. Comparing base (3232c52) to head (a4d9be4). Report is 13 commits behind head on master.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #4877 +/- ## ========================================== + Coverage 53.05% 53.07% +0.01% ========================================== Files 250 251 +1 Lines 20396 20389 -7 ========================================== - Hits 10822 10821 -1 + Misses 8855 8854 -1 + Partials 719 714 -5 ``` | [Flag](https://app.codecov.io/gh/karmada-io/karmada/pull/4877/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=karmada-io) | Coverage Δ | | |---|---|---| | [unittests](https://app.codecov.io/gh/karmada-io/karmada/pull/4877/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=karmada-io) | `53.07% <ø> (+0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=karmada-io#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

RainbowMango commented 2 weeks ago

/assign @chaosi-zju

chaosi-zju commented 2 weeks ago

Hi! @wangxf1987 Glad to see your contribution!

The CI of this PR failed due to it wasn't signed off, usually please use git commit -s -m 'your message ' or git commit -m ' Signed-off-by: AuthorName <authoremail@example.com> \n <other message> ' to pass DCO.

Detail guideline can refer to: https://probot.github.io/apps/dco/

wangxf1987 commented 2 weeks ago

Hi! @wangxf1987 Glad to see your contribution!

The CI of this PR failed due to it wasn't signed off, usually please use git commit -s -m 'your message ' or git commit -m ' Signed-off-by: AuthorName <authoremail@example.com> \n <other message> ' to pass DCO.

Detail guideline can refer to: https://probot.github.io/apps/dco/

done

chaosi-zju commented 2 weeks ago

/lgtm

chaunceyjiang commented 1 week ago

Can you write the release-notes? @wangxf1987

wangxf1987 commented 1 week ago

Can you write the release-notes? @wangxf1987 @chaunceyjiang Ok, i should modify the release notes for that version, is it https://github.com/karmada-io/karmada/blob/master/docs/CHANGELOG/CHANGELOG-0.10.md ?

chaunceyjiang commented 5 days ago

@wangxf1987 Here. In your PR description

wangxf1987 commented 5 days ago

Expose the default port for the karmada-controller-manager, scheduler and agent when creating a PodMonitor.

done

RainbowMango commented 4 days ago

Is this ServiceMonitor and PodMonitor?

By the way, this PR focuses on exposing ports in the helm chart template, what about operators and cli? Should we keep them consistent? cc @chaosi-zju

chaosi-zju commented 4 days ago

Let me talk about my understanding of this PR:

First, our karmada-controller-manager does listened 8080 port as metrics querying API, we defined it in https://github.com/karmada-io/karmada/blob/e6c92d5cba0edfa81e920bfff98db1b4be257d82/cmd/controller-manager/app/controllermanager.go#L161

we can try as:

$ kubectl --context karmada-host exec -it karmada-controller-manager-cdbbf75dd-69fxb -n karmada-system -- sh
/ # curl http://127.0.0.1:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

even without this PR, we can still access /metrics API of karmada-controller-manager from other pod, just like:

$ kubectl --context karmada-host exec -it karmada-descheduler-7d654cbff6-5jmrt -n karmada-system -- sh

# here, `10.244.0.12` is the pod IP of `karmada-controller-manager`.
/ # curl http://10.244.0.12:8080/metrics
.....
work_sync_workload_duration_seconds_bucket{result="success",le="0.512"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="1.024"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="2.048"} 4
work_sync_workload_duration_seconds_bucket{result="success",le="+Inf"} 4
work_sync_workload_duration_seconds_sum{result="success"} 0.075089723
work_sync_workload_duration_seconds_count{result="success"} 4

So, the normal operation of the /metrics API does not originally rely on the content proposed by this PR.

However, @wangxf1987 has needs to use Prometheus to collect metrics, as described in this tutorial: https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors

take PodMonitor as an example:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  podMetricsEndpoints:
  - port: web

In practice, the spec.selector label tells Prometheus which Pods should be scraped, while spec.podMetricsEndpoints defines an endpoint serving Prometheus metrics to be scraped by Prometheus.

Pay attention to spec.podMetricsEndpoints:

https://github.com/prometheus-operator/prometheus-operator/blob/5b880a05ab72112ff26ac3c3eb5bffddbafbc468/pkg/apis/monitoring/v1/podmonitor_types.go#L169-L184

the official API recommends to use port field, which represents Name of the Pod port which this endpoint refers to, instead of deprecated filed targetPort which represents Name or number of the target port of the Pod object behind the Service.

So, here comes the problem, we didn't specify the port field in container description of karmada-controller-manager and give it a name, which @wangxf1987 can reference it in spec.podMetricsEndpoints of PodMonitor.

this is the reason for why this PR proposed.

Hi @wangxf1987, am I right?

chaosi-zju commented 4 days ago

By the way, this PR focuses on exposing ports in the helm chart template, what about operators and cli? Should we keep them consistent?

If we all agree on the necessity of this PR, then we should build an issue to make up for the corresponding modifications of other installation methods.