Closed: pgeorgiev333 closed this issue 4 years ago
The router stuff isn't set up correctly. It looks like it will be working in the next release. See https://github.com/openshift/origin/pull/19318
The logging Elasticsearch one seems like it should be working, but there are some changes that would need to be made. First of all, the kubernetes-service-endpoints job doesn't do auth. You would need to add something like bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token to that section of the Prometheus config. After doing that on my 3.7 cluster I get this error:
authorizer reason: User "system:serviceaccount:openshift-metrics:prometheus" cannot "view" "prometheus.metrics.openshift.io" with name "" in project "logging"
which would indicate some sort of access control issue.
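For illustration, a minimal sketch of where that line would sit in the kubernetes-service-endpoints job (job name and tls_config are taken from the config quoted later in this thread; the relabel_configs are omitted and stay unchanged):
- job_name: 'kubernetes-service-endpoints'
  # assumption: the scraped endpoints accept the pod's service account token
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  kubernetes_sd_configs:
  - role: endpoints
  # relabel_configs: ... (unchanged)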
@pat2man Any progress on your RBAC problem? For me the logging metrics endpoint works with "bearer_token_file" in the prometheus.yml configuration file. My prometheus service account has the cluster role "cluster-reader".
But I don't know how to solve a quite similar error for haproxy:
Unauthorized User "system:serviceaccount:openshift-metrics:prometheus" cannot get routers/metrics.route.openshift.io at the cluster scope
With my cluster-admin user token the haproxy metrics endpoint works as expected, and also when I give the prometheus service account the same rights.
Does anyone have some ideas for me?
EDIT: My workaround: add an additional rule to the ClusterRole "cluster-reader":
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics
  verbs:
  - get
@Reamer if you take a look at the latest prometheus example it has a prometheus-scraper cluster role:
https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml
The prometheus.io/path for the Elasticsearch Prometheus service is invalid. I opened a ticket for this https://github.com/openshift/openshift-ansible/issues/8343 and linked the relevant PR.
However, there is another issue: the path /_prometheus/metrics should be used only in connection with port 9200. There is no point in making Prometheus scrape ports 9300 or 4443. How can we fix that?
Or, if the goal is to have Prometheus scrape ES nodes via the proxy, then only port 4443 is needed and we can drop both 9200 and 9300. What do you think?
To make it more clear, this is what I am talking about: If we make it work via proxy (4443) then we should stop scraping other ports (9200 and 9300).
I had a quick discussion about this with @jcantrill and he had the idea of creating separate, extra Prometheus rules just for logging. He also has a PR for this: https://github.com/openshift/origin/pull/18796/files. I will try to push his approach further.
@lukas-vlcek you can annotate the service with prometheus.io/port to restrict it to the correct port
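As an example, a sketch of such annotations on the Elasticsearch metrics service (the service name is only illustrative, and per the discussion above the /_prometheus/metrics path only makes sense together with port 9200):
apiVersion: v1
kind: Service
metadata:
  name: logging-es-prometheus        # illustrative name
  namespace: logging
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9200"                   # restrict scraping to this port only
    prometheus.io/path: "/_prometheus/metrics"   # note the leading slash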
@pat2man right, there is PR to fix that: https://github.com/openshift/openshift-ansible/pull/8432
FYI I've submitted #8512 to get the router's metrics back.
Can anyone explain how to fix this on an existing (running) 3.9 cluster? What steps need to be taken to fix this router metrics issue?
@prasenforu I think you can get around it by fixing a couple of permissions. At least it worked for me using oc cluster up.
First, create the following cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-scraper
rules:
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics
  verbs:
  - get
- apiGroups:
  - image.openshift.io
  resources:
  - registry/metrics
  verbs:
  - get
Then assign this cluster role to the prometheus service account.
Finally you need to add the system:auth-delegator cluster role to the router service account.
@simonpasquier
Let me try. I am not so familiar with RBAC, but these are the last two commands I executed; please verify and let me know.
oc adm policy add-cluster-role-to-user prometheus-scraper system:serviceaccount:openshift-metrics:prometheus
oc adm policy add-cluster-role-to-user system:auth-delegator system:serviceaccount:default:router
@prasenforu I used:
oc adm policy add-cluster-role-to-user system:auth-delegator -z router -n default
oc adm policy add-cluster-role-to-user prometheus-scraper -z prometheus -n openshift-metrics
Not sure if it makes a difference.
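For completeness, a sketch of the two ClusterRoleBindings those commands create (the binding names are illustrative; the roles and service accounts are the ones discussed above):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: router-auth-delegator        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: router
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-scraper           # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-scraper
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: openshift-metrics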
No improvement.
I am not using a single-node cluster. I am using a full cluster with the following configuration: 1 master, 2 infra, 3 etcd, 3 nodes.
Does that make any difference? There is no firewall protection either.
Even from the Prometheus container I am able to fetch the metrics.
Prometheus Config:
- job_name: 'kubernetes-service-endpoints'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # TODO: this should be per target
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # only scrape infrastructure components
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: 'default|logging|metrics|kube-.+|openshift|openshift-.+'
  # drop infrastructure components managed by other scrape targets
  - source_labels: [__meta_kubernetes_service_name]
    action: drop
    regex: 'prometheus-node-exporter'
  # only those that have requested scraping
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
@prasenforu right, so I've checked again and IIUC what's missing is that the kubernetes-service-endpoints job doesn't provide any token to authenticate against the scraped endpoints.
Can you check whether this command succeeds after you've added the permissions?
oc rsh po/prometheus-0 sh -c 'curl -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" http://router.default.svc:1936/metrics'
If it does, I'd recommend adding another scrape configuration specifically for the router (loosely adapted from https://github.com/openshift/origin/pull/18254):
- job_name: 'openshift-router'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;router;1936-tcp
And modify the kubernetes-service-endpoints job to drop the router endpoints by adding the following rule to its relabel_configs section:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: drop
  regex: default;router;1936-tcp
@simonpasquier
I created a new scrape job (haproxy):
- job_name: 'haproxyr'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;router;1936-tcp
And modified the kubernetes-service-endpoints job to drop the router endpoints by adding the following rule to the relabel_configs section:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: drop
  regex: default;router;1936-tcp
It's working.
But there is still a red error on "kubernetes-service-endpoints", and as a result I am getting mail alerts because an alert is configured in Prometheus.
@prasenforu can you double-check the configuration of the kubernetes-service-endpoints job from the Prometheus UI (Status > Configuration page) and share it? It should drop the router targets unless I missed something.
Here is the configuration from Prometheus UI Console
- job_name: kubernetes-service-endpoints
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names: []
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: default|logging|metrics|kube-.+|openshift|openshift-.+
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: prometheus-node-exporter
    replacement: $1
    action: drop
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: (.+)(?::\d+);(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;router;1936-tcp
    replacement: $1
    action: drop
@prasenforu please try removing the __meta_kubernetes_endpoint_port_name portion, like this:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
  action: drop
  regex: default;router
@simonpasquier
Yes, now it works.
Thanks for your valuable continuous support :+1:
@simonpasquier
Coming back to this issue again. It looks like it is not auto-discovering service endpoints.
Recently I added RabbitMQ with an attached MQ exporter. I can see that all metrics are exposed, but they are not visible in the Prometheus console.
After I added another scrape job similar to the haproxy (router) one, all metrics became visible.
@prasenforu
It looks like it is not auto-discovering service endpoints.
The kubernetes-service-endpoints job is only for the infrastructure services (see https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_prometheus/templates/prometheus.yml.j2#L140), so yes, you'll have to add another scrape definition for user applications.
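For example, a sketch of an extra annotation-driven job for user applications, loosely adapted from the kubernetes-service-endpoints job above (the job name is illustrative, and it assumes the prometheus service account is allowed to list endpoints in the user namespaces):
- job_name: 'user-service-endpoints'   # illustrative name
  # only needed if the targets require the service account token
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # only scrape services that opted in via the prometheus.io/scrape annotation
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # skip the infrastructure namespaces already handled by kubernetes-service-endpoints
  - source_labels: [__meta_kubernetes_namespace]
    action: drop
    regex: 'default|logging|metrics|kube-.+|openshift|openshift-.+'
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name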
@simonpasquier
Hi Simon,
Coming back again! Hope you are doing well.
We are trying to enable the OpenShift registry metrics.
Everything has been done on the container side, and I am able to get the metrics from inside the docker-registry container.
oc describe svc docker-registry
Name: docker-registry
Namespace: default
Labels: docker-registry=default
Annotations: prometheus.openshift.io/password=Passw0rd
prometheus.openshift.io/port=5000
prometheus.openshift.io/scrape=true
prometheus.openshift.io/username=prometheus
Selector: docker-registry=default
Type: ClusterIP
IP: 172.30.52.223
Port: 5000-tcp 5000/TCP
TargetPort: 5000/TCP
Endpoints: 10.130.0.99:5000
Session Affinity: ClientIP
Events: <none>
Curl command from docker registry container:
curl -k -s -u -asd:Passw0rd https://localhost:5000/extensions/v2/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.0836e-05
go_gc_duration_seconds{quantile="0.25"} 6.7201e-05
go_gc_duration_seconds{quantile="0.5"} 8.126e-05
go_gc_duration_seconds{quantile="0.75"} 0.000141178
go_gc_duration_seconds{quantile="1"} 0.002147467
go_gc_duration_seconds_sum 0.015416893
go_gc_duration_seconds_count 91
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 16
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 5.723352e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.20598448e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.54622e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.185241e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.6070902292235695e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 655360
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 5.723352e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 4.317184e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.72448e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 31284
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.3041664e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5426908271857438e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 7140
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.216525e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 128592
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 180224
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.0923584e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 532748
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 589824
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 589824
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.6562424e+07
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 8
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} NaN
http_request_duration_microseconds_sum{handler="prometheus"} 1169.771
http_request_duration_microseconds_count{handler="prometheus"} 1
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_request_size_bytes_sum{handler="prometheus"} 112
http_request_size_bytes_count{handler="prometheus"} 1
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 1
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_response_size_bytes_sum{handler="prometheus"} 12526
http_response_size_bytes_count{handler="prometheus"} 1
# HELP imageregistry_build_info A metric with a constant '1' value labeled by major, minor, git commit & git version from which the image registry was built.
# TYPE imageregistry_build_info gauge
imageregistry_build_info{gitCommit="57e101f208cfc47ddadd97408c05970332fcd70e",gitVersion="v3.9.43",major="3",minor="9"} 1
# HELP openshift_registry_request_duration_seconds Request latency summary in microseconds for each operation
# TYPE openshift_registry_request_duration_seconds histogram
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.01"} 4
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.025"} 27
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.05"} 28
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.1"} 29
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="openshift/cakephp-ex",operation="manifestservice.get"} 0.537579068
openshift_registry_request_duration_seconds_count{name="openshift/cakephp-ex",operation="manifestservice.get"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.01"} 4
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.025"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.05"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="quotasample/cakephp-ex",operation="manifestservice.get"} 0.3506592289999999
openshift_registry_request_duration_seconds_count{name="quotasample/cakephp-ex",operation="manifestservice.get"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.01"} 10
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.025"} 29
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.05"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="quotasample/ruby-hello-world",operation="manifestservice.get"} 0.339042125
openshift_registry_request_duration_seconds_count{name="quotasample/ruby-hello-world",operation="manifestservice.get"} 30
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 13.77
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.9167616e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.54268309088e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.74546944e+08
But when I try to set this up as a job in Prometheus, I am facing an authentication error.
- job_name: openshift-registry
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /extensions/v2/metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - default
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: docker-registry.default.svc
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;docker-registry;5000-tcp
    replacement: $1
    action: keep
Error as follows ...
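One direction to try, purely as a sketch: the curl above authenticated with the basic-auth credentials from the prometheus.openshift.io/username and prometheus.openshift.io/password annotations, so the job may need basic_auth rather than the bearer token (whether the registry accepts the service account token at all is an open question here):
- job_name: openshift-registry
  metrics_path: /extensions/v2/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  # assumption: the registry expects the credentials from its
  # prometheus.openshift.io/username and prometheus.openshift.io/password annotations
  basic_auth:
    username: prometheus
    password: Passw0rd
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: docker-registry.default.svc
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;docker-registry;5000-tcp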
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Description
I have installed OpenShift 3.9 using the playbooks (deploy_cluster.yml) with the test repo RPMs. I have also enabled metrics and Prometheus. After the installation was done I looked at the Prometheus "Targets" page and most of the "kubernetes-service-endpoints" targets are down.
I looked particularly at the kubernetes_name="logging-es-prometheus" endpoint and I can see the following messages in the proxy container inside the logging-es-data-master-xxxx pod:
I also noticed that the endpoint URLs look kind of weird: https://90.49.0.11:9300_prometheus/metrics
Any hints on what might be wrong?
UPDATE:
I tried curl against the first endpoint (the "router" one) and I get a similar error:
Just found the defect for the ROUTER endpoint: https://github.com/openshift/origin/issues/17685
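For reference, the odd endpoint URL in the description (https://90.49.0.11:9300_prometheus/metrics) suggests the prometheus.io/path annotation value is missing its leading slash, which matches the invalid-path issue discussed above. A sketch of what the corrected annotation would look like on the affected service:
metadata:
  annotations:
    # note the leading slash; without it the scrape URL collapses to ...:9300_prometheus/metrics
    prometheus.io/path: "/_prometheus/metrics"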