Closed: pgeorgiev333 closed this issue 4 years ago
The router stuff isn't set up correctly. It looks like it will be working in the next release. See https://github.com/openshift/origin/pull/19318
The logging Elasticsearch one seems like it should be working, but there are some changes that would need to be made. First of all, the kubernetes-service-endpoints job doesn't do auth. You would need to add something like bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token to that section of the Prometheus config. After doing that on my 3.7 cluster I get this error:
authorizer reason: User "system:serviceaccount:openshift-metrics:prometheus" cannot "view" "prometheus.metrics.openshift.io" with name "" in project "logging"
which would indicate some sort of access control issue.
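For illustration, a minimal sketch of where that line would sit in the kubernetes-service-endpoints job (job name and tls_config are taken from the config quoted later in this thread; the relabel_configs are omitted and stay unchanged):
- job_name: 'kubernetes-service-endpoints'
  # assumption: the scraped endpoints accept the pod's service account token
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  kubernetes_sd_configs:
  - role: endpoints
  # relabel_configs: ... (unchanged)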
@pat2man Any progress on your RBAC problem? For me the logging metrics endpoint works with "bearer_token_file" in the prometheus.yml configuration file. My prometheus service account has the cluster role "cluster-reader".
But I don't know how to solve a quite similar error for haproxy:
Unauthorized User "system:serviceaccount:openshift-metrics:prometheus" cannot get routers/metrics.route.openshift.io at the cluster scope
With my cluster-admin user token the haproxy metrics endpoint works as expected, and also when I give the prometheus service account the same rights.
Does anyone have some ideas for me?
EDIT: My workaround: add an additional rule to the ClusterRole "cluster-reader":
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics
  verbs:
  - get
@Reamer if you take a look at the latest prometheus example it has a prometheus-scraper cluster role:
https://github.com/openshift/origin/blob/master/examples/prometheus/prometheus.yaml
The prometheus.io/path for the Elasticsearch Prometheus service is invalid. I opened a ticket for this https://github.com/openshift/openshift-ansible/issues/8343 and linked the relevant PR.
However, there is another issue: the path /_prometheus/metrics should be used only in connection with port 9200. There is no point in making Prometheus scrape ports 9300 or 4443. How can we fix that?
Or, if the goal is to have Prometheus scrape ES nodes via the proxy, then only port 4443 is needed and we can drop both 9200 and 9300. What do you think?
To make it more clear, this is what I am talking about: If we make it work via proxy (4443) then we should stop scraping other ports (9200 and 9300).
I had a quick discussion about this with @jcantrill and he had the idea of creating separate, extra Prometheus rules just for logging. He also has a PR for this: https://github.com/openshift/origin/pull/18796/files. I will try to push his approach further.
@lukas-vlcek you can annotate the service with prometheus.io/port to restrict it to the correct port
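As an example, a sketch of such annotations on the Elasticsearch metrics service (the service name is only illustrative, and per the discussion above the /_prometheus/metrics path only makes sense together with port 9200):
apiVersion: v1
kind: Service
metadata:
  name: logging-es-prometheus        # illustrative name
  namespace: logging
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9200"                   # restrict scraping to this port only
    prometheus.io/path: "/_prometheus/metrics"   # note the leading slash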
@pat2man right, there is PR to fix that: https://github.com/openshift/openshift-ansible/pull/8432
FYI I've submitted #8512 to get the router's metrics back.
Can anyone explain how to fix this on an existing (running) 3.9 cluster? What steps need to be taken to fix this router metrics issue?
@prasenforu I think you can get around it by fixing a couple of permissions. At least it worked for me using oc cluster up.
First, create the following cluster role:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-scraper
rules:
- apiGroups:
  - route.openshift.io
  resources:
  - routers/metrics
  verbs:
  - get
- apiGroups:
  - image.openshift.io
  resources:
  - registry/metrics
  verbs:
  - get
Then assign this cluster role to the prometheus service account.
Finally you need to add the system:auth-delegator cluster role to the router service account.
@simonpasquier
Let me try. I am not so familiar with RBAC, but these are the last two commands I executed; please verify and let me know.
oc adm policy add-cluster-role-to-user prometheus-scraper system:serviceaccount:openshift-metrics:prometheus
oc adm policy add-cluster-role-to-user system:auth-delegator system:serviceaccount:default:router
@prasenforu I used:
oc adm policy add-cluster-role-to-user system:auth-delegator -z router -n default
oc adm policy add-cluster-role-to-user prometheus-scraper -z prometheus -n openshift-metrics
Not sure if it makes a difference.
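For completeness, a sketch of the two ClusterRoleBindings those commands create (the binding names are illustrative; the roles and service accounts are the ones discussed above):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: router-auth-delegator        # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: router
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-scraper           # illustrative name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-scraper
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: openshift-metrics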
No improvement.
I am not using a single-node cluster. I am using a full cluster with the following configuration: 1 master, 2 infra, 3 etcd, 3 nodes.
Does that make any difference? There is no firewall protection either.
Even from the Prometheus container I am able to fetch the metrics.
Prometheus Config:
- job_name: 'kubernetes-service-endpoints'
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    # TODO: this should be per target
    insecure_skip_verify: true
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # only scrape infrastructure components
  - source_labels: [__meta_kubernetes_namespace]
    action: keep
    regex: 'default|logging|metrics|kube-.+|openshift|openshift-.+'
  # drop infrastructure components managed by other scrape targets
  - source_labels: [__meta_kubernetes_service_name]
    action: drop
    regex: 'prometheus-node-exporter'
  # only those that have requested scraping
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
@prasenforu right, so I've checked again and IIUC what's missing is that the kubernetes-service-endpoints job doesn't provide any token to authenticate against the scraped endpoints.
Can you check whether this command succeeds after you've added the permissions?
oc rsh po/prometheus-0 sh -c 'curl -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" http://router.default.svc:1936/metrics'
If it does, I'd recommend adding another scrape configuration specifically for the router (loosely adapted from https://github.com/openshift/origin/pull/18254):
- job_name: 'openshift-router'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;router;1936-tcp
And modify the kubernetes-service-endpoints job to drop the router endpoints by adding the following rule to its relabel_configs section:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: drop
  regex: default;router;1936-tcp
@simonpasquier
I created a new scrape job (haproxy):
- job_name: 'haproxyr'
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;router;1936-tcp
And modified the kubernetes-service-endpoints job to drop the router endpoints by adding the following rule to the relabel_configs section:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
  action: drop
  regex: default;router;1936-tcp
It's working.
But there is still a red error on "kubernetes-service-endpoints", and as a result I am getting mail alerts because an alert is configured in Prometheus.
@prasenforu can you double-check the configuration of the kubernetes-service-endpoints job from the Prometheus UI (Status > Configuration page) and share it? It should drop the router targets unless I missed something.
Here is the configuration from Prometheus UI Console
- job_name: kubernetes-service-endpoints
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names: []
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: default|logging|metrics|kube-.+|openshift|openshift-.+
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: prometheus-node-exporter
    replacement: $1
    action: drop
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    separator: ;
    regex: "true"
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    separator: ;
    regex: (https?)
    target_label: __scheme__
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    separator: ;
    regex: (.+)
    target_label: __metrics_path__
    replacement: $1
    action: replace
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    separator: ;
    regex: (.+)(?::\d+);(\d+)
    target_label: __address__
    replacement: $1:$2
    action: replace
  - separator: ;
    regex: __meta_kubernetes_service_label_(.+)
    replacement: $1
    action: labelmap
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: kubernetes_namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: kubernetes_name
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;router;1936-tcp
    replacement: $1
    action: drop
@prasenforu please try removing the __meta_kubernetes_endpoint_port_name portion, like this:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
  action: drop
  regex: default;router
@simonpasquier
Yes, now it works.
Thanks for your valuable continuous support :+1:
@simonpasquier
Coming back to this issue again. It looks like it is not auto-discovering service endpoints.
Recently I added RabbitMQ with an attached MQ exporter. I can see that all metrics are exposed, but they are not visible in the Prometheus console.
After I added another scrape job similar to the haproxy (router) one, all metrics became visible.
@prasenforu
It looks like it is not auto-discovering service endpoints.
The kubernetes-service-endpoints job is only for the infrastructure services (see https://github.com/openshift/openshift-ansible/blob/release-3.9/roles/openshift_prometheus/templates/prometheus.yml.j2#L140), so yes, you'll have to add another scrape definition for user applications.
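For example, a sketch of an extra annotation-driven job for user applications, loosely adapted from the kubernetes-service-endpoints job above (the job name is illustrative, and it assumes the prometheus service account is allowed to list endpoints in the user namespaces):
- job_name: 'user-service-endpoints'   # illustrative name
  # only needed if the targets require the service account token
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # only scrape services that opted in via the prometheus.io/scrape annotation
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # skip the infrastructure namespaces already handled by kubernetes-service-endpoints
  - source_labels: [__meta_kubernetes_namespace]
    action: drop
    regex: 'default|logging|metrics|kube-.+|openshift|openshift-.+'
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name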
@simonpasquier
Hi Simon,
Coming back again! Hope you are doing well.
We are trying to enable the OpenShift registry metrics.
Everything has been done on the container side, and I am able to get the metrics from inside the docker-registry container.
oc describe svc docker-registry
Name: docker-registry
Namespace: default
Labels: docker-registry=default
Annotations: prometheus.openshift.io/password=Passw0rd
prometheus.openshift.io/port=5000
prometheus.openshift.io/scrape=true
prometheus.openshift.io/username=prometheus
Selector: docker-registry=default
Type: ClusterIP
IP: 172.30.52.223
Port: 5000-tcp 5000/TCP
TargetPort: 5000/TCP
Endpoints: 10.130.0.99:5000
Session Affinity: ClientIP
Events: <none>
Curl command from docker registry container:
curl -k -s -u -asd:Passw0rd https://localhost:5000/extensions/v2/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.0836e-05
go_gc_duration_seconds{quantile="0.25"} 6.7201e-05
go_gc_duration_seconds{quantile="0.5"} 8.126e-05
go_gc_duration_seconds{quantile="0.75"} 0.000141178
go_gc_duration_seconds{quantile="1"} 0.002147467
go_gc_duration_seconds_sum 0.015416893
go_gc_duration_seconds_count 91
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 16
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 5.723352e+06
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 3.20598448e+08
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.54622e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 2.185241e+06
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 1.6070902292235695e-05
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 655360
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 5.723352e+06
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 4.317184e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 8.72448e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 31284
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 1.3041664e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5426908271857438e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 7140
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 2.216525e+06
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 128592
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 180224
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 1.0923584e+07
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 532748
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 589824
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 589824
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 1.6562424e+07
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 8
# HELP http_request_duration_microseconds The HTTP request latencies in microseconds.
# TYPE http_request_duration_microseconds summary
http_request_duration_microseconds{handler="prometheus",quantile="0.5"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.9"} NaN
http_request_duration_microseconds{handler="prometheus",quantile="0.99"} NaN
http_request_duration_microseconds_sum{handler="prometheus"} 1169.771
http_request_duration_microseconds_count{handler="prometheus"} 1
# HELP http_request_size_bytes The HTTP request sizes in bytes.
# TYPE http_request_size_bytes summary
http_request_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_request_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_request_size_bytes_sum{handler="prometheus"} 112
http_request_size_bytes_count{handler="prometheus"} 1
# HELP http_requests_total Total number of HTTP requests made.
# TYPE http_requests_total counter
http_requests_total{code="200",handler="prometheus",method="get"} 1
# HELP http_response_size_bytes The HTTP response sizes in bytes.
# TYPE http_response_size_bytes summary
http_response_size_bytes{handler="prometheus",quantile="0.5"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.9"} NaN
http_response_size_bytes{handler="prometheus",quantile="0.99"} NaN
http_response_size_bytes_sum{handler="prometheus"} 12526
http_response_size_bytes_count{handler="prometheus"} 1
# HELP imageregistry_build_info A metric with a constant '1' value labeled by major, minor, git commit & git version from which the image registry was built.
# TYPE imageregistry_build_info gauge
imageregistry_build_info{gitCommit="57e101f208cfc47ddadd97408c05970332fcd70e",gitVersion="v3.9.43",major="3",minor="9"} 1
# HELP openshift_registry_request_duration_seconds Request latency summary in microseconds for each operation
# TYPE openshift_registry_request_duration_seconds histogram
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.01"} 4
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.025"} 27
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.05"} 28
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.1"} 29
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="openshift/cakephp-ex",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="openshift/cakephp-ex",operation="manifestservice.get"} 0.537579068
openshift_registry_request_duration_seconds_count{name="openshift/cakephp-ex",operation="manifestservice.get"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.01"} 4
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.025"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.05"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/cakephp-ex",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="quotasample/cakephp-ex",operation="manifestservice.get"} 0.3506592289999999
openshift_registry_request_duration_seconds_count{name="quotasample/cakephp-ex",operation="manifestservice.get"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.005"} 0
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.01"} 10
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.025"} 29
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.05"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.25"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="0.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="1"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="2.5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="5"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="10"} 30
openshift_registry_request_duration_seconds_bucket{name="quotasample/ruby-hello-world",operation="manifestservice.get",le="+Inf"} 30
openshift_registry_request_duration_seconds_sum{name="quotasample/ruby-hello-world",operation="manifestservice.get"} 0.339042125
openshift_registry_request_duration_seconds_count{name="quotasample/ruby-hello-world",operation="manifestservice.get"} 30
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 13.77
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.9167616e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.54268309088e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.74546944e+08
But when I try to set this up as a job in Prometheus, I am facing an authentication error.
- job_name: openshift-registry
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /extensions/v2/metrics
  scheme: https
  kubernetes_sd_configs:
  - api_server: null
    role: endpoints
    namespaces:
      names:
      - default
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: docker-registry.default.svc
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: default;docker-registry;5000-tcp
    replacement: $1
    action: keep
Error as follows ...
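One direction to try, purely as a sketch: the curl above authenticated with the basic-auth credentials from the prometheus.openshift.io/username and prometheus.openshift.io/password annotations, so the job may need basic_auth rather than the bearer token (whether the registry accepts the service account token at all is an open question here):
- job_name: openshift-registry
  metrics_path: /extensions/v2/metrics
  scheme: https
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - default
  # assumption: the registry expects the credentials from its
  # prometheus.openshift.io/username and prometheus.openshift.io/password annotations
  basic_auth:
    username: prometheus
    password: Passw0rd
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    server_name: docker-registry.default.svc
    insecure_skip_verify: true
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;docker-registry;5000-tcp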
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
Description
I have installed OpenShift 3.9 using the playbooks (deploy_cluster.yml) with the test repo RPMs. I have also enabled metrics and Prometheus. After the installation was done I looked at the Prometheus "Targets" page and most of the "kubernetes-service-endpoints" targets are down.
I looked particularly at the kubernetes_name="logging-es-prometheus" endpoint and I can see the following messages in the proxy container inside the logging-es-data-master-xxxx pod:
I also noticed that the endpoint URLs look kind of weird: https://90.49.0.11:9300_prometheus/metrics
Any hints on what might be wrong?
UPDATE:
I tried curl against the first endpoint (the "router" one) and I get a similar error:
Just found the defect for the ROUTER endpoint: https://github.com/openshift/origin/issues/17685
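For reference, the odd endpoint URL in the description (https://90.49.0.11:9300_prometheus/metrics) suggests the prometheus.io/path annotation value is missing its leading slash, which matches the invalid-path issue discussed above. A sketch of what the corrected annotation would look like on the affected service:
metadata:
  annotations:
    # note the leading slash; without it the scrape URL collapses to ...:9300_prometheus/metrics
    prometheus.io/path: "/_prometheus/metrics"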