peachisai opened this issue 2 months ago
Pinging code owners for receiver/prometheus: @Aneurysm9 @dashpole. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Do you see anything in the logs?
Can you enable debug logging, and let us know if there are any scrape failures, etc?
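(For reference, a minimal sketch of one way to do that, assuming a standard collector setup; the service::telemetry knob raises the collector's own log level, and the debug exporter prints every scraped metric:)
# collector's own logs at debug level
service:
  telemetry:
    logs:
      level: debug
# print every metric that reaches the pipeline
exporters:
  debug:
    verbosity: detailed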
Can you share the full scrape response for that metric?
Can you look at the up and scrape_* metrics to see if any targets are failing to be scraped, or any metrics are being dropped by the receiver?
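(One way to surface just those series in the collector's output is the filter processor; a minimal sketch using its legacy include syntax, which is an assumption about the setup and not part of this thread:)
processors:
  filter/scrape-health:
    metrics:
      include:
        match_type: regexp
        metric_names:
          - up          # 1 if the last scrape of the target succeeded, 0 otherwise
          - scrape_.*   # per-scrape health series: duration, sample counts, series added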
Hi, thank you for the reply. I use:
exporters:
  debug:
    verbosity: detailed
These are some parts of my log. I didn't find any errors or failures, and I can't find the missing target names.
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #2
Data point attributes:
-> action: Str(end of minor GC)
-> cause: Str(Allocation Failure)
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #37
Descriptor:
-> Name: executor_pool_max_threads
-> Description: The maximum allowed number of threads in the pool
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(applicationTaskExecutor)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 2147483647.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(taskScheduler)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 2147483647.000000
Metric #38
Descriptor:
-> Name: nacos_naming_subscriber
-> Description:
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
-> version: Str(v1)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
-> version: Str(v2)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #39
Descriptor:
-> Name: jvm_classes_loaded_classes
-> Description: The number of classes that are currently loaded in the Java virtual machine
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 14983.000000
Metric #40
Descriptor:
-> Name: tomcat_sessions_created_sessions_total
-> Description:
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 2024-08-20 06:42:13.391 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #41
Descriptor:
-> Name: tomcat_sessions_alive_max_seconds
-> Description:
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #42
Descriptor:
-> Name: nacos_naming_publisher
-> Description:
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
-> version: Str(v1)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
-> version: Str(v2)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #43
Descriptor:
-> Name: jvm_gc_memory_allocated_bytes_total
-> Description: Incremented for an increase in the size of the (young) heap memory pool after one GC to before the next
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 2024-08-20 06:42:13.391 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 31471073024.000000
Metric #44
Descriptor:
-> Name: executor_completed_tasks_total
-> Description: The approximate total number of tasks that have completed execution
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(applicationTaskExecutor)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 2024-08-20 06:42:13.391 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(taskScheduler)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 2024-08-20 06:42:13.391 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 181528.000000
Metric #45
Descriptor:
-> Name: nacos_timer_seconds
-> Description:
-> Unit:
-> DataType: Summary
SummaryDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(config)
-> name: Str(writeConfigRpcRt)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 2024-08-20 06:42:13.391 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Count: 2
Sum: 0.114000
Metric #46
Descriptor:
-> Name: jdbc_connections_min
-> Description: Minimum number of idle connections in the pool.
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(dataSource)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: -1.000000
Metric #47
Descriptor:
-> Name: http_server_requests_seconds_max
-> Description:
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> exception: Str(None)
-> method: Str(GET)
-> node: Str(127.0.0.1:8848)
-> outcome: Str(SUCCESS)
-> status: Str(200)
-> uri: Str(/v2/core/cluster/node/list)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> exception: Str(None)
-> method: Str(GET)
-> node: Str(127.0.0.1:8848)
-> outcome: Str(SUCCESS)
-> status: Str(200)
-> uri: Str(/actuator/prometheus)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.003789
NumberDataPoints #2
Data point attributes:
-> cluster: Str(nacos-cluster)
-> exception: Str(None)
-> method: Str(GET)
-> node: Str(127.0.0.1:8848)
-> outcome: Str(SUCCESS)
-> status: Str(200)
-> uri: Str(/v1/console/namespaces)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
NumberDataPoints #3
Data point attributes:
-> cluster: Str(nacos-cluster)
-> exception: Str(None)
-> method: Str(GET)
-> node: Str(127.0.0.1:8848)
-> outcome: Str(SERVER_ERROR)
-> status: Str(501)
-> uri: Str(root)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: 0.000000
Metric #48
Descriptor:
-> Name: jdbc_connections_max
-> Description: Maximum number of active connections that can be allocated at the same time.
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> name: Str(dataSource)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-20 06:42:13.391 +0000 UTC
Value: -1.000000
Metric #49
Descriptor:
-> Name: executor_queued_tasks
-> Description: The approximate number of tasks that are queued for execution
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
@dashpole Hi, I see this issue has been assigned. If there are any details I should provide, please ping me.
Were you able to check this?
Can you look at the up and scrape_* metrics to see if any targets are failing to be scraped, or any metrics are being dropped by the receiver?
Hi, I did not find any errors. Did you mean configuring the receivers to get the scrape log? Sorry, I don't know how to do that; could you give me some advice? This is my receiver config:
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: "nacos-monitoring"
          scrape_interval: 30s
          metrics_path: "/nacos/actuator/prometheus"
          static_configs:
            - targets: ['127.0.0.1:8848']
          relabel_configs:
            - source_labels: [ ]
              target_label: cluster
              replacement: nacos-cluster
            - source_labels: [ __address__ ]
              regex: (.+)
              target_label: node
              replacement: $$1
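(Side note: the $$1 is presumably the collector's escaping of $1, since a single $ triggers environment-variable expansion in collector configs. For completeness, a minimal sketch of how this receiver and the debug exporter above could be wired together, assuming no other processors:)
service:
  pipelines:
    metrics:
      receivers: [prometheus]   # the scrape config above
      exporters: [debug]        # verbosity: detailed, as shared earlier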
You should get additional metrics named up, scrape_series_added, and a few other scrape_* metrics. The scrape_* metrics let you know if any metrics were dropped or rejected by Prometheus, and whether the scrape is failing.
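(For illustration, a healthy scrape would produce synthetic series roughly like these; the metric names are Prometheus's standard scrape-health metrics, but the values here are made up:)
up 1
scrape_duration_seconds 0.004
scrape_samples_scraped 350
scrape_samples_post_metric_relabeling 350
scrape_series_added 350
If scrape_samples_post_metric_relabeling is lower than scrape_samples_scraped, samples were dropped by metric relabeling; up 0 would mean the scrape itself failed.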
Hi, I filtered the up and scrape_* metrics, but still found nothing:
Descriptor:
-> Name: up
-> Description: The scraping was successful
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Metric #4
Descriptor:
-> Name: scrape_series_added
-> Description: The approximate number of new series in this scrape
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Descriptor:
-> Name: scrape_samples_post_metric_relabeling
-> Description: The number of samples remaining after metric relabeling was applied
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Metric #1
Descriptor:
-> Name: scrape_duration_seconds
-> Description: Duration of the scrape
-> Unit: s
-> DataType: Gauge
NumberDataPoints #0
Right, you will need to look at the values of those metrics to see if any are being dropped, or if the target is down. Otherwise, if you can provide the full output of the prometheus endpoint (e.g. using curl), we can try to reproduce.
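(Based on the scrape config shared above, the equivalent request would be something like curl -s http://127.0.0.1:8848/nacos/actuator/prometheus; the target and path come from static_configs and metrics_path, so adjust if your setup differs.)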
I browsed the log in detail but still found nothing containing an error or a drop. May I send you an email with my remote peer endpoint?
No, sorry. Please don't email me links. I also don't actually need your logs--I need the metrics scrape response.
Hi, I found nothing dropped or erroring in the metrics scrape response, but the receiver skipped certain segments of it:
nacos_monitor{module="naming",name="mysqlHealthCheck",} 0.0
nacos_monitor{module="naming",name="emptyPush",} 0.0
nacos_monitor{module="config",name="configCount",} 2.0
nacos_monitor_count{module="core",name="raft_read_from_leader",} 0.0
nacos_monitor_sum{module="core",name="raft_read_from_leader",} 0.0
nacos_monitor{module="naming",name="tcpHealthCheck",} 0.0
nacos_monitor{module="naming",name="serviceChangedEventQueueSize",} 0.0
nacos_monitor{module="core",name="longConnection",} 0.0
nacos_monitor{module="naming",name="totalPush",} 0.0
nacos_monitor{module="naming",name="serviceSubscribedEventQueueSize",} 0.0
nacos_monitor{module="naming",name="serviceCount",} 0.0
nacos_monitor{module="naming",name="httpHealthCheck",} 0.0
nacos_monitor{module="naming",name="maxPushCost",} -1.0
nacos_monitor{module="config",name="longPolling",} 0.0
nacos_monitor{module="naming",name="failedPush",} 0.0
nacos_monitor{module="naming",name="leaderStatus",} 0.0
nacos_monitor{module="config",name="publish",} 0.0
nacos_monitor{module="config",name="dumpTask",} 0.0
nacos_monitor_count{module="core",name="raft_read_index_failed",} 0.0
nacos_monitor_sum{module="core",name="raft_read_index_failed",} 0.0
nacos_monitor{module="config",name="notifyTask",} 0.0
nacos_monitor{module="config",name="fuzzySearch",} 0.0
nacos_monitor{module="naming",name="avgPushCost",} -1.0
nacos_monitor{module="config",name="getConfig",} 0.0
nacos_monitor{module="naming",name="totalPushCountForAvg",} 0.0
nacos_monitor{module="naming",name="subscriberCount",} 0.0
nacos_monitor{module="naming",name="ipCount",} 0.0
nacos_monitor{module="config",name="notifyClientTask",} 0.0
nacos_monitor{module="naming",name="totalPushCostForAvg",} 0.0
nacos_monitor{module="naming",name="pushPendingTaskCount",} 0.0
# HELP nacos_monitor_max
Everything above nacos_monitor_sum{module="core",name="raft_read_index_failed",} 0.0 cannot be scraped; the rest of the metrics below it can be. Here is the scrape log:
Descriptor:
-> Name: disk_total_bytes
-> Description: Total space for path
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> node: Str(127.0.0.1:8848)
-> path: Str(D:\ideaprojects\github\nacos\.)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 296022437888.000000
Metric #69
Descriptor:
-> Name: nacos_monitor
-> Description:
-> Unit:
-> DataType: Gauge
NumberDataPoints #0
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(core)
-> name: Str(raft_read_index_failed)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #1
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(config)
-> name: Str(notifyTask)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #2
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(config)
-> name: Str(fuzzySearch)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #3
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(naming)
-> name: Str(avgPushCost)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: -1.000000
NumberDataPoints #4
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(config)
-> name: Str(getConfig)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #5
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(naming)
-> name: Str(totalPushCountForAvg)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #6
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(naming)
-> name: Str(subscriberCount)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
NumberDataPoints #7
Data point attributes:
-> cluster: Str(nacos-cluster)
-> module: Str(naming)
-> name: Str(ipCount)
-> node: Str(127.0.0.1:8848)
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-08-31 14:08:35.452 +0000 UTC
Value: 0.000000
I will try to debug the code.
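(One way to check whether the exposition text itself trips the parser, before stepping through the code: pipe the endpoint output through promtool's linter, e.g. curl -s http://127.0.0.1:8848/nacos/actuator/prometheus | promtool check metrics. This assumes promtool is available; it flags parse errors and lint issues in the metric families.)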
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Component(s)
cmd/otelcontribcol
What happened?
Description
When I use the prometheus receiver to scrape metrics, I found that otel misses some of them, but it can scrape other metrics that have a similar structure.
Steps to Reproduce
Expected Result
original data
Actual Result
Only get ipCount
Collector version
v0.107.0
Environment information
Environment
OS: (e.g., "Ubuntu 20.04")
Compiler (if manually compiled): (e.g., "go 14.2")
OpenTelemetry Collector configuration
No response
Log output
No response
Additional context
No response