k0ste closed this issue 6 years ago
How I saw this:
These are the same drives but in different hardware, a Dell PowerEdge R730xd. That is, only one Dell R530 has this issue (while at the same time the netdata dashboard shows the metrics properly). How can I debug this?
When `source=average`, netdata returns the average since the last time prometheus queried netdata. You can see this if you append `&help=yes`. The first line of the response will be:
# COMMENT netdata "boxe" to prometheus "10.11.13.127", source "average", last seen 110 seconds ago, time range 1514671313 to 1514671423
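To check the comment line programmatically, a minimal sketch like the following could be used. The regular expression is derived only from the sample line above, so other netdata versions may format the line differently; `parse_help_comment` and the `window` key are names made up for this illustration.

```python
import re

# Layout taken from the sample "# COMMENT" line quoted in this thread.
# This is an assumption: other netdata versions may word the line differently.
HELP_COMMENT = re.compile(
    r'# COMMENT netdata "(?P<host>[^"]*)" to prometheus "(?P<server>[^"]*)", '
    r'source "(?P<source>[^"]*)", last seen (?P<last_seen>\d+) seconds ago, '
    r'time range (?P<after>\d+) to (?P<before>\d+)'
)


def parse_help_comment(line):
    """Return the fields of the comment line as a dict, or None if it does not match."""
    match = HELP_COMMENT.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    for key in ("last_seen", "after", "before"):
        info[key] = int(info[key])
    # Length in seconds of the window that netdata averaged over.
    info["window"] = info["before"] - info["after"]
    return info


sample = (
    '# COMMENT netdata "boxe" to prometheus "10.11.13.127", source "average", '
    'last seen 110 seconds ago, time range 1514671313 to 1514671423'
)
info = parse_help_comment(sample)
print(info["server"], info["last_seen"], info["window"])  # 10.11.13.127 110 110
```

If the `server` field or the window length is not what you expect for a given scraper, that points at two scrapers being tracked as one.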
netdata identifies each prometheus server by its IP. If you have multiple prometheus servers querying the same netdata via the same IP (e.g. multiple prometheus in containers, or in a NATed zone), you should append `&server=NAME` to the URL, so that netdata will be able to keep track of each of them individually, even if they reach netdata using the same IP. Actually, it would be great if you always append this to your prometheus - just in case...
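As a small illustration of the advice above, this sketch appends a `server` parameter to the query URL reported in this issue. The helper name `with_server_name` and the value `prometheus-a` are hypothetical; use whatever name identifies your prometheus instance.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_server_name(url, name):
    """Return url with a single server=<name> query parameter set."""
    parts = urlsplit(url)
    # Drop any existing server parameter so the result carries exactly one.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "server"
    ]
    query.append(("server", name))
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "http://192.168.100.15:19999/api/v1/allmetrics?format=prometheus&source=average"
tagged = with_server_name(base, "prometheus-a")  # "prometheus-a" is a made-up name
print(tagged)
```

Each prometheus instance would use its own distinct name, so netdata keeps a separate "last seen" time for each.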
Then, since netdata averages the data between prometheus queries there should be some discrepancies, though they do not justify the diff you see.
Do the above explain your observations?
One way to find out if you have multiple prometheus querying netdata via the same IP is to examine `/var/log/netdata/access.log`. Grep the URL.
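The grep can also be done with a short script that counts requests per client. This is only a sketch: it assumes the client IPv4 address appears on the log line somewhere before the request path, which may not hold for your netdata version, and `scrape_clients` is a name invented here.

```python
import re
from collections import Counter

# Generic IPv4 pattern. Where exactly the client address sits on an access.log
# line is an assumption; check it against a few real lines first.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scrape_clients(lines, needle="/api/v1/allmetrics"):
    """Count allmetrics requests per client address found before the URL."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        head = line.split(needle, 1)[0]
        match = IPV4.search(head)
        counts[match.group(0) if match else "unknown"] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/var/log/netdata/access.log", errors="replace") as log:
            for client, hits in scrape_clients(log).most_common():
                print(client, hits)
    except OSError as exc:
        print("cannot read access log:", exc)
```

An address that shows up noticeably more often than its scrape interval would suggest may be several prometheus servers behind one IP.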
> When source=average, netdata returns the average since the last time prometheus queried netdata. You can see this if you append &help=yes.
Thanks, this helped me find a human configuration error (I matched by the hostname in the comment string). The error: the same IP address was declared for two instances.
> Actually, it would be great if you always append this to your prometheus - just in case...
Did this.
netdata version: 1.9.0 Release
Prometheus query:
http://192.168.100.15:19999/api/v1/allmetrics?format=prometheus&source=average
netdata.conf:
kernel: 3.10.0-693.11.1.el7 (CentOS Linux release 7.4.1708 (Core))
Hardware: Dell PowerEdge R530
HDD (sd[a-h]): Toshiba X300 4-6Tb
nvme2n1: Plextor M8PEY 1Tb
Look at this:
And at the same time on the netdata dashboard:
A couple of drives have bullshit metrics. Is this an error? I ran fio on a non-rotational drive:
And series looks like this:
netdata_disk_util___of_time_working_average{chart="disk_util.nvme2n1",family="nvme2n1",dimension="utilization"} 4.7565070 1514652852000
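To compare a scraped sample with the dashboard at the same instant, it helps to split the line into its parts. The sketch below handles simple lines like the one above; it is not a full parser for the exposition format, and `parse_sample` is a name made up for this example. The trailing number is a timestamp in milliseconds.

```python
import re
from datetime import datetime, timezone

# Simplified pattern: metric name, optional {labels}, value, optional timestamp.
SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<ts>-?\d+))?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into name, labels, value and timestamp (ms)."""
    match = SAMPLE.match(line.strip())
    if match is None:
        return None
    ts = match.group("ts")
    return {
        "name": match.group("name"),
        "labels": dict(LABEL.findall(match.group("labels") or "")),
        "value": float(match.group("value")),
        "timestamp_ms": int(ts) if ts is not None else None,
    }


line = (
    'netdata_disk_util___of_time_working_average'
    '{chart="disk_util.nvme2n1",family="nvme2n1",dimension="utilization"} '
    '4.7565070 1514652852000'
)
sample = parse_sample(line)
when = datetime.fromtimestamp(sample["timestamp_ms"] / 1000, tz=timezone.utc)
print(sample["labels"]["family"], sample["value"], when.isoformat())
```

With the timestamp converted, the value can be checked against the dashboard chart for the same drive at the same moment.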
It seems to me that a divisor is set somewhere.
netdata startup log
```shell
2017-12-30 23:52:38: netdata INFO : Adjusted my Out-Of-Memory (OOM) score from 0 to 1000.
2017-12-30 23:52:38: netdata INFO : netdata started on pid 215243.
2017-12-30 23:52:38: netdata ERROR: Registry: cannot open registry file: '/var/lib/netdata/registry/registry.db' (errno 2, No such file or directory)
2017-12-30 23:52:38: netdata INFO : Host 'ceph-osd5' (at registry as 'ceph-osd5') with guid 'da1db16c-41c4-11e7-bc9b-3cfdfea55f28' initialized, os 'linux', timezone 'Asia/Novosibirsk', tags '', update every 1, memory mode save, history entries 3996, streaming disabled (to '' with api key ''), health disabled, cache_dir '/var/cache/netdata', varlib_dir '/var/lib/netdata', health_log '/var/lib/netdata/health/health-log.db', alarms default handler '/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
2017-12-30 23:52:38: netdata INFO : PROC Plugin thread created with task id 215245
2017-12-30 23:52:38: netdata INFO : IDLEJITTER thread created with task id 215248
2017-12-30 23:52:38: netdata INFO : CGROUP plugin thread created with task id 215247
2017-12-30 23:52:38: netdata INFO : DISKSPACE thread created with task id 215246
2017-12-30 23:52:38: netdata INFO : BACKEND: thread created with task id 215249
2017-12-30 23:52:38: netdata INFO : HEALTH thread created with task id 215250
2017-12-30 23:52:38: netdata INFO : Multi-threaded WEB SERVER thread created with task id 215252
2017-12-30 23:52:38: netdata INFO : PLUGINS.D thread created with task id 215251
2017-12-30 23:52:38: netdata INFO : BACKEND: thread exiting
2017-12-30 23:52:38: netdata INFO : netdata initialization completed. Enjoy real-time performance monitoring!
2017-12-30 23:52:38: netdata INFO : Listening on 'tcp:0.0.0.0:19999'
2017-12-30 23:52:38: netdata INFO : Listening on 'tcp:[::]:19999'
2017-12-30 23:52:38: netdata INFO : STATSD main thread created with task id 215253
2017-12-30 23:52:38: netdata INFO : PLUGINSD: '/usr/libexec/netdata/plugins.d/apps.plugin' running on pid 215255
2017-12-30 23:52:38: netdata INFO : PLUGINSD: '/usr/libexec/netdata/plugins.d/charts.d.plugin' running on pid 215257
2017-12-30 23:52:38: netdata INFO : PLUGINSD: '/usr/libexec/netdata/plugins.d/python.d.plugin' running on pid 215259
2017-12-30 23:52:38: netdata ERROR: PLUGINSD: Cannot open plugins directory '/etc/netdata/custom-plugins.d'. (errno 2, No such file or directory)
2017-12-30 23:52:38: netdata ERROR: LISTENER: IPv6 bind() on ip '::1' port 8125, socktype 2 failed. (errno 99, Cannot assign requested address)
2017-12-30 23:52:38: netdata ERROR: LISTENER: Cannot bind to ip '::1', port 8125
2017-12-30 23:52:38: netdata ERROR: LISTENER: IPv6 bind() on ip '::1' port 8125, socktype 1 failed. (errno 99, Cannot assign requested address)
2017-12-30 23:52:38: netdata ERROR: LISTENER: Cannot bind to ip '::1', port 8125
2017-12-30 23:52:38: netdata INFO : LISTENER: Listen socket udp:127.0.0.1:8125 opened successfully.
2017-12-30 23:52:38: netdata INFO : LISTENER: Listen socket tcp:127.0.0.1:8125 opened successfully.
2017-12-30 23:52:38: netdata INFO : STATSD collector thread No 2 created with task id 215260
2017-12-30 23:52:38: netdata INFO : POLLFD: LISTENER: listening on 'udp:127.0.0.1:8125'
2017-12-30 23:52:38: netdata INFO : POLLFD: LISTENER: listening on 'tcp:127.0.0.1:8125'
2017-12-30 23:52:38: apps.plugin INFO : started on pid 215255
2017-12-30 23:52:38: charts.d: INFO: main: started from '/usr/libexec/netdata/plugins.d/charts.d.plugin' with options: 1
2017-12-30 23:52:38: charts.d: INFO: apache: is disabled. Add a line with apache=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: cpu_apps: is disabled. Add a line with cpu_apps=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: cpufreq: is disabled. Add a line with cpufreq=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: example: is disabled. Add a line with example=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: exim: is disabled. Add a line with exim=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: hddtemp: is disabled. Add a line with hddtemp=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: load_average: is disabled. Add a line with load_average=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: mem_apps: is disabled. Add a line with mem_apps=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: mysql: is disabled. Add a line with mysql=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: nginx: is disabled. Add a line with nginx=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: phpfpm: is disabled. Add a line with phpfpm=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: postfix: is disabled. Add a line with postfix=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: sensors: is disabled. Add a line with sensors=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: squid: is disabled. Add a line with squid=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:38: charts.d: INFO: tomcat: is disabled. Add a line with tomcat=force in /etc/netdata/charts.d.conf to enable it (or remove the line that disables it).
2017-12-30 23:52:39: charts.d: WARNING: ap: command 'iw' is not found in /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin.
2017-12-30 23:52:39: charts.d: ERROR: ap: module's 'ap' check() function reports failure.
2017-12-30 23:52:39: charts.d: WARNING: apcupsd: command 'apcaccess' is not found in /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin.
2017-12-30 23:52:39: charts.d: ERROR: apcupsd: module's 'apcupsd' check() function reports failure.
2017-12-30 23:52:39: charts.d: WARNING: nut: command 'upsc' is not found in /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin.
2017-12-30 23:52:39: charts.d: ERROR: nut: module's 'nut' check() function reports failure.
2017-12-30 23:52:39: python.d INFO: plugin: main: Using python 2
2017-12-30 23:52:39: charts.d: WARNING: opensips: command 'opensipsctl' is not found in /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/sbin:/usr/sbin:/usr/local/bin:/usr/local/sbin.
2017-12-30 23:52:39: charts.d: ERROR: opensips: module's 'opensips' check() function reports failure.
2017-12-30 23:52:39: charts.d: FATAL: main: No charts to collect data from.
2017-12-30 23:52:39: netdata INFO : PLUGINSD: '/usr/libexec/netdata/plugins.d/charts.d.plugin' called DISABLE. Disabling it.
2017-12-30 23:52:39: netdata ERROR: PLUGINSD: plugin '/usr/libexec/netdata/plugins.d/charts.d.plugin' disconnected.
2017-12-30 23:52:39: netdata INFO : PLUGINSD: '/usr/libexec/netdata/plugins.d/charts.d.plugin' on pid 215257 stopped after 0 successful data collections (ENDs).
2017-12-30 23:52:39: netdata ERROR: PLUGINSD: '/usr/libexec/netdata/plugins.d/charts.d.plugin' (pid 215257) does not generate useful output but it reports success (exits with 0). Will not start it again - it is disabled.. (errno 9, Bad file descriptor)
2017-12-30 23:52:39: python.d ERROR: plugin: main: module load config: 'cpuidle' => [FAILED]
2017-12-30 23:52:39: python.d ERROR: plugin: main: load config error : [Errno 2] No such file or directory: '/etc/netdata/python.d/cpuidle.conf'
2017-12-30 23:52:39: python.d ERROR: apache: localhost: Url: http://localhost/server-status?auto. Error: HTTPConnectionPool(host='localhost', port=80): Max retries exceeded with url: /server-status?auto (Caused by ProtocolError('Connection aborted.', error(113, 'No route to host')))
2017-12-30 23:52:39: python.d ERROR: apache: localhost: check() => [FAILED]
2017-12-30 23:52:39: python.d ERROR: apache: localipv4: Url: http://127.0.0.1/server-status?auto. Error: HTTPConnectionPool(host='127.0.0.1', port=80): Max retries exceeded with url: /server-status?auto (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
2017-12-30 23:52:39: python.d ERROR: apache: localipv4: check() => [FAILED]
2017-12-30 23:52:39: python.d ERROR: apache: localipv6: Url: http://::1/server-status?auto. Error: Failed to parse: ::1
2017-12-30 23:52:39: python.d ERROR: apache: localipv6: check() => [FAILED]
2017-12-30 23:52:39: python.d ERROR: beanstalk: beanstalk: 'beanstalkc' module is needed to use beanstalk.chart.py
2017-12-30 23:52:39: python.d ERROR: beanstalk: beanstalk: check() => [FAILED]
2017-12-30 23:52:39: python.d ERROR: bind_rndc: bind_rndc: Can't locate "rndc" binary or binary is not executable by netdata
2017-12-30 23:52:39: python.d ERROR: bind_rndc: bind_rndc: check() => [FAILED]
2017-12-30 23:52:39: python.d ERROR: couchdb: localhost: Url: http://127.0.0.1:5984/_node/couchdb@127.0.0.1/_stats. Error: HTTPConnectionPool(host='127.0.0.1', port=5984): Max retries exceeded with url: /_node/couchdb@127.0.0.1/_stats (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
2017-12-30 23:52:39: python.d ERROR: couchdb: localhost: Url: http://127.0.0.1:5984/_active_tasks. Error: HTTPConnectionPool(host='127.0.0.1', port=5984): Max retries exceeded with url: /_active_tasks (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
2017-12-30 23:52:39: python.d ERROR: couchdb: localhost: Url: http://127.0.0.1:5984/_node/couchdb@127.0.0.1/_system. Error: HTTPConnectionPool(host='127.0.0.1', port=5984): Max retries exceeded with url: /_node/couchdb@127.0.0.1/_system (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
2017-12-30 23:52:39: python.d ERROR: couchdb: localhost: _get_data() returned no data or type is not