eric-b-hymowitz closed 5 years ago
Hi @eric-b-hymowitz,
Your configuration looks fine.
First, please check the cougar netdata error.log for any errors related to streaming. The main question is whether cougar can actually establish an HTTP connection to rolls-royce, which you can test with a simple curl request (for example, by requesting the URL http://rolls-royce:19999/) or by bringing up the rolls-royce web UI from within cougar. The problem could be in hostname resolution, routing, or a firewall.
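The connectivity check described above can also be scripted. Here is a minimal sketch in Python, assuming only that the master listens on TCP port 19999 as in this thread. The function name can_connect is made up for illustration, and the check only shows that a TCP connection opens, not that netdata accepts the stream.

```python
import socket


def can_connect(host: str, port: int = 19999, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of netdata.
    """
    try:
        # create_connection resolves the hostname and then connects, so a
        # failure here can point at DNS, routing, or a firewall.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run on cougar, can_connect("rolls-royce") returning False would suggest looking at name resolution or firewalls before looking at the netdata configuration.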
If the connection is established properly and you get an access denied error, then you will want to look at the master's access lists. Specifically, check the following two settings:
[web]
# allow connections from = ...
# allow streaming from = ...
Thanks for following up.
I can definitely connect on the port. My cougar error.log seems normal:
2019-02-11 19:11:28: netdata INFO : MAIN : Host 'cougar' (at registry as 'cougar') with guid '8365354a-2e10-11e9-ab26-000c29120d5e' initialized, os 'linux', timezone 'GMT', tags '', program_name 'netdata', program_version 'v1.12.0-17-g30f7324', update every 1, memory mode none, history entries 3996, streaming enabled (to 'rolls-royce:19999' with api key '9447dae1-0830-4edd-9e70-1cd125844b65'), health disabled, cache_dir '/opt/netdata/var/cache/netdata', varlib_dir '/opt/netdata/var/lib/netdata', health_log '/opt/netdata/var/lib/netdata/health/health-log.db', alarms default handler '/opt/netdata/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
2019-02-11 19:11:29: netdata ERROR : PLUGIN[tc] : STREAM cougar [send]: not ready - discarding collected metrics.
2019-02-11 19:11:29: netdata INFO : STREAM_SENDER[cougar] : thread created with task id 7445
2019-02-11 19:11:29: netdata INFO : STREAM_SENDER[cougar] : STREAM cougar [send]: thread created (task id 7445)
2019-02-11 19:11:34: netdata INFO : STREAM_SENDER[cougar] : STREAM cougar [send to rolls-royce:19999]: connecting...
2019-02-11 19:11:34: netdata INFO : STREAM_SENDER[cougar] : STREAM cougar [send to rolls-royce:19999]: initializing communication...
2019-02-11 19:11:34: netdata INFO : STREAM_SENDER[cougar] : STREAM cougar [send to rolls-royce:19999]: waiting response from remote netdata...
2019-02-11 19:11:34: netdata INFO : STREAM_SENDER[cougar] : STREAM cougar [send to rolls-royce:19999]: established communication - ready to send metrics...
2019-02-11 19:11:34: netdata INFO : PLUGIN[cgroups] : STREAM cougar [send]: sending metrics...
My rolls-royce error.log also seems normal:
2019-02-11 19:04:59: netdata INFO : WEB_SERVER[static5] : clients wants to STREAM metrics.
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : thread created with task id 3797
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : STREAM cougar [localhost]:37282: receive thread created (task id 3797)
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : Host 'cougar' (at registry as 'cougar') with guid '8365354a-2e10-11e9-ab26-000c29120d5e' initialized, os 'linux', timezone 'GMT', tags '', program_name 'netdata', program_version 'v1.12.0-17-g30f7324', update every 1, memory mode save, history entries 3996, streaming disabled (to '' with api key ''), health enabled, cache_dir '/opt/netdata/var/cache/netdata/8365354a-2e10-11e9-ab26-000c29120d5e', varlib_dir '/opt/netdata/var/lib/netdata/8365354a-2e10-11e9-ab26-000c29120d5e', health_log '/opt/netdata/var/lib/netdata/8365354a-2e10-11e9-ab26-000c29120d5e/health/health-log.db', alarms default handler '/opt/netdata/usr/libexec/netdata/plugins.d/alarm-notify.sh', alarms default recipient 'root'
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : STREAM cougar [receive from [localhost]:37282]: initializing communication...
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : Postponing health checks for 60 seconds, on host 'cougar', because it was just connected.
2019-02-11 19:04:59: netdata INFO : STREAM_RECEIVER[cougar,[localhost]:37282] : STREAM cougar [receive from [localhost]:37282]: receiving metrics...
The master's [web] section is entirely commented-out defaults, such as:
[web]
# default port = 19999
# bind to = *
# allow connections from = localhost *
# allow streaming from = *
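To illustrate why these defaults should not block cougar, here is a rough Python approximation of how a space-separated pattern list such as "localhost *" could be evaluated. It assumes the rules are: words are tried left to right, "*" is a wildcard, a leading "!" negates, and the first match wins. The function name simple_pattern_allows is hypothetical, and the real matching happens inside netdata and may differ in detail.

```python
import re


def simple_pattern_allows(patterns: str, value: str) -> bool:
    """Approximate a pattern-list check where the first matching word wins.

    Illustrative sketch only; not netdata's actual implementation.
    """
    for word in patterns.split():
        negative = word.startswith("!")
        if negative:
            word = word[1:]
        # Translate '*' into a regex wildcard; every other character is literal.
        regex = ".*".join(re.escape(part) for part in word.split("*"))
        if re.fullmatch(regex, value):
            return not negative
    return False  # no word matched, so the client is not allowed
```

Under these assumptions, both "localhost *" and "*" accept any client address, which is consistent with the logs showing the stream being established.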
This might mean something. I just discovered that I should be able to view http://rolls-royce:19999/host/cougar
When I go to that page, I see unformatted text data -- as if the JS and/or CSS isn't running correctly. I don't know if that's related or not. I can't seem to post a screen-shot ("Something went really wrong, and we can't process that file") but I'll keep trying.
--EbH
On the right hand side that shows all the chart categories, do you see a link for 'cougar'? See the following screenshot:
The menu shows my localhost.localdomain under 'databases streamed to this host'. This is how the VM qemu fedora29 that you see on the right side is accessible from my master. I replicated your configuration precisely, down to the use of the local registry and the headless part, so you shouldn't have to enter the URL manually at all. Can you do a Ctrl-Shift-R and show me a similar screenshot from rolls-royce? In the meantime, I'll look into how that link is added.
http://rolls-royce:19999/api/v1/info should show cougar under mirrored_hosts. Can you paste that section here?
I do not have a link on the right side for "cougar" that matches your "qemu fedora29".
I also do not have a "databases streamed to this host" section under my menu, just the "My nodes" section.
[Bad screenshot removed]
http://rolls-royce:19999/api/v1/info tells me this:
{
    "version": "v1.12.0-17-g30f7324",
    "uid": "c974dfc0-2e08-11e9-b6c1-001018afde44",
    "mirrored_hosts": [
        "rolls-royce",
        "cougar"
    ],
    "alarms": {
        "normal": 180,
        "warning": 0,
        "critical": 0
    }
}
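The same check can be done programmatically against the response above. This is a small sketch, assuming the /api/v1/info body has the shape shown in this thread. The helper names fetch_info and is_mirrored are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_info(base_url: str, timeout: float = 5.0) -> str:
    """Download the /api/v1/info document from a netdata agent as text."""
    with urlopen(base_url.rstrip("/") + "/api/v1/info", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def is_mirrored(info_json: str, hostname: str) -> bool:
    """Return True if hostname appears in the mirrored_hosts list."""
    info = json.loads(info_json)
    # A missing key is treated as "no mirrored hosts".
    return hostname in info.get("mirrored_hosts", [])
```

For example, is_mirrored(fetch_info("http://rolls-royce:19999"), "cougar") would be expected to return True for the response pasted above, which indicates the master is receiving the stream even though the menu does not show it.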
It didn't occur to me to ask this, but can I safely assume that all data flow is over TCP port 19999 from cougar to rolls-royce? My environment is pretty heavily loaded with firewalls and access restrictions, so if there is behind-the-scenes UDP activity or something else I don't expect, that could be causing my problems.
OK, the issue has nothing to do with streaming. There's a problem with the UI here: I don't see any charts for rolls-royce, and your page should be full of them. Do you have any strange security settings in your browser for JavaScript? Try using the Chrome console to see what's wrong.
That was my fault. I took a bad screenshot. Here is a new one. The graphs are fine.
I found a bug that affects what you see in the menu. It happened to me too. It does NOT affect the right-hand side, though, and I still don't get why cougar doesn't appear there. I'll do some more digging.
Going a bit blindly here, based on past issues. Can you do a cat /var/lib/netdata/registry/netdata.public.unique.id on both machines to ensure that you have different machine guids? I will look for more ideas.
Also, can you do this on the master so we can ensure that the metrics are being collected?
ls -l /var/cache/netdata/ | grep cougar
cougar /var/lib/netdata/registry/netdata.public.unique.id
8365354a-2e10-11e9-ab26-000c29120d5e
rolls-royce /var/lib/netdata/registry/netdata.public.unique.id
c974dfc0-2e08-11e9-b6c1-001018afde44
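The comparison of the two IDs can be sketched as a small Python check. It assumes the file contents are plain UUID strings, as in the output above. The function name guids_are_distinct is hypothetical.

```python
import uuid


def guids_are_distinct(*guids: str) -> bool:
    """Return True if every GUID parses as a UUID and no two are equal.

    Raises ValueError if any input is not a well-formed UUID.
    """
    # strip() drops the trailing newline that cat output would include.
    parsed = [uuid.UUID(g.strip()) for g in guids]
    return len(set(parsed)) == len(parsed)
```

With the two values from this thread the check passes, so a duplicated machine guid (for example from a cloned VM image) is not the cause here.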
/var/cache/netdata
Nothing with the name "cougar". However, I do have cougar's uuid, /var/cache/netdata/8365354a-2e10-11e9-ab26-000c29120d5e, which is filled with subdirectories such as
cpu.cpu0
disk.dm_0
ipv4.packets
netdata.requests
system.cpu
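As this output shows, the master keeps a streamed host's cache under its machine guid rather than its hostname, so grepping for "cougar" finds nothing. A minimal sketch for listing such directories follows, assuming they are the only entries in the cache directory whose names parse as UUIDs. The function name streamed_host_guids is made up for illustration.

```python
import uuid
from pathlib import Path
from typing import List


def streamed_host_guids(cache_dir: str) -> List[str]:
    """List subdirectories of cache_dir whose names are valid UUIDs."""
    found = []
    for entry in Path(cache_dir).iterdir():
        if not entry.is_dir():
            continue
        try:
            uuid.UUID(entry.name)
        except ValueError:
            continue  # names such as system.cpu are not UUIDs and are skipped
        found.append(entry.name)
    return sorted(found)
```

On rolls-royce, streamed_host_guids("/var/cache/netdata") would be expected to include cougar's guid, 8365354a-2e10-11e9-ab26-000c29120d5e.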
(By the way, thanks again for all of your help. I appreciate it.)
Ok, so I went through the code and my setup is not identical, since my slave is on the same host (just a VM).
Please apply on rolls-royce the change seen in https://github.com/netdata/netdata/pull/5371/files; you will probably have main.js under /usr/share/netdata/web/. Then do a hard refresh on the rolls-royce UI. This should show cougar on your menu, under 'Databases streamed to this agent'. After you see it and click it, let's see if you still have an issue.
That solved it. Now I have the "Databases streamed to this agent" section on the menu, with both nodes listed. Thank you very much.
I had a similar issue, even with the patch mentioned in pull request 5371. I solved it by enabling the registry, even though the documentation at https://docs.netdata.cloud/streaming/ never mentions this. Is this necessary, or should the hosts appear automatically?
Please open a new issue, because it obviously can't be the same root cause. In that issue, provide your master and slave configuration just as in the OP here, and ensure that you have connectivity and are not receiving errors in the logs. Using the global registry is not required, but you do need to have a registry, even if it's the master that serves it.
Question summary
My headless slave node is not appearing on my master netdata dashboard menu.
OS / Environment
RHEL
I've just installed netdata for testing as a replacement for (or supplement to) nagios. I have it installed on one machine and it's great.
However, I'm trying to install netdata on a second machine ("cougar"), with the intent of using the first machine ("rolls-royce") as my sole dashboard/viewing host.
I believe I have followed the directions correctly from https://docs.netdata.cloud/streaming/ for setting up a "headless collector", where "cougar" is my "slave" instance and "rolls-royce" is my "master" instance.
I also figured out that I need to have my own "registry" because there is no direct Internet access from either host.
cougar netdata.conf
cougar stream.conf
rolls-royce netdata.conf
rolls-royce stream.conf
And I think I see data being collected in the logs, and cache files being created.
However, I cannot figure out how to view my "cougar" data from my "rolls-royce" dashboard.
The documentation refers to a "my-netdata" menu. I don't have a "my-netdata" menu. I have a menu entitled "rolls-royce", with only a single entry for "rolls-royce http://rolls-royce:19999/" but no entry for "cougar".
Can anybody help me figure out what I am missing?