Closed cakrit closed 2 years ago
cc @Ferroin
I had the issue that I only saw cryptic container IDs and no names. Forum helped to add docker.sock
. I would just add the line to the CLI template pictured above (and to compose) so others don't need to start the same troubleshooting:
-v /var/run/docker.sock:/var/run/docker.sock:ro
Some people consider exposing docker.sock as security risk and rather have a proxy in between. You could just add a sentence below, like "If you consider binding docker.sock to a container a security risk, you can remove binding docker.sock. Instead of container names you will see container IDs."
Would be nice to mention Docker Swarm, not sure if it needs an extra section. For Docker Swarm the deploy
section is missing in the compose file. While at it, you could add resource limits to the file in case netdata goes crazy and doesn't overwhelm the node (not sure what reasonable values are). Docker Swarm should be able to deal with the volumes that are included in the file.
version: '3.8'
services:
netdata:
...
hostname: '{{.Node.Hostname}}'
...
volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro
...
deploy:
mode: global # single instance on every node
resources:
limits:
cpus: '0.5'
memory: 512M
reservations:
cpus: '0.2'
memory: 256M
PS: Little trick to set the container hostname automatically to the host's hostname. We use bare metal servers in a countable number, so our servers have usable hostnames (proxy-1, app-2, db-3), so this makes it easier for us.
hostname: '{{.Node.Hostname}}'
Prometheus node-exporter does it differently, it simply binds the hostname file to the container
volumes:
- /etc/hostname:/etc/nodename
@Ferroin can you check if you agree with all of these suggestions so we can move ahead to make the changes?
For the Docker socket, I would argue it’s better to clearly mention the Docker socket requirement on the page itself, instead of explicitly including a line for it in the sample command. Anything that has full access to the socket can pretty easily get access to the host system as whatever user the Docker daemon is running as unless steps are explicitly taken to limit access. We can’t take those steps for the user, and we can’t know what steps they may have taken, so it’s risky to just blindly expose users like that, even if we lose some functionality by not doing so.
For the Docker Swarm thing, I generally agree, though it’s not really possible for us to provide sane defaults for resource limits as our resource usage is too dependent on the exact configuration of the system being monitored.
Entirely agreed on the hostname thing, though I would list it as a suggestion and not just put it directly in the example, as it kind of breaks if a user runs more than one container on the same host. The Prometheus Node Exporter approach is a bit more resilient, though it should be a read-only mount.
@hugovalente-pm , @Ferroin will describe exactly what will need to change. Once he does, plz plan the change to the text as a low hanging fruit, so we can close it.
@hugovalente-pm
Taken together, my suggestion would be to:
deploy
section, and that we recommend using the global
deployment mode to have an instance running on each Docker Swarm node.We might also consider adding a link on that particular panel to our top-level documentation about running Netdata in Docker (https://learn.netdata.cloud/docs/agent/packaging/docker) directing users there if they run into any issues.
thanks @Ferroin , to check I'll ask the help from someone on the FE team to tackle 1st and 3rd bullet and you will take on this one?
Add a sentence on the Docker claiming page in the Cloud along the lines of: ‘Note that Netdata will use the hostname from the container in which it is run instead of that of the host system.’, and include a link directly to https://learn.netdata.cloud/docs/agent/packaging/docker#change-the-default-hostname. I’ll also add some more info in that particular documentation about how to pull the hostname directly from the node the container is running on based on the suggestions above by @bluepuma77.
@AlexNti as we discussed let's do:
Add a sentence on the Docker claiming page in the Cloud along the lines of: ‘Note that to correctly display container names when running Netdata as a Docker container, you must expose the Docker socket from the host inside the Netdata container.’, and include a link directly to https://learn.netdata.cloud/docs/agent/packaging/docker#docker-container-names-resolution.
add a note after the "Install..."
Add a section after the Docker Compose section explaining that the Docker Compose example can be used with Docker Swarm by adding an appropriate deploy section, and that we recommend using the global deployment mode to have an instance running on each Docker Swarm node.
the difference is on services: netdata:
area where we add at the bottom of it
...
deploy:
mode: global
...
let's put this when user clicks on a checkbox (or something) mentioning he is using Docker Swarm
version: '3'
services:
netdata:
image: netdata/netdata
container_name: netdata
hostname: example.com # set to fqdn of host
ports:
- 19999:19999
restart: unless-stopped
cap_add:
- SYS_PTRACE
security_opt:
- apparmor:unconfined
volumes:
- netdataconfig:/etc/netdata
- netdatalib:/var/lib/netdata
- netdatacache:/var/cache/netdata
- /etc/passwd:/host/etc/passwd:ro
- /etc/group:/host/etc/group:ro
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /etc/os-release:/host/etc/os-release:ro
environment:
- NETDATA_CLAIM_TOKEN=wzxd7LeMHHiAhjipg3f5ZAT8fbOwN3mfCa4IKhqofehHK0zFKSJrwQjYbM_ZQekWhlarXpQSlPo9pDfbsY9sJka5ColpVVW-UJ6f8Br7m4zeSzLm65-0T7T2lOU3sVFrt4wHJ4M
- NETDATA_CLAIM_URL=https://testing.netdata.cloud
- NETDATA_CLAIM_ROOMS=
deploy:
mode: global
volumes:
netdataconfig:
netdatalib:
netdatacache:
Also as we spoke let's:
Back
button/link on the Add node to War Room modal (to be in line to the Add node to Space modal)Checkout documentation
link on the sticky section of the modal, so that even when there are big pieces of code it is always visible@hugovalente-pm The second bullet point needs changes both on the FE side and in the agent docs. I can cover the agent documentation, but someone on the FE team will need to add the info for the Cloud.
I would argue it’s better to clearly mention the Docker socket requirement on the page itself, instead of explicitly including a line for it in the sample command.
IMO should be completely opposite. It just doesn't work, otherwise (prooved by time). The default should include everything we need (e.g. docker container name resolution (mounting socket), docker container network iface monitoring (host pid mode, SYS_ADMIN)) and add a note (make it warning if you want) - X needed for Y, remove if you have security concerns.
Providing default with broken docker container monitoring (and who knows what else, I bet some external collector don't work because we don't add required capabilities) is super strange if you ask me.
I think it has been discussed, but not sure we agreed on that.
@ilyam8 I contend that most of the reason users keep running into this problem is mostly that we do not make it clear that there is an issue at all.
I generally agree that it’s a bit strange that we don’t provide full functionality with the default suggestion, but I would argue it’s even stranger (and far more concerning) for us to provide a default that completely circumvents any protections provided by Docker (we’ve already stripped most security out as is, but directly exposing the Docker API socket is just about the most dangerous thing you can do).
@AlexNti after the chat with @Ferroin let's review the structure for Docker and Kubernetes I've done the suggested changes on here https://docs.google.com/spreadsheets/d/1qTZpItEVta3a0loqfiva0J4W4l2FOVrH53pHE9wRQgc/edit#gid=0
@Ferroin my point is what you are suggesting sounds correct in theory but completely fails in practice (assuming the goal is to have docker Netdata with ~ exp to Netdata installed on the host). I personally have a hard time adding everything I need to start Netdata docker container with full capabilities. I need to copy one thing, jump here and there and copy some parts, don't miss anything, and don't make an error. For me it is very annoying, I just want to be able copy and paste.
Also, I don't understand your concerns to be honest:
May I suggest an alternative? Provide two options, one with full access and one without, both with copy-able stuff. Easiest (but clunky) would be to show them one below the other, best would be to add another binary under the "Docker tab" that changes the commands based on a user option.
probably we should deal with that on another ticket? the scope was to add the Docker Swarm part and some notes.
what we are doing here doesn't contradict a future iteration, meaning that users that don't want the full pre-setup configuration still need those "Notes:" callout to know what do to if they need
idea is to not further disrupt the work already being done and already deliver some value to the users
Problem
Our instructions to install a containerized Netdata and connect it to the cloud are not quite there for Docker Swarm. Specifically:
I'm asking the user who encountered the issues to assist us with the exact text and template we should add to the existing instructions.
As a reminder, we're talking about the section here and the relevant installation documentation in learn.
Description
See above
Importance
really want
Value proposition
Easier installation for Docker Swarm use cases
Proposed implementation
Add a paragraph below the existing paragraphs. Will wait for specific user input.