netdata / msi-installer

Netdata installer for Windows using WSL2
GNU General Public License v3.0

Improve start/stop/restart #3

Closed: cakrit closed this issue 1 year ago

cakrit commented 1 year ago

The initial todo is to improve the README to explain how users should start/stop/restart it.

After a system restart Netdata did run, though I'm not sure how. Is it just because the Netdata image starts it by default? This should be documented in the README.

After the installation, I saw that directory C:\Program Files (x86)\Netdata includes some nice commands. I don't know if they're all correct, @Ferroin please check.

Also, I don't see the parent-child relationship in the process tree between netdata and its children (spawn server and go.d.plugin). Why is that? I did see the process tree when I did the manual installation:

DESKTOP-C7OKV71:/usr/libexec/netdata/plugins.d# ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    7 root      0:00 /init
   10 netdata   0:00 netdata
   12 netdata   0:00 /usr/sbin/netdata --special-spawn-server
  251 netdata   0:00 /usr/libexec/netdata/plugins.d/go.d.plugin 1
  338 root      0:00 /init
  339 root      0:00 /init
  340 root      0:00 -ash
  379 root      0:00 ps faux
dfpr commented 1 year ago

The initial todo is to improve the README to explain how users should start/stop/restart it.

After a system restart Netdata did run, though I'm not sure how. Is it just because the Netdata image starts it by default? This should be documented in the README.

The netdata.tar image that becomes the Netdata WSL distro is based on the Docker image, so the MSI adds a startup entry to the Windows registry for start-netdata.cmd. That script runs the command wsl -d netdata netdata, which starts the distro and runs the netdata binary found in the PATH.

After the installation, I saw that directory C:\Program Files (x86)\Netdata includes some nice commands. I don't know if they're all correct, @Ferroin please check.

The scripts are used during installation, along with other commands embedded in the MSI installer. End users could run them, but that wasn't intended. I'll look for a better way to start/stop/restart than restarting Windows.

Also, I don't see the parent-child relationship in the process tree between netdata and its children (spawn server and go.d.plugin). Why is that? I did see the process tree when I did the manual installation:

DESKTOP-C7OKV71:/usr/libexec/netdata/plugins.d# ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    7 root      0:00 /init
   10 netdata   0:00 netdata
   12 netdata   0:00 /usr/sbin/netdata --special-spawn-server
  251 netdata   0:00 /usr/libexec/netdata/plugins.d/go.d.plugin 1
  338 root      0:00 /init
  339 root      0:00 /init
  340 root      0:00 -ash
  379 root      0:00 ps faux

Indeed, the parent/child relationship doesn't show here either. I'm not sure why that is; maybe it's because the netdata WSL distro is generated from the Docker image. Since start/stop/restart would be better handled for the distro as a whole, this shouldn't be a problem.

I'll improve the README, especially for the stop/restart procedures.

Ferroin commented 1 year ago

The initial todo is to improve the README to explain how users should start/stop/restart it. After a system restart Netdata did run, though I'm not sure how. Is it just because the Netdata image starts it by default? This should be documented in the README.

The netdata.tar image that becomes the Netdata WSL distro is based on the Docker image, so the MSI adds a startup entry to the Windows registry for start-netdata.cmd. That script runs the command wsl -d netdata netdata, which starts the distro and runs the netdata binary found in the PATH.

Is there any way we could integrate that as a Windows Service instead of a startup entry? Not sure how feasible that is, but if possible that would let users manage it the way they are probably already used to managing such things.

If not, then this is probably the cleanest option for auto-starting the agent.

After the installation, I saw that directory C:\Program Files (x86)\Netdata includes some nice commands. I don't know if they're all correct, @Ferroin please check.

The scripts are used during installation, along with other commands embedded in the MSI installer. End users could run them, but that wasn't intended. I'll look for a better way to start/stop/restart than restarting Windows.

Unless we can get some way to integrate with Windows’ native service management as I suggested above, I would argue that just having scripts in here to handle starting/stopping/restarting the agent is fine, though we probably want to add a PATH entry for the directory if we’re doing that.

Also, I don't see the parent-child relationship in the process tree between netdata and its children (spawn server and go.d.plugin). Why is that? I did see the process tree when I did the manual installation:

DESKTOP-C7OKV71:/usr/libexec/netdata/plugins.d# ps faux
PID   USER     TIME  COMMAND
    1 root      0:00 /init
    7 root      0:00 /init
   10 netdata   0:00 netdata
   12 netdata   0:00 /usr/sbin/netdata --special-spawn-server
  251 netdata   0:00 /usr/libexec/netdata/plugins.d/go.d.plugin 1
  338 root      0:00 /init
  339 root      0:00 /init
  340 root      0:00 -ash
  379 root      0:00 ps faux

Indeed, the parent/child relationship doesn't show here either. I'm not sure why that is; maybe it's because the netdata WSL distro is generated from the Docker image.

Correct, it’s because it’s based on the Docker image. Busybox ps doesn’t display threads by default, and every child process of the main process other than the Go plugin should be a thread with the configuration we’re using here.
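One way to check the thread claim from inside the distro, without depending on which ps options the image supports, is to read the kernel's per-process task directory. The following is a minimal sketch that assumes a Linux /proc filesystem is mounted; list_threads is a hypothetical helper name and is not shipped with the image.

```shell
#!/bin/sh
# list_threads PID: print "TID NAME" for every thread of the given process.
# Hypothetical helper; it reads /proc/PID/task, which Linux exposes per process.
list_threads() {
    for task in "/proc/$1/task"/*; do
        # Skip entries that vanished or cannot be read.
        [ -r "$task/comm" ] || continue
        printf '%s %s\n' "${task##*/}" "$(cat "$task/comm")"
    done
}

# Example: show the threads of the current shell (normally a single entry).
list_threads "$$"
```

Given the listing above, list_threads 10 would be the call for the main netdata process; any workers that run as threads would show up there even though ps does not print them.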

dfpr commented 1 year ago

The README now includes the restart command. I'll see if the agent can be started as a Windows service.

cakrit commented 1 year ago

This command causes Netdata to lose data, because it's not terminated properly. Upon receiving the kill signal, netdata stores the in-memory pages to the db and then exits. wsl -t prevents it from doing that.

We can run wsl -d netdata killall netdata and, after that completes, wsl -t netdata & wsl -d netdata netdata. I tried putting all three on the same line, but it doesn't work; it loses data again. You can compare the behaviors when you have the UI open at localhost:19999.

What about the Windows service? Is that possible? Would that help? @Ferroin I thought we were talking about some init.d commands, aren't those available?

Ferroin commented 1 year ago

This command causes Netdata to lose data, because it's not terminated properly. Upon receiving the kill signal, netdata stores the in-memory pages to the db and then exits. wsl -t prevents it from doing that.

We can run wsl -d netdata killall netdata and, after that completes, wsl -t netdata & wsl -d netdata netdata. I tried putting all three on the same line, but it doesn't work; it loses data again. You can compare the behaviors when you have the UI open at localhost:19999.

This doesn’t work because killall just sends the signals. Nothing in that case is actually waiting for the agent to exit.

What about the Windows service? Is that possible? Would that help?

We would still need a command to cleanly shut down the agent and wait for it to shut down.

@Ferroin I thought we were talking about some init.d commands, aren't those available?

Those wouldn’t be available if we’re using the Docker images as a base, though they would in theory solve the issue.

We just need something to wait (up to some configurable timeout, probably with a 60 second default timeout) for the agent to exit after telling it to exit.

dfpr commented 1 year ago

I'll check how to restart without losing data, maybe mimicking the init.d daemon stop command that uses killproc.

For the Windows service, having WSL available before a user logs in seems to be quite complicated, as distros are installed per user (see https://github.com/microsoft/WSL/issues/2979).

There are tools online that claim to help with this (https://github.com/peppy0510/wsl-service) that I could look at.

Creating a service account, installing WSL there, and having Task Scheduler run the start command for that user is something I'll try this weekend. So far it hasn't worked for the SYSTEM account.

cakrit commented 1 year ago

Ok, let's see where you'll get with that.

Ferroin commented 1 year ago

I'll check how to restart without losing data, maybe mimicking the init.d daemon stop command that uses killproc.

The key thing here is largely just ensuring that the main netdata process has exited before tearing down the WSL environment. If all else fails, repeatedly calling pgrep netdata in the WSL environment until that returns no output (or some timeout is reached) should work, though it’s not exactly the ideal solution here.
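The polling approach described above could look roughly like the sketch below, meant to run inside the WSL environment. The function names wait_for_exit and stop_and_wait are hypothetical, the 60-second default mirrors the timeout suggested earlier in this thread, and the sketch assumes pgrep and killall are available in the image (BusyBox provides both as applets, but that has not been verified against this distro).

```shell
#!/bin/sh
# Sketch only: these helpers are illustrative and not part of the installer.

# wait_for_exit NAME [TIMEOUT_SECONDS]
# Polls pgrep once per second until no process named NAME remains.
# Returns 0 once the process is gone, 1 if the timeout is reached first.
wait_for_exit() {
    name=$1
    timeout=${2:-60}
    elapsed=0
    while pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# stop_and_wait NAME [TIMEOUT_SECONDS]
# killall only sends the signal (SIGTERM by default) and returns at once,
# so the wait has to happen separately.
stop_and_wait() {
    killall "$1" 2>/dev/null
    wait_for_exit "$1" "${2:-60}"
}
```

Only after stop_and_wait netdata returns 0 would it be reasonable to tear the distro down with wsl -t netdata; a return value of 1 means the agent was still running when the timeout expired, so terminating the distro at that point could still lose data.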

dfpr commented 1 year ago

The README has been updated with improved start/stop/restart commands. @cakrit, please test whether data is lost when restarting.

cakrit commented 1 year ago

netdatacli doesn't work and spawns something that eats a lot of CPU. I just put killall netdata in the README instead.

cakrit commented 1 year ago

FYI we have an issue with WSL1, see https://github.com/netdata/netdata/issues/13933 if you're interested.