netdata / netdata


pfSense 2.4.4-RELEASE-p3: /usr/local/etc/rc.d/netdata: WARNING: failed to start netdata #7473

Closed VivaldiKF closed 4 years ago

VivaldiKF commented 4 years ago
Bug report summary

Unable to install netdata version 1.19.0 on pfSense 2.4.4-RELEASE-p3.

I used the following commands, inspired by the recommended documentation:

pkg install pkgconf
pkg install bash
pkg install e2fsprogs-libuuid
pkg install libuv
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/Judy-1.0.5_2.txz
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/python36-3.6.9_1.txz
ln -s /usr/local/lib/libjson-c.so /usr/local/lib/libjson-c.so.4
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/netdata-1.19.0.txz

I received the following mismatch warnings:

pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/Judy-1.0.5_2.txz
Fetching Judy-1.0.5_2.txz: 100%  217 KiB 222.2kB/s    00:01    
Installing Judy-1.0.5_2...
Newer FreeBSD version for package Judy:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1103000
- running kernel: 1102000
Allow missmatch now?[Y/n]: y
Extracting Judy-1.0.5_2: 100%
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/python36-3.6.9_1.txz
Fetching python36-3.6.9_1.txz: 100%   15 MiB   5.3MB/s    00:03    
Installing python36-3.6.9_1...
Newer FreeBSD version for package python36:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1103000
- running kernel: 1102000
Allow missmatch now?[Y/n]: Y
Extracting python36-3.6.9_1: 100%
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/netdata-1.19.0.txz
Fetching netdata-1.19.0.txz: 100%    2 MiB 584.5kB/s    00:03    
Installing netdata-1.19.0...
Newer FreeBSD version for package netdata:
To ignore this error set IGNORE_OSVERSION=yes
- package: 1103000
- running kernel: 1102000
Allow missmatch now?[Y/n]: Y
===> Creating groups.
Creating group 'netdata' with gid '302'.
===> Creating users
Creating user 'netdata' with uid '302'.
Extracting netdata-1.19.0: 100%

Because the documentation did not provide instructions on how to deal with the mismatch condition, I selected Y each time the warning appeared.
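For readers wondering what the two numbers in the warning mean, here is a minimal sketch that decodes them. It assumes the usual FreeBSD version encoding (major version in the leading digits, minor version in the next two); `decode_osversion` is a hypothetical helper name, not part of pkg.

```shell
# Decode a FreeBSD OS version number such as 1103000 into "major.minor".
# Assumption: the value is laid out as <major><2-digit minor><3-digit patch counter>.
decode_osversion() {
    v=$1
    printf '%d.%d\n' "$((v / 100000))" "$((v / 1000 % 100))"
}

pkg_ver=$(decode_osversion 1103000)     # value reported for the package
kernel_ver=$(decode_osversion 1102000)  # value reported for the running kernel

echo "package built for FreeBSD $pkg_ver, running kernel is FreeBSD $kernel_ver"
```

Under that assumption the packages in the `latest` repository were built for 11.3, while pfSense 2.4.4 runs an 11.2 kernel, which is what the prompt is warning about.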

After updating the bind to setting in /usr/local/etc/netdata/netdata.conf, I tried to start the service:

service netdata onestart
Starting netdata.
Bad -c option
/usr/local/etc/rc.d/netdata: WARNING: failed to start netdata
OS / Environment

pfSense 2.4.4-RELEASE-p3

Expected behavior

The netdata service successfully starts, without warnings or errors, and listens on port 19999.

rdnsx commented 4 years ago

Hi,

I have the same problem here as well.

pkg install pkgconf
pkg install bash
pkg install e2fsprogs-libuuid
pkg install libuv
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/Judy-1.0.5_2.txz
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/python36-3.6.9_3.txz
ln -s /usr/local/lib/libjson-c.so /usr/local/lib/libjson-c.so.4
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/netdata-1.19.0.txz

I changed bind to = 0.0.0.0 in netdata.conf.

service netdata onestart outputs:

Starting netdata.
Bad -c option
/usr/local/etc/rc.d/netdata: WARNING: failed to start netdata
prologic commented 4 years ago

I'll have a look into this...

prologic commented 4 years ago

Also discovered one of the instructions is either wrong or the upstream FreeBSD package has moved or changed:

[2.4.4-RELEASE][root@pfSense.localdomain]/root: pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/python36-3.6.9.txz
pkg: http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/python36-3.6.9.txz: Not Found
prologic commented 4 years ago

Hi @VivaldiKF

I am unable to repro your exact issue:

Bad -c option

See:

[2.4.4-RELEASE][root@pfSense.localdomain]/root: service netdata onestart
Starting netdata.
[2.4.4-RELEASE][root@pfSense.localdomain]/root: ps aux | grep netdata
netdata 97253  3.0  2.8 31540 28144  -  SN   03:52    0:00.59 /usr/local/bin/python3.6 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
netdata 95608  0.0  1.6 22372 15904  -  SN   03:52    0:00.09 /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
netdata 97891  0.0  0.7 12728  7448  -  SN   03:52    0:00.00 /usr/local/libexec/netdata/plugins.d/apps.plugin 1
root    34574  0.0  0.0   408   324  0  R+   03:52    0:00.00 grep netdata

I followed the same steps as you. Are you able to give this another go and let me know if the problem still persists? (It's possible it was fixed and a hotfixed package was pushed to the FreeBSD package repo.)

prologic commented 4 years ago

FWIW I have the same command_args that contains the -c option:

[2.4.4-RELEASE][root@pfSense.localdomain]/root: grep '\-c' < /usr/local/etc/rc.d/netdata
command_args="-c -f ${procname} -u ${netdata_user} -P ${netdata_pid} ${netdata_args}"
prologic commented 4 years ago

-c is supposed to be an option to the daemon tool:

 -c        Change the current working directory to the root ("/").

It's unclear to me why this isn't working on your pfSense environment for either of you @VivaldiKF @rdnsx. Can either of you show me the output of daemon --help on your systems?

vlvkobal commented 4 years ago

FYI - a long discussion on the subject is in #3469.

rdnsx commented 4 years ago

Hi @prologic,

Thank you for your effort, I really appreciate it. I gave it another go, but I have the same issue. My first attempt was 10 hours ago and the second attempt 1 hour ago, so I'm sure I was using the latest netdata package. (Last push was 2020-Jan-01 04:50.)

Here is the output of daemon --help:

daemon --help
daemon: illegal option -- -
usage: daemon [-cfrS] [-p child_pidfile] [-P supervisor_pidfile]
              [-u user] [-o output_file] [-t title]
              [-l syslog_facility] [-s syslog_priority]
              [-T syslog_tag] [-m output_mask] [-R restart_delay_secs]
command arguments ...

This doesn't seem to be helpful ;)

prologic commented 4 years ago

Can either/both of you confirm the output of uname -a on your pfSense instances?

[2.4.4-RELEASE][root@pfSense.localdomain]/root: uname -a
FreeBSD pfSense.localdomain 11.2-RELEASE-p10 FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019     root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense  amd64

Unless I'm running something different from you, I can't explain the Bad -c option, and your output confirms that your daemon tool has the -c option.

rdnsx commented 4 years ago

Here's my uname -a output:

[2.4.4-RELEASE][root@pfsense.localdomain]/root: uname -a
FreeBSD pfsense.localdomain 11.2-RELEASE-p10 FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019     root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense  amd64
prologic commented 4 years ago

Yeah, so it's pretty much the same version as what I have here. Do you mind doing a fresh install at all? (A fresh re-install of pfSense, that is.)

rdnsx commented 4 years ago

It's running in a company environment without a failover unit at the moment. Once a failover unit is in place, maybe.

But I can't just reinstall pfSense, install netdata, and then push the backup to the pfSense machine.

prologic commented 4 years ago

I'll have a think about this some more... Meanwhile, can either of you try something for me? Please edit the startup config /usr/local/etc/rc.d/netdata and remove the -c option from command_args. From my understanding of the FreeBSD daemon tool, this option just changes the CWD of the process to /, which isn't super important (I don't think) for a correctly functioning Netdata instance.

rdnsx commented 4 years ago

I removed the -c option from command_args in the startup config /usr/local/etc/rc.d/netdata.

Now it looks like:

command_args="-f ${procname} -u ${netdata_user} -P ${netdata_pid} ${netdata_args}"

service netdata onestart
Starting netdata.
Bad -c option
/usr/local/etc/rc.d/netdata: WARNING: failed to start netdata

I double checked the -c option in the start config is gone

prologic commented 4 years ago

Oooh :) Wait just a minute... Can you paste me the contents of your /usr/local/etc/rc.d/netdata after undoing your change (re-add the -c option to command_args)?

rdnsx commented 4 years ago
#!/bin/sh

#
# $FreeBSD: head/net-mgmt/netdata/files/netdata.in 496470 2019-03-21 15:05:27Z mmokhi $
#

# PROVIDE: netdata
# REQUIRE: LOGIN
# KEYWORD: shutdown

#
# Add the following line to /etc/rc.conf to enable netdata:
# netdata_enable (bool):        Set to "NO" by default.
#                               Set it to "YES" to enable netdata.
# netdata_args (str):           Custom additional arguments to be passed
#                               to netdata (default empty).
#

. /etc/rc.subr                                                                                                                                               
name="netdata"
rcvar=netdata_enable

load_rc_config $name

: ${netdata_enable="NO"}
: ${netdata_user="netdata"}
: ${netdata_pid="/var/db/netdata/${name}.pid"}

procname="/usr/local/sbin/${name}"
command="/usr/sbin/daemon"
command_args="-c -f ${procname} -u ${netdata_user} -P ${netdata_pid} ${netdata_args}"

required_files="/usr/local/etc/netdata/${name}.conf"
prologic commented 4 years ago

Can you confirm that /etc/rc.conf.d is empty on your system? It is on mine too; so I'm a bit puzzled as to where/how netdata_args is coming from and what it is on your system vs. mine (where mine works okay)
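For reference, a minimal sketch of how the variables in the pasted rc script expand when nothing overrides them. It copies the assignments from the script above and leaves out `/etc/rc.subr` and `load_rc_config` entirely; `full_command` is a hypothetical name used only for this illustration.

```shell
# Same assignments as in /usr/local/etc/rc.d/netdata, without sourcing rc.subr.
name="netdata"

: ${netdata_user="netdata"}
: ${netdata_pid="/var/db/netdata/${name}.pid"}

procname="/usr/local/sbin/${name}"
command="/usr/sbin/daemon"
# netdata_args is left unset here, so it expands to an empty string.
command_args="-c -f ${procname} -u ${netdata_user} -P ${netdata_pid} ${netdata_args}"

full_command="$command $command_args"
printf '[%s]\n' "$full_command"
```

With `netdata_args` unset, the only thing it contributes is a trailing space, so on its own it would not be expected to inject an extra option.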

prologic commented 4 years ago

Can you confirm you can start netdata manually with:

/usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid

Then confirm with ps aux | grep netdata and try to reach the web UI on port 19999 (the default).

rdnsx commented 4 years ago

/etc/rc.conf.d is empty.

/usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
2020-01-04 01:05:27: netdata INFO  : MAIN : SIGNAL: Not enabling reaper
ps aux | grep netdata
netdata 86487   0.1  0.2  36836  17860  -  IN   01:05       0:01.46 /usr/local/sbi
netdata  1748   0.0  0.3  38108  25564  -  SN   01:05       0:01.23 /usr/local/bin
netdata 88025   0.0  0.1  12728   7260  -  SN   01:05       0:00.08 /usr/local/lib
root    16778   0.0  0.0   6564   2384  0  S+   01:07       0:00.00 grep netdata

Also available under :19999

prologic commented 4 years ago

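Okay good! That works. Now I just have to continue root causing two problems (I found another):

* [ ]  Figure out where/how the `Bad -c option` is coming from and fix it

* [ ]  Figure out how to auto-start netdata on boot `service netdata start` complains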

rdnsx commented 4 years ago

Thanks so far, sir. If I can help, just let me know.

RAMilewski commented 4 years ago

Okay good! That works. Now I just have to continue root causing two problems (I found another):

* [ ]  Figure out where/how the `Bad -c option` is coming from and fix it

* [ ]  Figure out how to auto-start netdata on boot `service netdata start` complains

prologic: On PFSense boxes it's 'service netdata onestart'

prologic commented 4 years ago

prologic: On PFSense boxes it's 'service netdata onestart'

Doesn't this only do a "one shot" run of the service? Wouldn't you want to automatically start on boot too?

mmangione commented 4 years ago

Having this problem as well on a fresh install of pfSense 2.4.4-RELEASE-p3.

Can confirm that removing the -c from /usr/local/etc/rc.d/netdata does not resolve this issue. There is something wrong with how this command interacts with /etc/rc.subr. I'm not 100% certain what's going wrong, but I think it has to do with the run_rc_command function at lines 1061-1069:

if [ -n "$_user" ]; then
    _doit="su -m $_user -c 'sh -c \"$_doit\"'"
fi
if [ -n "$_nice" ]; then
    if [ -z "$_user" ]; then
        _doit="sh -c \"$_doit\""
    fi
    _doit="nice -n $_nice $_doit"
fi
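A minimal sketch of what the first branch of that snippet builds, run outside of rc.subr. The starting value of `_doit` is the daemon command line from this thread, and `_user` being `netdata` is an assumption based on netdata_user in the rc script; the nice branch is left out.

```shell
# Starting point: the command rc.subr would run (taken from this thread).
_doit='/usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid'
# Assumption: _user ends up holding the value of netdata_user.
_user='netdata'

# Same wrapping as in the rc.subr excerpt above.
if [ -n "$_user" ]; then
    _doit="su -m $_user -c 'sh -c \"$_doit\"'"
fi

# Print the composed string rather than executing it.
printf '%s\n' "$_doit"
```

The result has two levels of quoting, single quotes for su and double quotes for the inner sh. If either level is lost when the string is evaluated, the inner `sh -c` no longer receives the daemon command as a single string.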
mmangione commented 4 years ago

In fact, when I run

truss -dae service netdata onestart

The portion that fails reads:

0.072540115 eaccess("/usr/local/etc/netdata/netdata.conf",R_OK) = 0 (0x0)
Starting netdata.
0.072696904 write(1,"Starting netdata.\n",18) = 18 (0x12)
0.072853834 stat("/sbin/limits",0x7fffffffdf08) ERR#2 'No such file or directory'
0.072933153 stat("/bin/limits",0x7fffffffdf08) ERR#2 'No such file or directory'
0.073008333 stat("/usr/sbin/limits",0x7fffffffdf08) ERR#2 'No such file or directory'
0.073083373 stat("/usr/bin/limits",{ mode=-r-xr-xr-x ,inode=12923332,size=20880,blksize=32768 }) = 0 (0x0)
0.073185053 vfork() = 38855 (0x97c7)
Bad -c option
0.083937599 wait4(-1,{ EXITED,val=2 },0x0,0x0) = 38855 (0x97c7)

When I run:

truss -fdae service netdata onestart

I get:

24980: 0.473577151 execve("/bin/sh",[ "/bin/sh", "-c", "sh", "-c", ""/usr/sbin/daemon", "-c", "-f", "/usr/local/sbin/netdata", "-u", "netdata", "-P", "/var/db/netdata/netdata.pid", """ ],[ "PATH=/sbin:/bin:/usr/sbin:/usr/bin", "PWD=/", "HOME=/", "RC_PID=23287" ]) = 0 (0x0)

The command that is failing comes from a combination of /etc/rc.subr and /usr/local/etc/rc.d/netdata and is:

/bin/sh -c /usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid

But, if I run:

/usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid

The netdata process starts successfully.

It looks to me to be the segment:

/bin/sh -c

that is failing. The man page for sh says:

-c string Read commands from string.

Not sure how to fix it, but I think this is the problem.
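A portable sketch of the quoting effect behind this. It uses `echo` as a hypothetical stand-in for /usr/sbin/daemon and does not try to reproduce the FreeBSD-specific `Bad -c option` message; it only shows that `sh -c` treats one argument as the command string and does not pass the following words to that command.

```shell
# Quoted: the whole string is the -c command string, so echo sees both words.
quoted=$(sh -c "echo started netdata")

# Unquoted: only "echo" is the command string; "started" and "netdata"
# become $0 and $1 of the inner shell, so echo prints an empty line.
unquoted=$(sh -c echo started netdata)

# The extra words are still visible inside the inner shell as $0 and $1.
positional=$(sh -c 'printf "%s|%s" "$0" "$1"' started netdata)

printf 'quoted:     [%s]\n' "$quoted"
printf 'unquoted:   [%s]\n' "$unquoted"
printf 'positional: [%s]\n' "$positional"
```

In the unquoted form, whatever follows the command string is handled by the shell itself rather than by the command, which is consistent with the daemon arguments never reaching daemon when the quotes are missing.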

mmangione commented 4 years ago

A temporary fix for this issue, so that users of 2.4.4-RELEASE-p3 can use Netdata, is to change the Shellcmd step from the instructions. Paste the following command:

/usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid

into the Shellcmd service instead of:

service netdata onestart

You'll need to run that command as root from a shell. I logged in through SSH, but it will work just fine if you connect with a VGA cable and use a keyboard.

See the attached screenshot.

Screen Shot 2020-01-13 at 7 24 33 PM

prologic commented 4 years ago

The comment provided by @mmangione is the most insightful at this point, and I will look into this further tomorrow. The trouble I've had so far is my inability to reproduce this Bad -c option on any of my test pfSense virtual environments :/ I have to admit it's got me a bit stumped, but I will continue digging into this tomorrow.

cosmix commented 4 years ago

This seems to be related to #4265 and #3469.

This seems to be the answer:

https://github.com/netdata/netdata/issues/3469#issuecomment-457759807

prologic commented 4 years ago

Can confirm @mmangione's findings:

[2.4.4-RELEASE][root@pfSense.localdomain]/root: /bin/sh -c /usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
Bad -c option
[2.4.4-RELEASE][root@pfSense.localdomain]/root:
[2.4.4-RELEASE][root@pfSense.localdomain]/root: /usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
[2.4.4-RELEASE][root@pfSense.localdomain]/root: ps aux | grep netdata
netdata 10705 11.9  2.7 31552 27148  -  SN   03:26   0:00.56 /usr/local/bin/python3.6 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
netdata  9356  0.0  1.7 24420 16664  -  SN   03:26   0:00.09 /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
netdata 11222  0.0  0.7 12728  7328  -  SN   03:26   0:00.00 /usr/local/libexec/netdata/plugins.d/apps.plugin 1
netdata 60693  0.0  1.7 24932 17408  -  IN   03:19   0:01.28 /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
netdata 62477  0.0  3.6 40416 35572  -  SN   03:19   0:01.47 /usr/local/bin/python3.6 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
netdata 62753  0.0  0.8 12728  7608  -  SN   03:19   0:00.20 /usr/local/libexec/netdata/plugins.d/apps.plugin 1
root    45005  0.0  0.0   408   324  0  R+   03:26   0:00.00 grep netdata
[2.4.4-RELEASE][root@pfSense.localdomain]/root: service netdata stop
Stopping netdata.
Waiting for PIDS: 9356 60693.
[2.4.4-RELEASE][root@pfSense.localdomain]/root: ps aux | grep netdata
root    50620   0.0  0.0   408   324  0  R+   03:26   0:00.00 grep netdata
[2.4.4-RELEASE][root@pfSense.localdomain]/root: /bin/sh -c "/usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid"
[2.4.4-RELEASE][root@pfSense.localdomain]/root: ps aux | grep netdata
netdata 71261 15.8  2.7 31552 27136  -  SN   03:26   0:00.56 /usr/local/bin/python3.6 /usr/local/libexec/netdata/plugins.d/python.d.plugin 1
netdata 68876  0.0  1.7 24420 16844  -  SN   03:26   0:00.10 /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid
netdata 71413  0.0  0.7 12728  7340  -  SN   03:26   0:00.00 /usr/local/libexec/netdata/plugins.d/apps.plugin 1
root     8425  0.0  0.0   408   324  0  R+   03:26   0:00.00 grep netdata
[2.4.4-RELEASE][root@pfSense.localdomain]/root:
prologic commented 4 years ago

@mmangione Would we be able to schedule some time with you over a call or Slack/IRC/Messenger/Signal/whatever to help fix/resolve this? I'm still having a hard time reproducing this on any pfSense environment I can produce, but I'm convinced there are some subtle differences.

mmangione commented 4 years ago

@prologic Yep. I can hop on any of the above. Whichever is most convenient. My evenings are usually pretty good. What does your schedule look like?

prologic commented 4 years ago

Pretty clear. I'll keep it open to tee up with you. I want to get this resolved! I'm @prologic on FreeNode and I'm hanging around on the #freebsd channel. Feel free to PRIVMSG me. Just give me a time window :)

cosmix commented 4 years ago

@prologic does this mean that we can remove the 'Cannot reproduce' label? Please keep this ticket updated if there's anything new.

prologic commented 4 years ago

@mmangione Sorry, it looks like we missed each other; I'm always around on FreeNode as prologic. As I still cannot repro this issue exactly on any BSD/pfSense environment I can create here, I'm going to close this as "cannot reproduce". If this is still an issue for anyone, please either re-open or file a new issue. Please feel free to reach out to me on FreeNode any time or email me at james at netdata dot cloud and we can get to the bottom of this.

eliezerlp commented 4 years ago

Funnily enough, I landed on this issue report before I experienced the issue. Now I am back to report that I have reproduced it. I would be happy to help try to get to the bottom of this.

[2.4.4-RELEASE][admin@pfSense.***]/root: uname -a
FreeBSD pfSense.*** 11.2-RELEASE-p10 FreeBSD 11.2-RELEASE-p10 #9 4a2bfdce133(RELENG_2_4_4): Wed May 15 18:54:42 EDT 2019     root@buildbot1-nyi.netgate.com:/build/ce-crossbuild-244/obj/amd64/ZfGpH5cd/build/ce-crossbuild-244/pfSense/tmp/FreeBSD-src/sys/pfSense  amd64

[2.4.4-RELEASE][admin@pfSense.***]/root: pkg info | egrep 'pkgconf|bash|e2fsprogs-libuuid|libuv|Judy|python3|netdata'
Judy-1.0.5_2                   General purpose dynamic array
bash-4.4.23                    GNU Project's Bourne Again SHell
e2fsprogs-libuuid-1.44.4       UUID library from e2fsprogs package
libuv-1.21.0                   Multi-platform support library with a focus on asynchronous I/O
netdata-1.13.0                 Scalable distributed realtime performance and health monitoring
pkgconf-1.4.2,1                Utility to help to configure compiler and linker flags
python36-3.6.8_1               Interpreted object-oriented programming language
[2.4.4-RELEASE][admin@pfSense.***]/root: ls /etc/rc.conf.d/
[2.4.4-RELEASE][admin@pfSense.***]/root:
[2.4.4-RELEASE][admin@pfSense.***]/root: cat /usr/local/etc/rc.d/netdata
#!/bin/sh
#
# $FreeBSD: branches/2019Q2/net-mgmt/netdata/files/netdata.in 496470 2019-03-21 15:05:27Z mmokhi $
#

# PROVIDE: netdata
# REQUIRE: LOGIN
# KEYWORD: shutdown

#
# Add the following line to /etc/rc.conf to enable netdata:
# netdata_enable (bool):    Set to "NO" by default.
#               Set it to "YES" to enable netdata.
# netdata_args (str):       Custom additional arguments to be passed
#               to netdata (default empty).
#

. /etc/rc.subr

name="netdata"
rcvar=netdata_enable

load_rc_config $name

: ${netdata_enable="NO"}
: ${netdata_user="netdata"}
: ${netdata_pid="/var/db/netdata/${name}.pid"}

procname="/usr/local/sbin/${name}"
command="/usr/sbin/daemon"
command_args="-c -f ${procname} -u ${netdata_user} -P ${netdata_pid} ${netdata_args}"

required_files="/usr/local/etc/netdata/${name}.conf"

run_rc_command "$1"
prologic commented 4 years ago

Re-opening! Thanks! Let's work to get to the bottom of this!

prologic commented 4 years ago

As I still cannot reproduce and haven't heard from @eliezerlp in some days I'm re-closing this. There is on-going work to properly support FreeBSD in #8304 (and other issues) so we'll hopefully have better support for FreeBSD there and eventually pfSense (I hope); but I cannot repro this issue with any pfSense I can manage to install.

sniffski commented 3 years ago

I'll just put this here... hope it helps with resolving this further, as I'm also experiencing the same issue:

root: /usr/local/sbin/netdata
/usr/local/lib/libjson-c.so.5: version JSONC_0.14 required by /usr/local/sbin/netdata not defined

sniffski commented 3 years ago

And a temporary solution:

pkg delete -f json-c-0.14
pkg add http://pkg.freebsd.org/FreeBSD:11:amd64/latest/All/json-c-0.15_1.txz

Then, as the shellcmd:

/usr/sbin/daemon -c -f /usr/local/sbin/netdata -u netdata -P /var/db/netdata/netdata.pid

odyslam commented 3 years ago

Hey @sniffski,

Thanks for the update and welcome to our community! I think Netdata should work with pfsense, pinging @Ferroin in case we need to add this solution to our docs.

Cheers