liske / needrestart

Restart daemons after library updates.
GNU General Public License v2.0
424 stars 67 forks source link

Detect duplicity inside LXC as using obsolete script files #95

Open nirgal opened 6 years ago

nirgal commented 6 years ago

I'm using duply for backups, inside LXC containers. I run Debian stable.

When I run needrestart inside the LXC container, I get no false positive.

But when I run it from the hypervisor, it wants to restart the virtual machine complaining about files in something like /proc/25192/root//usr/lib/python2.7/.

Here are the logs: needrestart.hyper.log needrestart.lxc.log

liske commented 6 years ago

This might be triggered by the process start time calculation. Could you please provide the starttime (see also proc(5) ) of the process from inside the container and from the hypervisor:

$ awk '{ print $22; }' < /proc/$PID/stat

If they are different this might trigger the false positive (Proc::ProcessTable might be not namespace aware?).

nirgal commented 6 years ago

Both are the same value.

needrestart.hypervisor.log needrestart.vm.log procstat.hypervisor.txt procstat.vm.txt

liske commented 6 years ago

This is weirrd. On which filesystem did you put the LXC root (something special like btrfs or zfs)? Could you please provide the output of stat /proc/$PID/root/usr/lib/python2.7/fnmatch.py from inside and outside? Thanks!

nirgal commented 6 years ago

I'm using lvm. My container conf file contains:

lxc.rootfs = /dev/safe/mail
lxc.rootfs.backend = lvm

The file system itself is ext4.

The hypervisor says:

# stat /proc/18136/root/usr/lib/python2.7/fnmatch.py
  File: /proc/18136/root/usr/lib/python2.7/fnmatch.py
  Size: 3315        Blocks: 8          IO Block: 4096   regular file
Device: fe28h/65064d    Inode: 1841194     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-12-19 00:09:51.176757743 +0100
Modify: 2017-11-24 18:33:09.000000000 +0100
Change: 2017-12-13 18:46:34.825089385 +0100
 Birth: -

The VM says the same, only the pid changed:

# stat /proc/31992/root/usr/lib/python2.7/fnmatch.py
  File: /proc/31992/root/usr/lib/python2.7/fnmatch.py
  Size: 3315        Blocks: 8          IO Block: 4096   regular file
Device: fe28h/65064d    Inode: 1841194     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-12-19 00:09:51.176757743 +0100
Modify: 2017-11-24 18:33:09.000000000 +0100
Change: 2017-12-13 18:46:34.825089385 +0100
 Birth: -

needrestart seems to work ok for the other processes.

liske commented 6 years ago

It looks like there is a serious problem with the kernel's proc filesystem entries:

a) and b) do not correlate to each other when looking at them from inside of a container. This breaks the container detection stuff in needrestart since it gets the wrong starttime of the processes. It seems to be impossible to get the correct starttime of a process from within a container :-/

nirgal commented 6 years ago

Just to make sure we understand each other, I'm getting false positives when I'm outside the container. Inside the container, needrestart seems to run fine.


I tried to understand a bit more what is going on. Inside the container, I have these mounts:

proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
proc on /proc/sys/net type proc (rw,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
lxcfs on /proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
(...)

These looks like they are created by the settings in /usr/share/lxc/config/common.conf included by default:

lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed

which according to lxc.container.conf(5) "mount /proc as read-write, but remount /proc/sys and /proc/sysrq-trigger read-only for security / container isolation purposes."

Also, on the hypervisor, I have this mount:

lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)

I have no clue what this does, but /var/lib/lxcfs/proc contains files cpuinfo diskstats meminfo stat swaps and uptime.


Another side note, in the container, ps u reports the correct start date of the processes. So I suppose there is a way to get it.


Another another side note (^^), on the hypervisor side, ls -l /proc/24020/root wrongfully reports being a symlink to '/'. This confuses readlink.

# ls -l /proc/24020/root
lrwxrwxrwx 1 root root 0 Dec 21 12:13 /proc/24020/root -> /
# readlink -f /proc/24020/root/usr
/usr
# stat /proc/24020/root/usr
  File: /proc/24020/root/usr
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fe28h/65064d    Inode: 1835010     Links: 10
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-12-20 09:04:47.912467259 +0100
Modify: 2017-12-12 15:24:18.433319640 +0100
Change: 2017-12-12 15:25:45.193129885 +0100
 Birth: -
# stat /usr
  File: /usr
  Size: 4096        Blocks: 8          IO Block: 4096   directory
Device: fe01h/65025d    Inode: 130308      Links: 10
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-11-08 12:47:29.776893183 +0100
Modify: 2016-10-12 21:43:36.796532759 +0200
Change: 2016-10-12 21:43:36.796532759 +0200
 Birth: -

The device of stat /proc/24020/root/usr is ok. This points at the container root. The device of stat $(readllink /proc/24020/root/usr) is wrong. kinda. :/

liske commented 6 years ago

You are right - my previous argumentation would explain why it does not detect pending interpreter restarts inside of a container (due the calculated process starttime is later than it was in real, especially if the container was started longer time after the hypervisor).

In your stat call for one of the python files (fnmatch.py) the modification time was: 2017-11-24 18:33:09.000000000 +0100

But the process id 18136 was started later, wasn't it? Could you please verify using ps and the output of perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 18136; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'?


Mountings: looks similar on my system including the lxcfs mounting on the hypervisor


Site note about ps u: are you sure?

root@outside ~ # ps u|grep tty4
[...]
root     18274  0.0  0.0  14716  1632 pts/3    Ss+  Dec20   0:00 /sbin/getty 38400 tty4 linux

root@outside ~ # cat /proc/uptime 
15978589.39 123393957.91
root@inside:~# ps u | grep tty4
root       989  0.0  0.0  14716  1632 pts/3    Ss+   2018   0:00 /sbin/getty 38400 tty4 linux

root@inside:~# cat /proc/uptime
74778.0 73795.0

Is it possible that your container has been started right after the hypervisor? If so the result of awk '{ print $22; }' < /proc/1/stat from inside the container should be a low integer value.

It sounds a little bit similar to Debian Bug#842879 regards VPS. I think its broken by design for namespaces (one is unable to calculate a process startup time due to the problem in my previous post).


The /proc/$PID/root link is something special. IMHO it allows to traverse between namespaces (like a wormhole ;-). A (root) process running in the outer mount namespace can access into mount namespaces by accessing /proc/$PID/root/path/to/inside/file. Although the magic symlink shows / as target it does not behave like that if a process traverse into it (this also works for chroots).

nirgal commented 6 years ago

On the hypervisor (false positive):

root@yod:~# needrestart -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v2.11
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[Core] #16188 is a NeedRestart::Interp::Perl
[Perl] #16188: uses no source file (-e), skipping
[Core] #16662 is a NeedRestart::Interp::Python
[Python] #16662: source=/usr/bin/duplicity
[Core] #16662 uses obsolete script file(s):
[Core] #16662  /proc/16662/root//usr/lib/python2.7/__future__.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/StringIO.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/contextlib.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/ctypes/wintypes.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/types.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/_weakrefset.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/re.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/posixpath.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/xml/parsers/expat.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/pdb.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/string.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/pprint.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/base64.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/calendar.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/bdb.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/encodings/aliases.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/UserDict.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/getpass.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/copy.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/socket.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/collections.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/functools.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/io.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/copy_reg.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/gzip.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/fnmatch.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/plistlib.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/keyword.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/tempfile.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/genericpath.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/doctest.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/_threading_local.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/random.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/subprocess.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/locale.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/repr.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/os2emxpath.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/cmd.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/abc.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/warnings.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/_abcoll.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/threading.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/weakref.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/traceback.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/sre_constants.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/platform.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/sre_parse.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/ssl.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/dummy_thread.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/hashlib.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/sre_compile.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/stat.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/os.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/shlex.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/gettext.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/struct.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/ntpath.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/optparse.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/linecache.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/pickle.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/heapq.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/textwrap.py
[Core] #16662  /proc/16662/root//usr/lib/python2.7/statvfs.py
[LXC] #16662 is part of LXC container 'mail' and should be restarted
# ps aufxwww | grep 16662
root     16662  2.1  0.5 251468 43132 ?        Sl   23:00   0:19      |   |           \_ /usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply
# perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 16662; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'
$VAR1 = bless( {
                 'cstime' => '40000',
                 'stime' => '1750000',
                 'pctmem' => '0.54',
                 'gid' => 0,
                 'start' => '1514329211',
                 'cmajflt' => '28',
                 'cminflt' => '2376',
                 'pgrp' => 16181,
                 'majflt' => '27',
                 'wchan' => '0',
                 'pid' => 16662,
                 'ttynum' => 0,
                 'ttydev' => '',
                 'cutime' => '1000000',
                 'fname' => 'duplicity',
                 'exec' => '/usr/bin/python2.7',
                 'fgid' => 0,
                 'ctime' => '1040000',
                 'uid' => 0,
                 'euid' => 0,
                 'rss' => '44167168',
                 'time' => '20020000',
                 'sess' => 16181,
                 'flags' => '1077936384',
                 'minflt' => '53434',
                 'fuid' => 0,
                 'suid' => 0,
                 'sgid' => 0,
                 'cwd' => '/root',
                 'size' => '257503232',
                 'state' => 'sleep',
                 'ppid' => 16184,
                 'pctcpu' => '  1.93',
                 'priority' => 20,
                 'utime' => '18270000',
                 'cmndline' => '/usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply',
                 'egid' => 0
               }, 'Proc::ProcessTable::Process' );

Inside the lxc container (no false positive):

root@mail:~# ps aufxwww | grep duplicity
root      7716  2.0  0.5 251468 43132 ?        Sl   00:01   0:20  |           \_ /usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply
# perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 7716; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'
$VAR1 = bless( {
                 'cmajflt' => '28',
                 'cmndline' => '/usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply',
                 'time' => '20070000',
                 'euid' => 0,
                 'fuid' => 0,
                 'state' => 'sleep',
                 'fname' => 'duplicity',
                 'ctime' => '1040000',
                 'rss' => '44040192',
                 'pid' => 7716,
                 'ppid' => 7372,
                 'stime' => '1760000',
                 'pctmem' => '0.53',
                 'egid' => 0,
                 'cstime' => '40000',
                 'sgid' => 0,
                 'pctcpu' => '  1.94',
                 'cutime' => '1000000',
                 'priority' => 20,
                 'exec' => '/usr/bin/python2.7',
                 'ttynum' => 0,
                 'suid' => 0,
                 'wchan' => '0',
                 'gid' => 0,
                 'start' => '1514329298',
                 'size' => '257503232',
                 'pgrp' => 7370,
                 'utime' => '18310000',
                 'uid' => 0,
                 'ttydev' => '',
                 'cwd' => '/root',
                 'minflt' => '53436',
                 'cminflt' => '2376',
                 'flags' => '1077936384',
                 'sess' => 7370,
                 'majflt' => '27',
                 'fgid' => 0
               }, 'Proc::ProcessTable::Process' );

You were correct, ps u yield incorrect futuristic start times when run inside the container. Sorry for the false information there.