Open nirgal opened 6 years ago
This might be triggered by the process start time calculation. Could you please provide the starttime (see also proc(5) ) of the process from inside the container and from the hypervisor:
$ awk '{ print $22; }' < /proc/$PID/stat
If they are different this might trigger the false positive (Proc::ProcessTable
might be not namespace aware?).
Both are the same value.
needrestart.hypervisor.log needrestart.vm.log procstat.hypervisor.txt procstat.vm.txt
This is weirrd. On which filesystem did you put the LXC root (something special like btrfs or zfs)? Could you please provide the output of stat /proc/$PID/root/usr/lib/python2.7/fnmatch.py
from inside and outside? Thanks!
I'm using lvm. My container conf file contains:
lxc.rootfs = /dev/safe/mail
lxc.rootfs.backend = lvm
The file system itself is ext4.
The hypervisor says:
# stat /proc/18136/root/usr/lib/python2.7/fnmatch.py
File: /proc/18136/root/usr/lib/python2.7/fnmatch.py
Size: 3315 Blocks: 8 IO Block: 4096 regular file
Device: fe28h/65064d Inode: 1841194 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-12-19 00:09:51.176757743 +0100
Modify: 2017-11-24 18:33:09.000000000 +0100
Change: 2017-12-13 18:46:34.825089385 +0100
Birth: -
The VM says the same, only the pid changed:
# stat /proc/31992/root/usr/lib/python2.7/fnmatch.py
File: /proc/31992/root/usr/lib/python2.7/fnmatch.py
Size: 3315 Blocks: 8 IO Block: 4096 regular file
Device: fe28h/65064d Inode: 1841194 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-12-19 00:09:51.176757743 +0100
Modify: 2017-11-24 18:33:09.000000000 +0100
Change: 2017-12-13 18:46:34.825089385 +0100
Birth: -
needrestart seems to work ok for the other processes.
It looks like there is a serious problem with the kernel's proc filesystem entries:
/proc/$PID/stat
- the starttime (22) is the process start time expressed in seconds after system boot (read: real kernel startup)/proc/stat
- the btime is the absolute boot time which Proc::ProcessTable
uses to calculate the process start time of the current namespace/proc/uptime
- the uptime of the current namespacea) and b) do not correlate to each other when looking at them from inside of a container. This breaks the container detection stuff in needrestart since it gets the wrong starttime of the processes. It seems to be impossible to get the correct starttime of a process from within a container :-/
Just to make sure we understand each other, I'm getting false positives when I'm outside the container. Inside the container, needrestart seems to run fine.
I tried to understand a bit more what is going on. Inside the container, I have these mounts:
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
proc on /proc/sys/net type proc (rw,nosuid,nodev,noexec,relatime)
proc on /proc/sys type proc (ro,nosuid,nodev,noexec,relatime)
proc on /proc/sysrq-trigger type proc (ro,nosuid,nodev,noexec,relatime)
lxcfs on /proc/cpuinfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/diskstats type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/meminfo type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/stat type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/swaps type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
lxcfs on /proc/uptime type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
(...)
These looks like they are created by the settings in /usr/share/lxc/config/common.conf
included by default:
lxc.mount.auto = cgroup:mixed proc:mixed sys:mixed
which according to lxc.container.conf(5) "mount /proc as read-write, but remount /proc/sys and /proc/sysrq-trigger read-only for security / container isolation purposes."
Also, on the hypervisor, I have this mount:
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
I have no clue what this does, but /var/lib/lxcfs/proc contains files cpuinfo
diskstats
meminfo
stat
swaps
and uptime
.
Another side note, in the container, ps u
reports the correct start date of the processes. So I suppose there is a way to get it.
Another another side note (^^), on the hypervisor side, ls -l /proc/24020/root
wrongfully reports being a symlink to '/'. This confuses readlink.
# ls -l /proc/24020/root
lrwxrwxrwx 1 root root 0 Dec 21 12:13 /proc/24020/root -> /
# readlink -f /proc/24020/root/usr
/usr
# stat /proc/24020/root/usr
File: /proc/24020/root/usr
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fe28h/65064d Inode: 1835010 Links: 10
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-12-20 09:04:47.912467259 +0100
Modify: 2017-12-12 15:24:18.433319640 +0100
Change: 2017-12-12 15:25:45.193129885 +0100
Birth: -
# stat /usr
File: /usr
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: fe01h/65025d Inode: 130308 Links: 10
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2017-11-08 12:47:29.776893183 +0100
Modify: 2016-10-12 21:43:36.796532759 +0200
Change: 2016-10-12 21:43:36.796532759 +0200
Birth: -
The device of stat /proc/24020/root/usr
is ok. This points at the container root.
The device of stat $(readllink /proc/24020/root/usr)
is wrong. kinda. :/
You are right - my previous argumentation would explain why it does not detect pending interpreter restarts inside of a container (due the calculated process starttime is later than it was in real, especially if the container was started longer time after the hypervisor).
In your stat call for one of the python files (fnmatch.py
) the modification time was:
2017-11-24 18:33:09.000000000 +0100
But the process id 18136 was started later, wasn't it? Could you please verify using ps
and the output of perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 18136; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'
?
Mountings: looks similar on my system including the lxcfs mounting on the hypervisor
Site note about ps u
: are you sure?
root@outside ~ # ps u|grep tty4
[...]
root 18274 0.0 0.0 14716 1632 pts/3 Ss+ Dec20 0:00 /sbin/getty 38400 tty4 linux
root@outside ~ # cat /proc/uptime
15978589.39 123393957.91
root@inside:~# ps u | grep tty4
root 989 0.0 0.0 14716 1632 pts/3 Ss+ 2018 0:00 /sbin/getty 38400 tty4 linux
root@inside:~# cat /proc/uptime
74778.0 73795.0
Is it possible that your container has been started right after the hypervisor? If so the result of awk '{ print $22; }' < /proc/1/stat
from inside the container should be a low integer value.
It sounds a little bit similar to Debian Bug#842879 regards VPS. I think its broken by design for namespaces (one is unable to calculate a process startup time due to the problem in my previous post).
The /proc/$PID/root
link is something special. IMHO it allows to traverse between namespaces (like a wormhole ;-). A (root) process running in the outer mount namespace can access into mount namespaces by accessing /proc/$PID/root/path/to/inside/file
. Although the magic symlink shows /
as target it does not behave like that if a process traverse into it (this also works for chroots).
On the hypervisor (false positive):
root@yod:~# needrestart -v
[main] eval /etc/needrestart/needrestart.conf
[main] needrestart v2.11
[main] running in root mode
[Core] Using UI 'NeedRestart::UI::stdio'...
[Core] #16188 is a NeedRestart::Interp::Perl
[Perl] #16188: uses no source file (-e), skipping
[Core] #16662 is a NeedRestart::Interp::Python
[Python] #16662: source=/usr/bin/duplicity
[Core] #16662 uses obsolete script file(s):
[Core] #16662 /proc/16662/root//usr/lib/python2.7/__future__.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/StringIO.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/contextlib.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/ctypes/wintypes.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/types.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/_weakrefset.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/re.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/posixpath.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/xml/parsers/expat.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/pdb.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/string.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/pprint.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/base64.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/calendar.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/bdb.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/encodings/aliases.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/UserDict.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/getpass.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/copy.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/socket.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/collections.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/functools.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/io.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/copy_reg.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/gzip.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/fnmatch.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/plistlib.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/keyword.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/tempfile.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/genericpath.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/doctest.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/_threading_local.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/random.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/subprocess.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/locale.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/repr.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/os2emxpath.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/cmd.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/abc.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/warnings.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/_abcoll.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/threading.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/weakref.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/traceback.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/sre_constants.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/platform.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/sre_parse.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/ssl.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/dummy_thread.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/hashlib.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/sre_compile.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/stat.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/os.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/shlex.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/gettext.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/struct.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/ntpath.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/optparse.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/linecache.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/pickle.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/heapq.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/textwrap.py
[Core] #16662 /proc/16662/root//usr/lib/python2.7/statvfs.py
[LXC] #16662 is part of LXC container 'mail' and should be restarted
# ps aufxwww | grep 16662
root 16662 2.1 0.5 251468 43132 ? Sl 23:00 0:19 | | \_ /usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply
# perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 16662; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'
$VAR1 = bless( {
'cstime' => '40000',
'stime' => '1750000',
'pctmem' => '0.54',
'gid' => 0,
'start' => '1514329211',
'cmajflt' => '28',
'cminflt' => '2376',
'pgrp' => 16181,
'majflt' => '27',
'wchan' => '0',
'pid' => 16662,
'ttynum' => 0,
'ttydev' => '',
'cutime' => '1000000',
'fname' => 'duplicity',
'exec' => '/usr/bin/python2.7',
'fgid' => 0,
'ctime' => '1040000',
'uid' => 0,
'euid' => 0,
'rss' => '44167168',
'time' => '20020000',
'sess' => 16181,
'flags' => '1077936384',
'minflt' => '53434',
'fuid' => 0,
'suid' => 0,
'sgid' => 0,
'cwd' => '/root',
'size' => '257503232',
'state' => 'sleep',
'ppid' => 16184,
'pctcpu' => ' 1.93',
'priority' => 20,
'utime' => '18270000',
'cmndline' => '/usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply',
'egid' => 0
}, 'Proc::ProcessTable::Process' );
Inside the lxc container (no false positive):
root@mail:~# ps aufxwww | grep duplicity
root 7716 2.0 0.5 251468 43132 ? Sl 00:01 0:20 | \_ /usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply
# perl -MProc::ProcessTable -MData::Dumper -Mstrict -e 'my $pid = 7716; my $p = new Proc::ProcessTable(); print Dumper(grep { $_->{pid} == $pid} @{ $p->table });'
$VAR1 = bless( {
'cmajflt' => '28',
'cmndline' => '/usr/bin/python /usr/bin/duplicity --name duply_mail --encrypt-key XXXXXXXXXXXXXXXX --sign-key XXXXXXXXXXXXXXXX --verbosity 4 --full-if-older-than 15D --ssh-options=-oIdentityFile=/root/.duply/mail/id_rsa --force --exclude-filelist /root/.duply/mail/exclude / sftp://mailbkp@xxxxxxxxxxxxx.xxx/duply',
'time' => '20070000',
'euid' => 0,
'fuid' => 0,
'state' => 'sleep',
'fname' => 'duplicity',
'ctime' => '1040000',
'rss' => '44040192',
'pid' => 7716,
'ppid' => 7372,
'stime' => '1760000',
'pctmem' => '0.53',
'egid' => 0,
'cstime' => '40000',
'sgid' => 0,
'pctcpu' => ' 1.94',
'cutime' => '1000000',
'priority' => 20,
'exec' => '/usr/bin/python2.7',
'ttynum' => 0,
'suid' => 0,
'wchan' => '0',
'gid' => 0,
'start' => '1514329298',
'size' => '257503232',
'pgrp' => 7370,
'utime' => '18310000',
'uid' => 0,
'ttydev' => '',
'cwd' => '/root',
'minflt' => '53436',
'cminflt' => '2376',
'flags' => '1077936384',
'sess' => 7370,
'majflt' => '27',
'fgid' => 0
}, 'Proc::ProcessTable::Process' );
You were correct, ps u
yield incorrect futuristic start times when run inside the container. Sorry for the false information there.
I'm using duply for backups, inside LXC containers. I run Debian stable.
When I run needrestart inside the LXC container, I get no false positive.
But when I run it from the hypervisor, it wants to restart the virtual machine complaining about files in something like
/proc/25192/root//usr/lib/python2.7/
.Here are the logs: needrestart.hyper.log needrestart.lxc.log