Closed rspreafico closed 6 years ago
NF uses basic GNU utils such as `ps`, `sed`, `egrep`, `awk` and `date` when enabling performance tracing, timeline and execution reports.
Biocontainers being built on BusyBox, some of those tools are not available, which causes that error message. There's no easy solution other than providing those utilities in the target container.
Yes, that's right. BusyBox does have all those tools, but they are more basic versions than the GNU ones that ship with Linux distros. I find that options that tend to be conserved across Linux, Mac and BSD tend to be conserved in BusyBox too; Linux-specific options are typically not available in BusyBox. So the fix seems to be to restrict the usage of these utils to the most basic options. I talked with the Biocontainers developers about adding coreutils, but they are concerned about the extra 10 MB layer added to each and every container. Since both Nextflow and Biocontainers are very popular bioinformatics tools/projects, I was hoping they could play seamlessly together. On the Nextflow side, it seems that the only residual error is an unrecognized option of `ps` - is there any way to work around this? Thanks in advance.
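For illustration (not from the original thread), an invocation restricted to widely supported output columns might look like the following; whether a given BusyBox build accepts it depends on which options its `ps` applet was compiled with:

```shell
# Query only columns that GNU procps and BSD ps agree on.
# A trailing '=' after a column name suppresses that column's header.
# (BusyBox ps may lack -p and/or -o depending on build flags.)
ps -o pid= -o vsz= -o rss= -p "$$"
```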
The flag `state` could be replaced by `stat`, but there isn't an alternative for `pmem` and `pcpu`. Moreover, `/bin/bash` is not available in Biocontainers either, and it is required to run NF scripts.
Biocontainers seem to have bash, for example:

```
docker run -it quay.io/biocontainers/samtools:1.5--1 /bin/bash
bash-4.2#
```
Of note, I was able to successfully run a full Nextflow pipeline with >15 processes using >10 different Biocontainers. The output was comparable to the same pipeline using internally generated, Ubuntu-based containers. The error with `ps` that I reported at the beginning of this thread didn't seem critical enough to stop the pipeline. Because `bash` is present, it goes through. I don't know whether the absence of `pmem` and `pcpu` is a stopper. It seems Nextflow is quite close to running Biocontainers without issues.
It should be clarified whether `bash` is consistently added to all Biocontainers, because it's not included by default in BusyBox.
Then it turns out that the pseudo file `/proc/<pid>/io` is not available in BusyBox. I have no clue why; however, NF uses it to retrieve input/output metrics.
Recapitulating, it would be possible to replace `state` with `stat`, but there isn't an alternative for `pmem` and `pcpu`.
Do you think the Biocontainers maintainers would consider adding just the `procps` package?
Yes, I think they are using a bash-empowered BusyBox container as the base for all Biocontainers. As for `procps`, let me reference this issue with them and see what they say.
@pditommaso In the Biocontainers GitHub issue referenced above, @bgruening suggests collecting metrics using Docker's cgroups, as described here. Would that work?
Any update on this?
Will do the PR suggested at the issue reference above next week. Will send updates as soon as ready.
Just created a PR to address this issue.
There could be a few possible alternatives to this problem, in case an updated `ps` tool is not included in the Biocontainers base image:
1) Retrieve the process `pcpu` and `pmem` usage (CPU and memory percentage) from the pseudo file system. However, this information is not directly available; it should be possible to derive `pcpu` from the file `/proc/<pid>/stat` as explained here. The required math could be implemented using awk. It's still not clear how to retrieve `pmem`.
Pros: minor changes in the current code. Cons: different implementations depending on the version of `ps`; potential bugs can be difficult to troubleshoot. It won't work on Mac, because the container runs in the embedded VM.
2) Run the background metrics collector process outside the task container. This is straightforward for Singularity containers. For containers run via Docker, a solution could be to retrieve the PID of the task run by the Docker daemon and use it as the root of the process tree for calculating the metrics. Pros: allows the reuse of existing code; allows the deprecation of the intermediate wrapper script used to collect the metrics. Cons: requires changing the container run logic to get the container ID before it's executed. It won't work on Mac, because the container runs in the embedded VM.
3) As before, but using cgroups and the `docker stats` command. Pros: allows the deprecation of the intermediate wrapper script used to collect the metrics. Cons: requires changing the container run logic to get the container ID before it's executed; different implementations for Docker and Singularity can produce different results, and potential bugs can be difficult to troubleshoot.
4) Save the container process PID to a named pipe, then have a background external process fetch it and collect the metrics. Pros: minor changes in the current code; easy to sync with the target container process. Cons: it won't work on Mac, because the container runs in the embedded VM.
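A rough sketch of option 4 (all paths and names here are hypothetical, not taken from the NF codebase): the task wrapper publishes the container PID on a FIFO, and an external collector blocks on the FIFO until the PID arrives.

```shell
# Create a FIFO linking the wrapper and the external collector (hypothetical path).
pipe=$(mktemp -u /tmp/nxf-pid.XXXXXX)
mkfifo "$pipe"

# Collector side: block until the wrapper writes the PID, then start sampling it.
( read -r task_pid < "$pipe"
  echo "collector would now poll /proc/$task_pid/stat" ) &

# Wrapper side: publish the PID (here simply the current shell's PID).
echo "$$" > "$pipe"
wait
rm -f "$pipe"
```

Opening a FIFO blocks until both a reader and a writer are attached, which gives the synchronization between the two processes for free.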
(removed and copied in the above post)
I've also tried falling back to `top` using this command:

```
top -b -n 1 | awk -v PID=$1 '$1==PID { print $1,$4,$6,$7,$5,0 }'
```

but the column format is not stable and changes across distributions, hence it cannot be used.
The solution would be to fall back on the information provided by `/proc/[pid]/stat`. From this page:
/proc/[pid]/stat
Status information about the process. This is used by ps(1).
It is defined in the kernel source file fs/proc/array.c.
The fields, in order, with their proper scanf(3) format speci‐
fiers, are listed below. Whether or not certain of these
fields display valid information is governed by a ptrace
access mode PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT
check (refer to ptrace(2)). If the check denies access, then
the field value is displayed as 0. The affected fields are
indicated with the marking [PT].
(1) pid %d
The process ID.
(2) comm %s
The filename of the executable, in parentheses.
This is visible whether or not the executable is
swapped out.
(3) state %c
One of the following characters, indicating process
state:
R Running
S Sleeping in an interruptible wait
D Waiting in uninterruptible disk sleep
Z Zombie
T Stopped (on a signal) or (before Linux 2.6.33)
trace stopped
t Tracing stop (Linux 2.6.33 onward)
W Paging (only before Linux 2.6.0)
X Dead (from Linux 2.6.0 onward)
x Dead (Linux 2.6.33 to 3.13 only)
K Wakekill (Linux 2.6.33 to 3.13 only)
W Waking (Linux 2.6.33 to 3.13 only)
P Parked (Linux 3.9 to 3.13 only)
(4) ppid %d
The PID of the parent of this process.
(5) pgrp %d
The process group ID of the process.
(6) session %d
The session ID of the process.
(7) tty_nr %d
The controlling terminal of the process. (The minor
device number is contained in the combination of
bits 31 to 20 and 7 to 0; the major device number is
in bits 15 to 8.)
(8) tpgid %d
The ID of the foreground process group of the con‐
trolling terminal of the process.
(9) flags %u
The kernel flags word of the process. For bit mean‐
ings, see the PF_* defines in the Linux kernel
source file include/linux/sched.h. Details depend
on the kernel version.
The format for this field was %lu before Linux 2.6.
(10) minflt %lu
The number of minor faults the process has made
which have not required loading a memory page from
disk.
(11) cminflt %lu
The number of minor faults that the process's
waited-for children have made.
(12) majflt %lu
The number of major faults the process has made
which have required loading a memory page from disk.
(13) cmajflt %lu
The number of major faults that the process's
waited-for children have made.
(14) utime %lu
Amount of time that this process has been scheduled
in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)). This includes guest time,
guest_time (time spent running a virtual CPU, see
below), so that applications that are not aware of
the guest time field do not lose that time from
their calculations.
(15) stime %lu
Amount of time that this process has been scheduled
in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).
(16) cutime %ld
Amount of time that this process's waited-for chil‐
dren have been scheduled in user mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)). (See
also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see
below).
(17) cstime %ld
Amount of time that this process's waited-for chil‐
dren have been scheduled in kernel mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(18) priority %ld
(Explanation for Linux 2.6) For processes running a
real-time scheduling policy (policy below; see
sched_setscheduler(2)), this is the negated schedul‐
ing priority, minus one; that is, a number in the
range -2 to -100, corresponding to real-time priori‐
ties 1 to 99. For processes running under a non-
real-time scheduling policy, this is the raw nice
value (setpriority(2)) as represented in the kernel.
The kernel stores nice values as numbers in the
range 0 (high) to 39 (low), corresponding to the
user-visible nice range of -20 to 19.
Before Linux 2.6, this was a scaled value based on
the scheduler weighting given to this process.
(19) nice %ld
The nice value (see setpriority(2)), a value in the
range 19 (low priority) to -20 (high priority).
(20) num_threads %ld
Number of threads in this process (since Linux 2.6).
Before kernel 2.6, this field was hard coded to 0 as
a placeholder for an earlier removed field.
(21) itrealvalue %ld
The time in jiffies before the next SIGALRM is sent
to the process due to an interval timer. Since ker‐
nel 2.6.17, this field is no longer maintained, and
is hard coded as 0.
(22) starttime %llu
The time the process started after system boot. In
kernels before Linux 2.6, this value was expressed
in jiffies. Since Linux 2.6, the value is expressed
in clock ticks (divide by sysconf(_SC_CLK_TCK)).
The format for this field was %lu before Linux 2.6.
(23) vsize %lu
Virtual memory size in bytes.
(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out.
(25) rsslim %lu
Current soft limit in bytes on the rss of the
process; see the description of RLIMIT_RSS in
getrlimit(2).
(26) startcode %lu [PT]
The address above which program text can run.
(27) endcode %lu [PT]
The address below which program text can run.
(28) startstack %lu [PT]
The address of the start (i.e., bottom) of the
stack.
(29) kstkesp %lu [PT]
The current value of ESP (stack pointer), as found
in the kernel stack page for the process.
(30) kstkeip %lu [PT]
The current EIP (instruction pointer).
(31) signal %lu
The bitmap of pending signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(32) blocked %lu
The bitmap of blocked signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(33) sigignore %lu
The bitmap of ignored signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(34) sigcatch %lu
The bitmap of caught signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(35) wchan %lu [PT]
This is the "channel" in which the process is wait‐
ing. It is the address of a location in the kernel
where the process is sleeping. The corresponding
symbolic name can be found in /proc/[pid]/wchan.
(36) nswap %lu
Number of pages swapped (not maintained).
(37) cnswap %lu
Cumulative nswap for child processes (not main‐
tained).
(38) exit_signal %d (since Linux 2.1.22)
Signal to be sent to parent when we die.
(39) processor %d (since Linux 2.2.8)
CPU number last executed on.
(40) rt_priority %u (since Linux 2.5.19)
Real-time scheduling priority, a number in the range
1 to 99 for processes scheduled under a real-time
policy, or 0, for non-real-time processes (see
sched_setscheduler(2)).
(41) policy %u (since Linux 2.5.19)
Scheduling policy (see sched_setscheduler(2)).
Decode using the SCHED_* constants in linux/sched.h.
The format for this field was %lu before Linux
2.6.22.
(42) delayacct_blkio_ticks %llu (since Linux 2.6.18)
Aggregated block I/O delays, measured in clock ticks
(centiseconds).
(43) guest_time %lu (since Linux 2.6.24)
Guest time of the process (time spent running a vir‐
tual CPU for a guest operating system), measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(44) cguest_time %ld (since Linux 2.6.24)
Guest time of the process's children, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(45) start_data %lu (since Linux 3.3) [PT]
Address above which program initialized and unini‐
tialized (BSS) data are placed.
(46) end_data %lu (since Linux 3.3) [PT]
Address below which program initialized and unini‐
tialized (BSS) data are placed.
(47) start_brk %lu (since Linux 3.3) [PT]
Address above which program heap can be expanded
with brk(2).
(48) arg_start %lu (since Linux 3.5) [PT]
Address above which program command-line arguments
(argv) are placed.
(49) arg_end %lu (since Linux 3.5) [PT]
Address below program command-line arguments (argv)
are placed.
(50) env_start %lu (since Linux 3.5) [PT]
Address above which program environment is placed.
(51) env_end %lu (since Linux 3.5) [PT]
Address below which program environment is placed.
(52) exit_code %d (since Linux 3.5) [PT]
The thread's exit status in the form reported by
waitpid(2).
The required information for each process is:

rss: resident memory size, field (24). The same value is given by field (2) in `/proc/[pid]/statm` (see).

Actually, it's the number of pages, not memory; check instead `grep VmRSS /proc/<pid>/status`.
Moreover, RSS can be (approximately) obtained by summing the Rss: entries in smaps (you don't need to add up the shared/private shared/dirty entries):

```
awk '/Rss:/{ sum += $2 } END { print sum }' /proc/$$/smaps
```

see https://unix.stackexchange.com/a/33388
Finally, the page size is given by:

```
cat /proc/$$/smaps | egrep 'KernelPageSize' | head -n 1 | awk '{print $2}'
```
Therefore the problem is to calculate `pcpu` and `pmem`.
See also this link.
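For what it's worth, a `pmem` figure could in principle be derived from VmRSS in `/proc/<pid>/status` and MemTotal in `/proc/meminfo`, both reported in kB. A sketch with made-up numbers (the commented-out commands are the assumed live-system data sources):

```shell
# pmem = VmRSS / MemTotal * 100.
# On a live Linux system the inputs would come from:
#   vm_rss=$(awk '/VmRSS/ { print $2 }' /proc/<pid>/status)
#   mem_total=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
vm_rss=204800      # kB, hypothetical
mem_total=16409880 # kB, hypothetical
awk -v rss="$vm_rss" -v tot="$mem_total" 'BEGIN { printf "%.1f\n", rss/tot*100 }'
# prints 1.2
```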
Another workaround; first try:

```
# top -b -n 1 -p 1
top - 13:09:03 up 1 day, 18:08, 0 users, load average: 0.16, 0.09, 0.13
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 1.1 sy, 0.0 ni, 97.4 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16409880 total, 12009328 free, 839936 used, 3560616 buff/cache
KiB Swap: 1048572 total, 1048572 free, 0 used. 15064544 avail Mem

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
    1 root  20  0 18236 3284 2812 S  0.0  0.0  0:00.09 bash
```
if it fails, use:

```
top -b -n 1
Mem: 4401364K used, 12008516K free, 0K shrd, 0K buff, 20291696K cached
CPU:  0% usr  0% sys  0% nic 100% idle  0% io  0% irq  0% sirq
Load average: 0.43 0.21 0.17 3/651 79
  PID  PPID USER  STAT  VSZ %VSZ %CPU COMMAND
    1     0 root  S    6868   0%   0% /bin/sh
   78     1 root  R    4764   0%   0% [top]
   79     1 root  R    4764   0%   0% top -b -n 1
```
Then parse the two different formats accordingly. Note that the latter is missing the `rss` memory.
Thanks @pditommaso! Really appreciate this!
I'm going crazy, man ;)
I know ... but I appreciate your effort here, really! Thanks!
BTW, `top` is a no-go, there are too many differences across distributions. I found a way to calc `%cpu` using the proc stat file.
```bash
#!/bin/bash
# Sample the %cpu usage of a process at regular intervals using only
# /proc and awk (no ps required).
pid=$1
interval=${2:-1}
prev_time=0
prev_total=0
num_cpus=$(grep -c '^processor' /proc/cpuinfo)
while true; do
    # Aggregate CPU time across all cores: sum of the 'cpu' line fields.
    total_time=$(awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9+$10 }' /proc/stat)
    # Process CPU time: utime+stime+cutime+cstime (fields 14-17 of stat).
    proc_time=$(awk '{ print $14+$15+$16+$17 }' /proc/$pid/stat)
    cpu_usage=$(echo -n $proc_time $prev_time $total_time $prev_total $num_cpus | awk '{ pct=($1-$2)/($3-$4)*$5*100; printf "%.1f", pct }')
    prev_time="$proc_time"
    prev_total="$total_time"
    echo $cpu_usage
    sleep $interval
done
```
Any idea how to calc the `%mem` instead?
Note that using cgroups/docker stats for monitoring memory usage is very simple and convenient, but also tricky since it covers both process and OS memory usage inside the container.
For instance, when running `tar -xf` on a 3GB archive in a Docker container, cAdvisor reports 6GB of memory usage (!) for that container. The RSS/VSZ of the tar process is tiny (2MB/25MB), but the OS's page cache (archive + extracted files = 6GB) seemingly occupies the RAM.
In this case the container's memory usage varies, since the OS adapts to the available memory. When setting a memory limit of 100MB, the container uses only that much memory and also finishes successfully, but slower (50sec compared to 14sec without memory limit).
Here's a screenshot from the cAdvisor interface for the case of extracting the archive without a memory limit.
Interesting. This is also a reason I would not like to rely on cgroups, to avoid reporting different metrics depending on the deployment/execution platform.
This has finally been implemented. Please give it a try with the following command:

```
NXF_VER=0.30.0-RC1 nextflow run .. etc
```
What what what? @pditommaso awesome!!! I owe you a beer or juice now :)
I have been told that in Portland there are many, good, breweries 😉
Deal! :)
That is awesome! Thanks @pditommaso for this! It's really a game changer and immensely useful!
@bgruening I remember seeing a chart showing the growth of Bioconda packages recently (maybe in a paper?), but I'm unable to find it. Any idea if there's a link available?
Have a look here: https://www.biorxiv.org/content/early/2017/10/21/207092
It was that. Thanks.
I tried this and it's working beautifully. Only one thing: when I go into the work folder of any process that makes use of a Biocontainer and inspect the `.command.err` file, I notice that there is always a failed `chown` call at the end, regardless of the command being run. For example, with `fastqc`:
Unable to find image 'quay.io/biocontainers/fastqc:0.11.7--pl5.22.0_2' locally
0.11.7--pl5.22.0_2: Pulling from biocontainers/fastqc
a3ed95caeb02: Already exists
77c6c00e8b61: Already exists
3aaade50789a: Already exists
00cf8b9f3d2a: Already exists
7ff999a2256f: Already exists
d2ba336f2e44: Already exists
dfda3e01f2b6: Already exists
a3ed95caeb02: Already exists
10c3bb32200b: Already exists
e1655b0561ce: Pulling fs layer
e1655b0561ce: Download complete
e1655b0561ce: Pull complete
Digest: sha256:790a0ce3b9d6e91ede1bfc9acc6e2abbc808bc1e505e8da70feca064a26de154
Status: Image is up to date for quay.io/biocontainers/fastqc:0.11.7--pl5.22.0_2
Started analysis of 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 5% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 10% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 15% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 20% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 25% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 30% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 35% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 40% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 45% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 50% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 55% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 60% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 65% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 70% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 75% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 80% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 85% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 90% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 95% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
chown: unrecognized option '--from'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.
Usage: chown [-RhLHPcvf]... OWNER[<.|:>[GROUP]] FILE...
Change the owner and/or group of each FILE to OWNER and/or GROUP
-R Recurse
-h Affect symlinks instead of symlink targets
-L Traverse all symlinks to directories
-H Traverse symlinks on command line only
-P Don't traverse symlinks (default)
-c List changed files
-v List all files
-f Hide errors
This does not occur when using tools called from internal Ubuntu-based containers.
On the plus side, the pipeline still completes successfully and all expected output files are found.
NF does not use `chown` to fetch process metrics. It can be used by the fixOwnership option, though.
Yep, unrelated to process metrics; the link is GNU vs non-GNU tools. That's right, I have the `fixOwnership` option on. It seems the issue is with the `--from` flag of `chown`, whereas `-fR` are more universally supported. Is `--from` required? Would it be OK to `chown` all files?
What's the difference between `--from` and `-fR`?
`-fR` is unrelated to `--from`; apologies that I wasn't clear. Those flags just activate silent and recursive mode, respectively. I meant to say that those two options, also used in the line you referenced, are not an issue. The only issue arises from `--from`.
I would suggest replacing `fixOwnership` with `docker.runOptions = '-u $(id -u):$(id -g)'`.
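For reference, a minimal sketch of that setting in `nextflow.config` (the `docker` scope and `runOptions` are standard Nextflow config; enabling Docker here is only for completeness of the example):

```groovy
docker {
    enabled = true
    // Run the container as the calling user so task outputs are owned
    // by that user, avoiding the chown --from call that BusyBox lacks.
    runOptions = '-u $(id -u):$(id -g)'
}
```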
Yes, we do that as a backup and I can get around this limitation that way. I'm wondering whether it may be useful to get rid of `--from` for all users. Other than Biocontainers/BusyBox, I have just tried the Alpine Linux Docker container, widely used as a base image, and it lacks the `--from` option as well.
The two options here might be either to `chown` all files (they should all belong to the Nextflow user anyway), or to append the UID/GID Docker options that you referenced when they are absent but `fixOwnership` is on.
> Wondering whether it may be useful to get rid of --from for all users.
It could, provided there's an equivalent syntax. However, we are off-topic for this thread; I would suggest opening a new one.
With Nextflow 0.30.2 I am getting

```
/Users/rspreafico/workspace/assets/meta_test/work/3b/78a07d705545562fd430231edd41af/.command.stub: line 45: ps: command not found
```

repeated several times in the `.command.log` file. I am using this container, which does not provide `ps`.
The goal of this issue was to make NF compatible with the Biocontainers stock image, which includes a `ps` command with limited capabilities. However, `ps` is still needed to determine the process tree, therefore it should be included in the base image to be able to collect metrics.
Gotcha! Will make a derivative image from that one and add `ps` then! Thanks for clarifying!
Hi,
I am using Nextflow with Biocontainers, which are gaining increasing traction (link1, link2). Biocontainers use a minimal BusyBox, which caused issues with coreutils that were previously fixed in Issue #321. I am now noticing another error in the `.command.err` log for any process started by Nextflow that leverages a Biocontainer. It doesn't seem to be critical, as processes still complete successfully, but I wonder whether this could still be fixed to get full BusyBox/Biocontainers support? Thanks in advance.