Closed rspreafico closed 6 years ago
NF uses basic GNU utils such as `ps`, `sed`, `egrep`, `awk` and `date` when enabling performance tracing, timeline and execution reports.
Biocontainers being built on BusyBox, some of those tools are not available, which causes that error message. There's no easy solution other than providing those utilities in the target container.
Yes, that's right. BusyBox does have all those tools, but they are more basic versions than the GNU ones that ship with Linux distros. I find that options that tend to be conserved across Linux, Mac and BSD tend to be conserved in BusyBox too; Linux-specific options are typically not available in BusyBox. So the fix seems to be to restrict the usage of these utils to the most basic options. I talked with the Biocontainers developers about adding coreutils, but they are concerned about the extra 10 MB layer added to each and every container. Since both Nextflow and Biocontainers are very popular bioinformatics tools/projects, I was hoping they could play seamlessly together. On the Nextflow side, it seems that the only residual error is an unrecognized option of `ps` - is there any way to work around this? Thanks in advance.
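For illustration (not from the original thread), an invocation restricted to widely supported output columns might look like the following; whether a given BusyBox build accepts it depends on which options its `ps` applet was compiled with:

```shell
# Query only columns that GNU procps and BSD ps agree on.
# A trailing '=' after a column name suppresses that column's header.
# (BusyBox ps may lack -p and/or -o depending on build flags.)
ps -o pid= -o vsz= -o rss= -p "$$"
```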
The flag `state` could be replaced by `stat`, but there isn't an alternative for `pmem` and `pcpu`. Moreover, `/bin/bash` is not available in Biocontainers either, and it is required to run NF scripts.
Biocontainers seem to have bash, for example:

```
docker run -it quay.io/biocontainers/samtools:1.5--1 /bin/bash
bash-4.2#
```
Of note, I was able to successfully run a full Nextflow pipeline with >15 processes using >10 different Biocontainers. The output was comparable to the same pipeline using internally generated, Ubuntu-based containers. The error with `ps` that I reported at the beginning of this thread didn't seem critical enough to stop the pipeline. Because `bash` is present, it goes through. I don't know whether the absence of `pmem` and `pcpu` is a stopper. It seems Nextflow is quite close to running Biocontainers without issues.
It should be clarified whether `bash` is consistently added to all Biocontainers, because it's not included by default in BusyBox.
Then it turns out that the pseudo file `/proc/<pid>/io` is not available in BusyBox. I have no clue why; however, NF uses it to retrieve input/output metrics.
Recapitulating, it would be possible to replace `state` with `stat`, but there isn't an alternative for `pmem` and `pcpu`.
Do you think the Biocontainers maintainers would consider adding just the `procps` package?
Yes, I think they are using a bash-empowered BusyBox container as the base for all Biocontainers. As for `procps`, let me reference this issue with them and see what they say.
@pditommaso In the Biocontainers GitHub issue referenced above, @bgruening suggests collecting metrics using Docker's cgroups, as described here. Would that work?
Any update on this?
Will do the PR suggested at the issue reference above next week. Will send updates as soon as ready.
Just created a PR to address this issue.
There could be a few possible alternatives to this problem, in case an updated `ps` tool is not included in the Biocontainers base image:
1) Retrieve the process `pcpu` and `pmem` usage (CPU and memory percentage) from the pseudo file system. However, this information is not directly available; it should be possible to derive `pcpu` from the file `/proc/<pid>/stat` as explained here. The required math could be implemented using awk. It's still not clear how to retrieve `pmem`.
Pros: minor changes in the current code. Cons: different implementations depending on the version of `ps`; potential bugs can be difficult to troubleshoot. It won't work on Mac, because the container runs in the embedded VM.
2) Run the background metrics collector process outside the task container. This is straightforward for Singularity containers. For containers run via Docker, a solution could be to retrieve the PID of the task run by the Docker daemon and use it as the root of the process tree for calculating the metrics. Pros: allows the reuse of existing code; allows the deprecation of the intermediate wrapper script used to collect the metrics. Cons: requires changing the container run logic to get the container ID before it's executed. It won't work on Mac, because the container runs in the embedded VM.
3) As before, but using cgroups and the `docker stats` command. Pros: allows the deprecation of the intermediate wrapper script used to collect the metrics. Cons: requires changing the container run logic to get the container ID before it's executed; different implementations for Docker and Singularity can produce different results, and potential bugs can be difficult to troubleshoot.
4) Save the container process PID to a named pipe, then have a background external process fetch it and collect the metrics. Pros: minor changes in the current code; easy to sync with the target container process. Cons: it won't work on Mac, because the container runs in the embedded VM.
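A rough sketch of option 4 (all paths and names here are hypothetical, not taken from the NF codebase): the task wrapper publishes the container PID on a FIFO, and an external collector blocks on the FIFO until the PID arrives.

```shell
# Create a FIFO linking the wrapper and the external collector (hypothetical path).
pipe=$(mktemp -u /tmp/nxf-pid.XXXXXX)
mkfifo "$pipe"

# Collector side: block until the wrapper writes the PID, then start sampling it.
( read -r task_pid < "$pipe"
  echo "collector would now poll /proc/$task_pid/stat" ) &

# Wrapper side: publish the PID (here simply the current shell's PID).
echo "$$" > "$pipe"
wait
rm -f "$pipe"
```

Opening a FIFO blocks until both a reader and a writer are attached, which gives the synchronization between the two processes for free.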
(removed and copied in the above post)
I've also tried falling back to `top` using this command:

```
top -b -n 1 | awk -v PID=$1 '$1==PID { print $1,$4,$6,$7,$5,0 }'
```

but the column format is not stable and changes across distributions, hence it cannot be used.
The solution would be to fall back on the information provided by `/proc/[pid]/stat`. From this page:
/proc/[pid]/stat
Status information about the process. This is used by ps(1).
It is defined in the kernel source file fs/proc/array.c.
The fields, in order, with their proper scanf(3) format speci‐
fiers, are listed below. Whether or not certain of these
fields display valid information is governed by a ptrace
access mode PTRACE_MODE_READ_FSCREDS | PTRACE_MODE_NOAUDIT
check (refer to ptrace(2)). If the check denies access, then
the field value is displayed as 0. The affected fields are
indicated with the marking [PT].
(1) pid %d
The process ID.
(2) comm %s
The filename of the executable, in parentheses.
This is visible whether or not the executable is
swapped out.
(3) state %c
One of the following characters, indicating process
state:
R Running
S Sleeping in an interruptible wait
D Waiting in uninterruptible disk sleep
Z Zombie
T Stopped (on a signal) or (before Linux 2.6.33)
trace stopped
t Tracing stop (Linux 2.6.33 onward)
W Paging (only before Linux 2.6.0)
X Dead (from Linux 2.6.0 onward)
x Dead (Linux 2.6.33 to 3.13 only)
K Wakekill (Linux 2.6.33 to 3.13 only)
W Waking (Linux 2.6.33 to 3.13 only)
P Parked (Linux 3.9 to 3.13 only)
(4) ppid %d
The PID of the parent of this process.
(5) pgrp %d
The process group ID of the process.
(6) session %d
The session ID of the process.
(7) tty_nr %d
The controlling terminal of the process. (The minor
device number is contained in the combination of
bits 31 to 20 and 7 to 0; the major device number is
in bits 15 to 8.)
(8) tpgid %d
The ID of the foreground process group of the con‐
trolling terminal of the process.
(9) flags %u
The kernel flags word of the process. For bit mean‐
ings, see the PF_* defines in the Linux kernel
source file include/linux/sched.h. Details depend
on the kernel version.
The format for this field was %lu before Linux 2.6.
(10) minflt %lu
The number of minor faults the process has made
which have not required loading a memory page from
disk.
(11) cminflt %lu
The number of minor faults that the process's
waited-for children have made.
(12) majflt %lu
The number of major faults the process has made
which have required loading a memory page from disk.
(13) cmajflt %lu
The number of major faults that the process's
waited-for children have made.
(14) utime %lu
Amount of time that this process has been scheduled
in user mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)). This includes guest time,
guest_time (time spent running a virtual CPU, see
below), so that applications that are not aware of
the guest time field do not lose that time from
their calculations.
(15) stime %lu
Amount of time that this process has been scheduled
in kernel mode, measured in clock ticks (divide by
sysconf(_SC_CLK_TCK)).
(16) cutime %ld
Amount of time that this process's waited-for chil‐
dren have been scheduled in user mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)). (See
also times(2).) This includes guest time,
cguest_time (time spent running a virtual CPU, see
below).
(17) cstime %ld
Amount of time that this process's waited-for chil‐
dren have been scheduled in kernel mode, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(18) priority %ld
(Explanation for Linux 2.6) For processes running a
real-time scheduling policy (policy below; see
sched_setscheduler(2)), this is the negated schedul‐
ing priority, minus one; that is, a number in the
range -2 to -100, corresponding to real-time priori‐
ties 1 to 99. For processes running under a non-
real-time scheduling policy, this is the raw nice
value (setpriority(2)) as represented in the kernel.
The kernel stores nice values as numbers in the
range 0 (high) to 39 (low), corresponding to the
user-visible nice range of -20 to 19.
Before Linux 2.6, this was a scaled value based on
the scheduler weighting given to this process.
(19) nice %ld
The nice value (see setpriority(2)), a value in the
range 19 (low priority) to -20 (high priority).
(20) num_threads %ld
Number of threads in this process (since Linux 2.6).
Before kernel 2.6, this field was hard coded to 0 as
a placeholder for an earlier removed field.
(21) itrealvalue %ld
The time in jiffies before the next SIGALRM is sent
to the process due to an interval timer. Since ker‐
nel 2.6.17, this field is no longer maintained, and
is hard coded as 0.
(22) starttime %llu
The time the process started after system boot. In
kernels before Linux 2.6, this value was expressed
in jiffies. Since Linux 2.6, the value is expressed
in clock ticks (divide by sysconf(_SC_CLK_TCK)).
The format for this field was %lu before Linux 2.6.
(23) vsize %lu
Virtual memory size in bytes.
(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out.
(25) rsslim %lu
Current soft limit in bytes on the rss of the
process; see the description of RLIMIT_RSS in
getrlimit(2).
(26) startcode %lu [PT]
The address above which program text can run.
(27) endcode %lu [PT]
The address below which program text can run.
(28) startstack %lu [PT]
The address of the start (i.e., bottom) of the
stack.
(29) kstkesp %lu [PT]
The current value of ESP (stack pointer), as found
in the kernel stack page for the process.
(30) kstkeip %lu [PT]
The current EIP (instruction pointer).
(31) signal %lu
The bitmap of pending signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(32) blocked %lu
The bitmap of blocked signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(33) sigignore %lu
The bitmap of ignored signals, displayed as a deci‐
mal number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(34) sigcatch %lu
The bitmap of caught signals, displayed as a decimal
number. Obsolete, because it does not provide
information on real-time signals; use
/proc/[pid]/status instead.
(35) wchan %lu [PT]
This is the "channel" in which the process is wait‐
ing. It is the address of a location in the kernel
where the process is sleeping. The corresponding
symbolic name can be found in /proc/[pid]/wchan.
(36) nswap %lu
Number of pages swapped (not maintained).
(37) cnswap %lu
Cumulative nswap for child processes (not main‐
tained).
(38) exit_signal %d (since Linux 2.1.22)
Signal to be sent to parent when we die.
(39) processor %d (since Linux 2.2.8)
CPU number last executed on.
(40) rt_priority %u (since Linux 2.5.19)
Real-time scheduling priority, a number in the range
1 to 99 for processes scheduled under a real-time
policy, or 0, for non-real-time processes (see
sched_setscheduler(2)).
(41) policy %u (since Linux 2.5.19)
Scheduling policy (see sched_setscheduler(2)).
Decode using the SCHED_* constants in linux/sched.h.
The format for this field was %lu before Linux
2.6.22.
(42) delayacct_blkio_ticks %llu (since Linux 2.6.18)
Aggregated block I/O delays, measured in clock ticks
(centiseconds).
(43) guest_time %lu (since Linux 2.6.24)
Guest time of the process (time spent running a vir‐
tual CPU for a guest operating system), measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(44) cguest_time %ld (since Linux 2.6.24)
Guest time of the process's children, measured in
clock ticks (divide by sysconf(_SC_CLK_TCK)).
(45) start_data %lu (since Linux 3.3) [PT]
Address above which program initialized and unini‐
tialized (BSS) data are placed.
(46) end_data %lu (since Linux 3.3) [PT]
Address below which program initialized and unini‐
tialized (BSS) data are placed.
(47) start_brk %lu (since Linux 3.3) [PT]
Address above which program heap can be expanded
with brk(2).
(48) arg_start %lu (since Linux 3.5) [PT]
Address above which program command-line arguments
(argv) are placed.
(49) arg_end %lu (since Linux 3.5) [PT]
Address below program command-line arguments (argv)
are placed.
(50) env_start %lu (since Linux 3.5) [PT]
Address above which program environment is placed.
(51) env_end %lu (since Linux 3.5) [PT]
Address below which program environment is placed.
(52) exit_code %d (since Linux 3.5) [PT]
The thread's exit status in the form reported by
waitpid(2).
The required information for each process is:

rss: resident memory size, field (24). The same value is given by field (2) in `/proc/[pid]/statm` (see).

Actually, it's the number of pages, not memory; check instead `grep VmRSS /proc/<pid>/status`.
Moreover, RSS can be (approximately) obtained by summing the Rss: entries in smaps (you don't need to add up the shared/private shared/dirty entries):

```
awk '/Rss:/{ sum += $2 } END { print sum }' /proc/$$/smaps
```

see https://unix.stackexchange.com/a/33388
Finally, the page size is given by:

```
cat /proc/$$/smaps | egrep 'KernelPageSize' | head -n 1 | awk '{print $2}'
```
Therefore the problem is to calculate `pcpu` and `pmem`.
See also this link.
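For what it's worth, a `pmem` figure could in principle be derived from VmRSS in `/proc/<pid>/status` and MemTotal in `/proc/meminfo`, both reported in kB. A sketch with made-up numbers (the commented-out commands are the assumed live-system data sources):

```shell
# pmem = VmRSS / MemTotal * 100.
# On a live Linux system the inputs would come from:
#   vm_rss=$(awk '/VmRSS/ { print $2 }' /proc/<pid>/status)
#   mem_total=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
vm_rss=204800      # kB, hypothetical
mem_total=16409880 # kB, hypothetical
awk -v rss="$vm_rss" -v tot="$mem_total" 'BEGIN { printf "%.1f\n", rss/tot*100 }'
# prints 1.2
```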
Another workaround; first try:

```
# top -b -n 1 -p 1
top - 13:09:03 up 1 day, 18:08, 0 users, load average: 0.16, 0.09, 0.13
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 1.1 sy, 0.0 ni, 97.4 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16409880 total, 12009328 free, 839936 used, 3560616 buff/cache
KiB Swap: 1048572 total, 1048572 free, 0 used. 15064544 avail Mem

  PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
    1 root  20  0 18236 3284 2812 S  0.0  0.0  0:00.09 bash
```
if it fails, use:

```
top -b -n 1
Mem: 4401364K used, 12008516K free, 0K shrd, 0K buff, 20291696K cached
CPU:  0% usr  0% sys  0% nic 100% idle  0% io  0% irq  0% sirq
Load average: 0.43 0.21 0.17 3/651 79
  PID  PPID USER  STAT  VSZ %VSZ %CPU COMMAND
    1     0 root  S    6868   0%   0% /bin/sh
   78     1 root  R    4764   0%   0% [top]
   79     1 root  R    4764   0%   0% top -b -n 1
```
Then parse the two different formats accordingly. Note that the latter is missing the `rss` memory.
Thanks @pditommaso! Really appreciate this!
I'm going crazy, man ;)
I know ... but I appreciate your effort here, really! Thanks!
BTW, `top` is a no-go, there are too many differences across distributions. I found a way to calc `%cpu` using the proc stat file.
```bash
#!/bin/bash
# Sample the %cpu usage of a process at regular intervals using only
# /proc and awk (no ps required).
pid=$1
interval=${2:-1}
prev_time=0
prev_total=0
num_cpus=$(grep -c '^processor' /proc/cpuinfo)
while true; do
    # Aggregate CPU time across all cores: sum of the 'cpu' line fields.
    total_time=$(awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8+$9+$10 }' /proc/stat)
    # Process CPU time: utime+stime+cutime+cstime (fields 14-17 of stat).
    proc_time=$(awk '{ print $14+$15+$16+$17 }' /proc/$pid/stat)
    cpu_usage=$(echo -n $proc_time $prev_time $total_time $prev_total $num_cpus | awk '{ pct=($1-$2)/($3-$4)*$5*100; printf "%.1f", pct }')
    prev_time="$proc_time"
    prev_total="$total_time"
    echo $cpu_usage
    sleep $interval
done
```
Any idea how to calc the `%mem` instead?
Note that using cgroups/docker stats for monitoring memory usage is very simple and convenient, but also tricky since it covers both process and OS memory usage inside the container.
For instance, when running `tar -xf` on a 3GB archive in a Docker container, cAdvisor reports 6GB of memory usage (!) for that container. The RSS/VSZ of the tar process is tiny (2MB/25MB), but the OS's page cache (archive + extracted files = 6GB) seemingly occupies the RAM.
In this case the container's memory usage varies, since the OS adapts to the available memory. When setting a memory limit of 100MB, the container uses only that much memory and also finishes successfully, but slower (50sec compared to 14sec without memory limit).
Here's a screenshot from the cAdvisor interface for the case of extracting the archive without a memory limit.
Interesting. This is also a reason I would not like to rely on cgroups, to avoid reporting different metrics depending on the deployment/execution platform.
This has finally been implemented. Please give it a try with the following command:

```
NXF_VER=0.30.0-RC1 nextflow run .. etc
```
What what what? @pditommaso awesome!!! I owe you a beer or juice now :)
I have been told that in Portland there are many, good, breweries 😉
Deal! :)
That is awesome! Thanks @pditommaso for this! It's really a game changer and immensely useful!
@bgruening I remember seeing a chart showing the growth of Bioconda packages recently (maybe in a paper?), but I'm unable to find it. Any idea if there's a link available?
Have a look here: https://www.biorxiv.org/content/early/2017/10/21/207092
It was that. Thanks.
I tried this and it's working beautifully. Only one thing: when I go into the work folder of any process that makes use of a Biocontainer and inspect the `.command.err` file, I notice that there is always a failed `chown` call at the end, regardless of the command being run. For example, with `fastqc`:
Unable to find image 'quay.io/biocontainers/fastqc:0.11.7--pl5.22.0_2' locally
0.11.7--pl5.22.0_2: Pulling from biocontainers/fastqc
a3ed95caeb02: Already exists
77c6c00e8b61: Already exists
3aaade50789a: Already exists
00cf8b9f3d2a: Already exists
7ff999a2256f: Already exists
d2ba336f2e44: Already exists
dfda3e01f2b6: Already exists
a3ed95caeb02: Already exists
10c3bb32200b: Already exists
e1655b0561ce: Pulling fs layer
e1655b0561ce: Download complete
e1655b0561ce: Pull complete
Digest: sha256:790a0ce3b9d6e91ede1bfc9acc6e2abbc808bc1e505e8da70feca064a26de154
Status: Image is up to date for quay.io/biocontainers/fastqc:0.11.7--pl5.22.0_2
Started analysis of 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 5% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 10% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 15% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 20% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 25% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 30% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 35% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 40% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 45% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 50% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 55% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 60% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 65% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 70% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 75% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 80% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 85% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 90% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
Approx 95% complete for 160421-EF-M070901-mono-AOC-4h_L001_R2_trimmed.fastq.gz
chown: unrecognized option '--from'
BusyBox v1.22.1 (2014-05-23 01:24:27 UTC) multi-call binary.
Usage: chown [-RhLHPcvf]... OWNER[<.|:>[GROUP]] FILE...
Change the owner and/or group of each FILE to OWNER and/or GROUP
-R Recurse
-h Affect symlinks instead of symlink targets
-L Traverse all symlinks to directories
-H Traverse symlinks on command line only
-P Don't traverse symlinks (default)
-c List changed files
-v List all files
-f Hide errors
This does not occur when using tools called from internal Ubuntu-based containers.
On the plus side, the pipeline still completes successfully and all expected output files are found.
NF does not use `chown` to fetch process metrics. It can be used by the fixOwnership option, though.
Yep, unrelated to process metrics; the link is GNU vs non-GNU tools. That's right, I have the `fixOwnership` option on. It seems the issue is with the `--from` flag of `chown`, whereas `-fR` are more universally supported. Is `--from` required? Would it be OK to `chown` all files?
What's the difference between `--from` and `-fR`?
`-fR` is unrelated to `--from`; apologies that I wasn't clear. Those flags just activate silent and recursive mode, respectively. I meant to say that those two options, also used in the line you referenced, are not an issue. The only issue arises from `--from`.
I would suggest replacing `fixOwnership` with `docker.runOptions = '-u $(id -u):$(id -g)'`.
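For reference, a minimal sketch of that setting in `nextflow.config` (the `docker` scope and `runOptions` are standard Nextflow config; enabling Docker here is only for completeness of the example):

```groovy
docker {
    enabled = true
    // Run the container as the calling user so task outputs are owned
    // by that user, avoiding the chown --from call that BusyBox lacks.
    runOptions = '-u $(id -u):$(id -g)'
}
```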
Yes, we do that as a backup and I can get around this limitation that way. I'm wondering whether it may be useful to get rid of `--from` for all users. Other than Biocontainers/BusyBox, I have just tried the Alpine Linux Docker container, widely used as a base image, and it lacks the `--from` option as well.
The two options here might be either to `chown` all files (they should all belong to the Nextflow user anyway), or to append the UID/GID Docker options that you referenced when they are absent but `fixOwnership` is on.
> Wondering whether it may be useful to get rid of --from for all users.
It could, provided there's an equivalent syntax. However, we are off-topic for this thread; I would suggest opening a new one.
With Nextflow 0.30.2 I am getting

```
/Users/rspreafico/workspace/assets/meta_test/work/3b/78a07d705545562fd430231edd41af/.command.stub: line 45: ps: command not found
```

repeated several times in the `.command.log` file. I am using this container, which does not provide `ps`.
The goal of this issue was to make NF compatible with the Biocontainers stock image, which includes a `ps` command with limited capabilities. However, `ps` is still needed to determine the process tree, therefore it should be included in the base image to be able to collect metrics.
Gotcha! Will make a derivative image from that one and add `ps` then! Thanks for clarifying!
Hi,
I am using Nextflow with Biocontainers, which are gaining increasing traction (link1, link2). Biocontainers use a minimal BusyBox, which caused issues with coreutils that were previously fixed in Issue #321. I am now noticing another error in the `.command.err` log for any process started by Nextflow that leverages a Biocontainer. It doesn't seem to be critical, as processes still complete successfully, but I wonder whether this could still be fixed to get full BusyBox/Biocontainers support? Thanks in advance.