nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Sharing intermediate results (workDir) between users on a shared machine #1100

Closed bebosudo closed 5 years ago

bebosudo commented 5 years ago

Hi, at my company we are considering using Nextflow for scientific application development. We had to wrap it in a script to extend some of its functionality: for example, since the weblog option reports no information about the Git pipeline being executed by NF, we wrote a logger that runs in the background and records run statistics to a database.

We have now set up NF version 19.01.0 on a test machine that our users use to submit jobs to AWS Batch.

Being able to share intermediate results/computations (the ones saved in workDir) between users would make Nextflow a really useful tool for us. I've noticed that sharing the same workDir directory between users is not enough (it works neither locally nor on AWS S3), and that there is something stored in ~/.nextflow/cache that allows a process to be resumed. Unfortunately, sharing ~/.nextflow/ or ~/.nextflow/cache/ as well still doesn't let NF resume other people's jobs.
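For reference, the shared work directory can also be set once in nextflow.config rather than passing -w on every run; a minimal sketch, assuming the /shared/work path used in the example below:

```groovy
// nextflow.config -- point every run at the shared work directory
workDir = '/shared/work'
```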

Example:

# mkdir -p --mode=g+rwxs /shared/.nextflow/cache /shared/work
# chgrp -R users /shared/
# mkdir /etc/skel/.nextflow
# ln -s /shared/.nextflow/cache /etc/skel/.nextflow/

# useradd -g users demo1
# useradd -g users demo2
# su - demo1
[demo1@host]$ nextflow run hello -resume -w /shared/work/
Launching `nextflow-io/hello` [extravagant_cantor] - revision: a9012339ce [master]
..
[d9/f38f53] Submitted process > sayHello (2)

[demo1@host]$ nextflow run hello -resume -w /shared/work/
[d9/f38f53] Cached process > sayHello (2)

Now with the second user:

[demo2@host]$ nextflow run hello -resume -w /shared/work/
[ad/7d2252] Submitted process > sayHello (2)
# ll /shared/.nextflow/cache/
drwxrwxr-x+ 3 demo1 users 47 Apr  4 09:39 7c4de1cf-a7a5-4d4b-9e1e-4edcc215661e/
drwxrwxr-x+ 3 demo2 users 45 Apr  4 09:43 7e3d2b3a-c158-4508-a2eb-1f943a77d883/

I saw there are some hashes in .nextflow/cache/. Is the username used to build the hash, causing NF not to recognize pipelines executed by other users? Being able to "recycle" other users' computations could make us decide to use NF in this setup.
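Regarding the hashes: below is a simplified sketch of how content-addressed task caching of this kind typically works. This is an illustration only, not Nextflow's actual hashing code; the `task_hash` and `work_dir` functions are hypothetical. The point is that a content-derived task key need not depend on the username at all, and resume can still fail across users if the resume lookup goes through a per-session cache database (the per-UUID directories listed above).

```python
import hashlib

# Hypothetical, simplified model: the task key is derived from the task
# definition and its inputs, so identical runs by different users would
# compute identical keys.
def task_hash(script: str, inputs: tuple) -> str:
    h = hashlib.sha1()
    h.update(script.encode())
    for item in inputs:
        h.update(repr(item).encode())
    return h.hexdigest()

def work_dir(key: str) -> str:
    # Work-directory layout in the style of the example above: ab/cdef...
    return f"{key[:2]}/{key[2:]}"

k1 = task_hash('echo "Hello world!"', ("sayHello", 2))
k2 = task_hash('echo "Hello world!"', ("sayHello", 2))
print(k1 == k2)  # True: the key itself does not depend on the user

# But if -resume consults a per-session cache DB (one directory per
# session UUID, as in the listing above), demo2's fresh session never
# looks inside demo1's entries, so nothing is found to resume.
cache = {"7c4de1cf-a7a5-4d4b-9e1e-4edcc215661e": {k1: "COMPLETED"}}
demo2_session = "7e3d2b3a-c158-4508-a2eb-1f943a77d883"
print(demo2_session in cache)  # False -> nothing to resume
```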

pditommaso commented 5 years ago

Being able to share intermediate results/computations (the ones saved in workDir) between users, would make NextFlow a really useful tool.

This is generally achieved using a shared file system such as NFS, Lustre, etc.

bebosudo commented 5 years ago

Hi @pditommaso, thanks for the reply. This is a single server, and all the users log in to this machine to submit jobs to AWS Batch.

Since it's a single machine, I created a shared work/ directory under /shared/, owned by the same group as the submitting users, but that's not enough to "recycle" jobs. After some digging I discovered that the knowledge of past executed jobs is stored in ~/.nextflow/cache/: if I remove it, the same user can no longer resume a job (try nextflow run hello; rm -rf ~/.nextflow/cache/; nextflow run hello -resume, and the second run repeats the work). So I also shared ~/.nextflow/cache/, creating a link to /shared/.nextflow/cache/, but this still doesn't allow another user to resume someone else's jobs. I hope it's clearer now. Any idea how to resume other users' jobs?

pditommaso commented 5 years ago

The workflow local cache folder is not designed to be shareable, and therefore can't support this use case.