nf-core / methylseq

Methylation (Bisulfite-Sequencing) analysis pipeline using Bismark or bwa-meth + MethylDackel
https://nf-co.re/methylseq
MIT License

Batch run for all samples in data dir #66

Closed by bazyliszek 5 years ago

bazyliszek commented 5 years ago

I am using nf-core-methylseq-1.2 with the conda environment installed from environment.yml, including Java 8 in that environment. The conda environment is sourced and activated on our local HPC.

  1. Error when collecting the software versions used

    [69/77bf6d] Submitted process > get_software_versions
    [69/77bf6d] NOTE: Missing output file(s) `software_versions_mqc.yaml` expected by process `get_software_versions` -- Error is ignored

    The pipeline keeps running, so this is not a big problem, but it would be nice if it reported the versions…

  2. Under the data directory there are many samples, but the process runs for only one sample (R1 and R2). Why not for all of them? The exception is FastQC, which runs for all samples.

nextflow run /home/user/methylseq-1.2/ -resume \
  --reads "/WORKING/projects/DNAm_project/genome/data/ZTRX_accel1S_S13_R{1,2}_*.fastq" \
  --outdir . \
  --fasta "references/Homo_sapiens/UCSC/hg38/Sequence/" \
  --bismark_index "references/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" \
  --accel --saveTrimmed --saveAlignedIntermediates \
  --max_memory "128.GB" --max_cpu "8" --max_time "240.h"

Thanks

sb43 commented 5 years ago

@bazyliszek I had a similar issue; it is caused by the two input channels used in the trim_galore and alignment steps. https://www.nextflow.io/docs/latest/process.html#process-understand-how-multiple-input-channels-work

set val(name), file(reads) from ch_read_files_for_trim_galore  // will emit multiple read files
file wherearemyfiles from ch_wherearemyfiles_for_trimgalore    // will only emit a single file

A workaround is to comment out the single-file-emitting channel; however, I will be waiting for a proper fix.
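For context, here is a minimal DSL1-style sketch of the kind of fix this needs (channel names are taken from the snippet above, but the file path is a placeholder and the pipeline's actual change may differ). A queue channel created with Channel.fromPath emits its single file once and then closes, so a process reading both channels stops after one sample; a value channel can instead be read by every task:

```nextflow
// Before (queue channel -- emits the file once, so only one sample is processed):
// ch_wherearemyfiles_for_trimgalore = Channel.fromPath("$baseDir/assets/where_are_my_files.txt")

// After (value channel -- the same file is available to every task):
ch_wherearemyfiles_for_trimgalore = Channel.value(file("$baseDir/assets/where_are_my_files.txt"))

process trim_galore {
    input:
    set val(name), file(reads) from ch_read_files_for_trim_galore  // one task per sample
    file wherearemyfiles from ch_wherearemyfiles_for_trimgalore    // same file for every task

    // ... output and script sections unchanged ...
}
```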

ewels commented 5 years ago

Hi both,

Many thanks for reporting this issue - it crept in during the channel refactoring on the last release. I'll try to get a fix and a new release out ASAP.

Phil

ewels commented 5 years ago

Should now work in the dev version. Will do a release ASAP, hopefully once #68 passes and is merged.

bazyliszek commented 5 years ago

Thanks, the workaround did work for me after I also commented out where_are_my_files.txt for both the TrimGalore and Bismark alignment steps.

ewels commented 5 years ago

There shouldn't be any need to edit the files now, as I have already made the fix. Just run with -r dev on the nextflow command line. That tells Nextflow to use the development branch of the pipeline with the latest code.
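For completeness, the update-and-run step might look like this (a sketch: the reads pattern and output directory are placeholders, not taken from this thread):

```shell
# Fetch the latest pipeline code from GitHub, then run the dev branch explicitly:
nextflow pull nf-core/methylseq
nextflow run nf-core/methylseq -r dev \
  --reads 'data/*_R{1,2}.fastq' \
  --outdir results
```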

bazyliszek commented 5 years ago

So, I just tried using the Singularity image and the -r dev version. It did not work for me; only the FastQC outputs were generated.

[warm up] executor > local
[22/cd2afd] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-005)
[a3/b83e70] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-007)
[83/a10591] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-001)
[79/8d4fe8] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-004)
[63/e003d9] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-008)
[9e/42a4cc] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-006)
[06/81f7e9] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-009)
[e2/502bfa] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-012)
[42/e96ab1] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-003)
[14/f46a6d] Submitted process > fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-002)
[70/fd84a6] Submitted process > trim_galore (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-003)
[50/e6bdab] Submitted process > get_software_versions
ERROR ~ Error executing process > 'fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-005)'

Caused by:
  Process `fastqc (170310_D00261_0393_BCAJU6ANXX_8_IL-TP-005)` terminated with an error exit status (127)

Command executed:

  fastqc -q 170310_D00261_0393_BCAJU6ANXX_8_IL-TP-005_1.fastq.gz 170310_D00261_0393_BCAJU6ANXX_8_IL-TP-005_2.fastq.gz

Command exit status:
  127

Command output:
  (empty)

Command error:
  /bin/bash: line 0: cd: /rds/projects/2016/frischd-01/Epi/work/22/cd2afdadef7848b03d5695511dedcf: No such file or directory
  /bin/bash: .command.stub: No such file or directory

Work dir:
  /rds/projects/2016/frischd-01/Epi/work/22/cd2afdadef7848b03d5695511dedcf

Tip: when you have fixed the problem you can continue the execution appending to the nextflow command line the option `-resume`

 -- Check '.nextflow.log' file for details
[nf-core/methylseq] Pipeline Complete
WARN: Killing pending tasks (1)
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

ewels commented 5 years ago

Hi @bazyliszek,

This looks like a different error: some kind of disk issue where Nextflow is not able to cd into your working directory.

The fix is tested and merged into master, and I've just made a new release. So you no longer need to use -r dev, but you'll need to update the pipeline to version 1.3 (-r 1.3).

Let me know if you're still hitting problems after this (please paste the full log including the head).

Phil

apeltzer commented 5 years ago

Can you test whether the Singularity container is indeed able to cd to that path on the host running the job? It should be easy to find out:
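A quick way to check, as a sketch (the image name is taken from this thread; adjust the path to the one your job needs):

```shell
# Open an interactive shell inside the container...
singularity shell nfcore-methylseq-1.3.img

# ...then, from inside the container, try to reach the host path:
cd /rds/projects/
# "No such file or directory" here means the path is not bound into the container
```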

If you're now able to cd to /rds/projects/, then we have a different error on our hands.

We commonly had the problem that certain custom paths (like /rds/projects/) were not bound by Singularity by default, so we had to ask our system administrator to add them to the bind path:

# BIND PATH: [STRING]
# DEFAULT: Undefined
# Define a list of files/directories that should be made available from within
# the container. The file or directory must exist within the container on
# which to attach to. you can specify a different source and destination
# path (respectively) with a colon; otherwise source and dest are the same.
bind path = /rds/

bazyliszek commented 5 years ago

Thanks, @apeltzer. That is indeed the problem. I will stick to the conda env for now, but will ask the system admins for a fix.

bazyliszek commented 5 years ago

It seems the admin does not want to do it. Can we somehow use the --bind argument instead? And if so, how? https://www.sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html#user-defined-bind-paths
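For reference, a user-level sketch that needs no admin changes, assuming run-time binds are not disabled on the system (image name and paths are placeholders from this thread):

```shell
# Bind /WORKING into the container for a single interactive session:
singularity shell --bind /WORKING nfcore-methylseq-1.3.img

# Or set the bind path via the environment for every singularity invocation:
export SINGULARITY_BINDPATH="/WORKING"

# With Nextflow, the equivalent can go in a config file (e.g. ~/.nextflow/config):
#   singularity.runOptions = '--bind /WORKING'
```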

apeltzer commented 5 years ago

The administrator should be able to add this to the config file under /etc/singularity/singularity.conf

bind path = /rds

This is not a security issue at all, as it only allows the container to ACCESS the path /rds/... on the host system. Automounts would be another possibility, but that is something some admins seem not to like.

bazyliszek commented 5 years ago

Hi, sorry, I need to re-open this one. I am now on another HPC. Our admin did look at it, but binding the path did not work as advertised. It looks like the folders under work/ are created, but they are write-protected directories. Could that be something to do with the Singularity image?

The admin also changed this:

# MOUNT HOSTFS: [BOOL]
# DEFAULT: no
# Probe for all mounted file systems that are mounted on the host, and bind
# those into the container?
mount hostfs = yes

# ENABLE OVERLAY: [yes/no/try]
# DEFAULT: try
# Enabling this option will make it possible to specify bind paths to locations
# that do not currently exist within the container.  If 'try' is chosen,
# overlayfs will be tried but if it is unavailable it will be silently ignored.
enable overlay = yes

and added:

bind path =  /WORKING/

When I start the image with $ singularity shell nfcore-methylseq-1.3.img and run pwd, I am in /home/mawo, but from there I cannot get to the WORKING directory (cd ../../WORKING/).

The script is as follows:

./nextflow run methylseq-1.3/ \
  -with-singularity nfcore-methylseq-1.3.img -profile standard,singularity \
  --reads "/WORKING/projects/DNAm_testis/genome/data/FimmX_accel1S_S13_R{1,2}_*.fastq" \
  --outdir "." \
  --fasta "references/Homo_sapiens/UCSC/hg38/Sequence/" \
  --bismark_index "references/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" \
  --accel --saveTrimmed \
  --saveAlignedIntermediates --max_memory "60.GB" --max_cpu "8" --max_time "2.h"

The output is as follows:

mawo@int-hpc-003:/WORKING/projects/DNAm_testis/genome$ ./meth.sh
N E X T F L O W  ~  version 19.01.0
Launching `methylseq-1.3/main.nf` [gigantic_leavitt] - revision: 126c7a6dcc
WARN: Access to undefined parameter `readPaths` -- Initialise it to a default value eg. `params.readPaths = some_value`
=======================================================
                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~'
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

nf-core/methylseq : Bisulfite-Seq Best Practice v1.3
=======================================================
Pipeline Name     : nf-core/methylseq
Pipeline Version  : 1.3
Run Name          : gigantic_leavitt
Reads             : /WORKING/projects/DNAm_testis/genome/data/FimmX_accel1S_S13_R{1,2}_*.fastq
Aligner           : bismark
Data Type         : Paired-End
Genome            : false
Bismark Index     : references/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/
Fasta Ref         : references/Homo_sapiens/UCSC/hg38/Sequence/
Trim Profile      : Accel-NGS (Swift)
Trim R1           : 10
Trim R2           : 15
Trim 3' R1        : 10
Trim 3' R2        : 10
Deduplication     : Yes
Directional Mode  : Yes
All C Contexts    : No
Save Reference    : No
Save Trimmed      : Yes
Save Unmapped     : No
Save Intermeds    : Yes
Max Memory        : 60.GB
Max CPUs          : 16
Max Time          : 2.h
Output dir        : .
Working dir       : /WORKING/projects/DNAm_testis/genome/work
Container Engine  : singularity
Container         : nfcore-methylseq-1.3.img
Current home      : /home/mawo
Current user      : mawo
Current path      : /WORKING/projects/DNAm_testis/genome
Script dir        : /WORKING/projects/DNAm_testis/genome/methylseq-1.3
Config Profile    : standard,singularity
=========================================
[warm up] executor > local
[86/a97a8b] Submitted process > fastqc (FimmX_accel1S_S13_R)
[61/9b8ee8] Submitted process > fastqc (FimmX_accel1S_S13_R)
[da/b7d954] Submitted process > trim_galore (FimmX_accel1S_S13_R)
[5a/310f2e] Submitted process > trim_galore (FimmX_accel1S_S13_R)
[02/64ce50] Submitted process > get_software_versions
ERROR ~ Error executing process > 'fastqc (FimmX_accel1S_S13_R)'

Caused by:
  Process `fastqc (FimmX_accel1S_S13_R)` terminated with an error exit status (127)

Command executed:

  fastqc -q FimmX_accel1S_S13_R1_001.fastq FimmX_accel1S_S13_R2_001.fastq

Command exit status:
  127

Command output:
  (empty)

Command error:
  /bin/bash: line 0: cd: /WORKING/projects/DNAm_testis/genome/work/86/a97a8ba2b1c1939e9c572817b00498: No such file or directory
  /bin/bash: .command.stub: No such file or directory

Work dir:
  /WORKING/projects/DNAm_testis/genome/work/86/a97a8ba2b1c1939e9c572817b00498

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

 -- Check '.nextflow.log' file for details
[nf-core/methylseq] Pipeline Complete
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.

Thanks, Bazyl

bazyliszek commented 5 years ago

Do you guys have two singularity.conf files on your HPC? I see there is one under /usr/local/etc/singularity and one under /etc/singularity, which may be part of the issue here. Setting up Singularity on an HPC is difficult even for admins, it seems.
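One way to check which installation (and hence which config directory) is actually in use, sketched with commands that may vary between Singularity versions:

```shell
# Which singularity binary is first on PATH, and what version is it?
which singularity
singularity --version

# Singularity 3.x can print its compiled-in paths; SYSCONFDIR is the
# directory whose singularity.conf is actually read:
singularity buildcfg | grep SYSCONFDIR
```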