nf-core / smrnaseq

A small-RNA sequencing analysis pipeline
https://nf-co.re/smrnaseq
MIT License

The pipeline doesn't finish but is marked as completed #415

Closed (karlaarz closed this issue 2 months ago)

karlaarz commented 2 months ago

Description of the bug

Hello!

I've been running the pipeline, and when it finishes I get the message that it completed successfully:

-[nf-core/smrnaseq] Pipeline completed successfully-
Completed at: 06-Sep-2024 20:22:52
Duration    : 6h 40m 6s
CPU hours   : 78.7
Succeeded   : 218

However, when I look at the results and the report in more detail, I find that not all processes were run or finished. For instance, mirtop was not run at all. Here is a short excerpt of the progress output:

[7b/f5a870] NFC…NASEQ:MIRNA_QUANT:EDGER_QC | 0 of 1
[41/d25fb2] NFC…ENCES (RPE_WT1_seqcluster) | 8 of 8 ✔
[0f/7310c2] NFC…R (RPE_P347S_2_seqcluster) | 8 of 8 ✔
[b8/b34be8] NFC…WT1_mature_hairpin_genome) | 8 of 8 ✔
[a7/29c635] NFC…S_1_mature_hairpin_genome) | 6 of 8
[31/afe38b] NFC…S_1_mature_hairpin_genome) | 6 of 6
[5a/04157f] NFC…P2:MIRDEEP2_PIGZ (RPE_WT2) | 8 of 8 ✔
[f9/12e082] NFC…:MIRDEEP2_MAPPER (RPE_WT4) | 8 of 8 ✔
[d8/9f42ae] NFC…:MIRDEEP2:MIRDEEP2_RUN (8) | 8 of 8 ✔
Plus 6 more processes waiting for tasks…

[41/d25fb2] NFC…ENCES (RPE_WT1_seqcluster) | 8 of 8 ✔
[0f/7310c2] NFC…R (RPE_P347S_2_seqcluster) | 8 of 8 ✔
[ef/08cc8d] NFC…WT2_mature_hairpin_genome) | 7 of 8
[6a/2d4f53] NFC…S_2_mature_hairpin_genome) | 1 of 7
[20/e5e875] NFC…S_2_mature_hairpin_genome) | 0 of 1
[5a/04157f] NFC…P2:MIRDEEP2_PIGZ (RPE_WT2) | 8 of 8 ✔

Thanks in advance
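
Progress lines like the ones in this report can be screened mechanically rather than by eye. The following is a minimal sketch, assuming the console output was saved to a text file and uses the `name | N of M` layout shown above; `flag_incomplete` is a hypothetical helper name, not part of Nextflow or the pipeline.

```shell
# Hypothetical helper: print progress lines whose completed count is below the total.
# Assumes lines shaped like "[hash] process name | N of M", as in the excerpt above.
flag_incomplete() {
  awk -F' [|] ' '
    NF >= 2 {
      # The last field holds "N of M", optionally followed by a check mark.
      n = split($NF, part, " ")
      if (n >= 3 && part[2] == "of" && part[1] + 0 < part[3] + 0) print
    }
  '
}
```

It reads from standard input, for example `flag_incomplete < console_output.txt` (the file name is an assumption). Lines such as `6 of 6` without a check mark are not flagged, because the sketch only compares the two counts.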

Command used and terminal output

nextflow pull nf-core/smrnaseq -r dev
nextflow run nf-core/smrnaseq -r dev \
    -profile conda \
    --input 'small_PB.csv' \
    --fasta Mus_musculus.GRCm39.dna.toplevel.fa \
    --mirtrace_species 'mmu' \
    --mirna_gtf mmu.gff3 \
    --hairpin hairpin.fa \
    --mature mature.fa \
    --protocol 'custom' \
    --three_prime_adapter auto-detect \
    --outdir res_25nt \
    --fastp_max_length 25 \
    -resume
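
Long multi-line commands are easy to break when a line continuation is lost, so it can help to keep the pipeline parameters in a file and pass it with Nextflow's `-params-file` option. This is only a sketch: the keys mirror the flags used in this report without the leading `--`, and `write_params` is a hypothetical helper name.

```shell
# Hypothetical helper: write the run parameters from this report to a YAML file.
# Keys are the pipeline flags without the leading "--"; values are copied from the command.
write_params() {
  cat > "$1" <<'EOF'
input: "small_PB.csv"
fasta: "Mus_musculus.GRCm39.dna.toplevel.fa"
mirtrace_species: "mmu"
mirna_gtf: "mmu.gff3"
hairpin: "hairpin.fa"
mature: "mature.fa"
protocol: "custom"
three_prime_adapter: "auto-detect"
outdir: "res_25nt"
fastp_max_length: 25
EOF
}
```

After `write_params params.yaml`, the run would become `nextflow run nf-core/smrnaseq -r dev -profile conda -params-file params.yaml -resume`, which fits on one line.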

Relevant files

.nextflow.log

System information

I am using the latest dev version on Slurm:

nextflow pull nf-core/smrnaseq -r dev
nextflow run nf-core/smrnaseq -r dev

apeltzer commented 2 months ago

Can you pull the latest dev branch and try again?

karlaarz commented 2 months ago

Hi @apeltzer. It seems to work with singularity but not with conda. I analysed several datasets, but today I got the following error:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_COLLAPSE (R62015-087pf_12-1549-117-IR_L2)'

Caused by:
  Process `NFCORE_SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_COLLAPSE (R62015-087pf_12-1549-117-IR_L2)` terminated with an error exit status (2)

Command executed:

  seqcluster \
      collapse \
      -m 1 --min_size 15 \
      -f R62015-087pf_12-1549-117-IR_L2_1.fastp.fastq.gz R62015-087pf_12-1549-117-IR_L2_2.fastp.fastq.gz  \
      -o collapsed

  gzip collapsed/*_trimmed.fastq
  mv collapsed/*_trimmed.fastq.gz R62015-087pf_12-1549-117-IR_L2_seqcluster.fastq.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:MIRNA_QUANT:SEQCLUSTER_COLLAPSE":
      seqcluster: $(echo $(seqcluster --version 2>&1) | sed 's/^.*seqcluster //')
  END_VERSIONS

Command exit status:
  2

Command output:
  Probably this will fail, you need bcbio-nextgen for many installation functions.
  ['collapse', '-m', '1', '--min_size', '15', '-f', 'R62015-087pf_12-1549-117-IR_L2_1.fastp.fastq.gz', 'R62015-087pf_12-1549-117-IR_L2_2.fastp.fastq.gz', '-o', 'collapsed']

Command error:
  usage: seqcluster [-h] [--version] {collapse} ...
  seqcluster: error: unrecognized arguments: R62015-087pf_12-1549-117-IR_L2_2.fastp.fastq.gz
  Probably this will fail, you need bcbio-nextgen for many installation functions.
  ['collapse', '-m', '1', '--min_size', '15', '-f', 'R62015-087pf_12-1549-117-IR_L2_1.fastp.fastq.gz', 'R62015-087pf_12-1549-117-IR_L2_2.fastp.fastq.gz', '-o', 'collapsed']

Any idea what could be causing this?

Best

apeltzer commented 2 months ago

Not really, maybe we can investigate - although I find it weird that it works with singularity but not conda :-(

atrigila commented 2 months ago

@karlaarz @apeltzer

This issue might be solved with my latest PR (not yet merged), where I migrated local to nf-core mirtop.

I tested it with conda using a similar command:

nextflow run smrnaseq \
    -profile conda \
    --input 'https://github.com/nf-core/test-datasets/raw/smrnaseq/samplesheet/v2.0/samplesheet_skipfastp.csv' \
    --protocol 'custom' \
    --three_prime_adapter auto-detect \
    --outdir res_25nt \
    --fastp_max_length 25 \
    --max_cpus 12 \
    --max_memory 16.GB \
    -resume \
    --mirtrace_species 'hsa'

I did not encounter any errors, and the pipeline successfully ran mirtop:

[13/6d2822] NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_GFF (bams)                            [100%] 1 of 1 ✔
[12/7a77db] NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_COUNTS (bams)                         [100%] 1 of 1 ✔
[ac/086725] NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_EXPORT (bams)                         [100%] 1 of 1 ✔
[bd/ee59e6] NFCORE_SMRNASEQ:MIRNA_QUANT:BAM_STATS_MIRNA_MIRTOP:MIRTOP_STATS (bams)                          [100%] 1 of 1 ✔

apeltzer commented 2 months ago

"This issue might be solved with my latest PR (not yet merged), where I migrated local to nf-core mirtop."

Merged now 👍🏻

atrigila commented 2 months ago

@karlaarz let me know if that solves the issue or I can have another look :)

karlaarz commented 2 months ago

Hello @apeltzer and @atrigila. Thanks a lot for your help. The pipeline runs now with conda, but I'm having another error. I am working with paired-end data. I get these warnings for all my samples:

WARN: Sample R62015-120pf_14-0568-162-IR_L4 is detected as paired-end reads (fastq_1 and fastq_2). The pipeline only handles SE data. Samplesheets with fastq_1 and fastq_2 are supported but fastq_2 is removed.

And the pipeline stops here:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP (R62015-053pf_12-1494-115-IR_L2)'

Caused by:
  Process `NFCORE_SMRNASEQ:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP (R62015-053pf_12-1494-115-IR_L2)` terminated with an error exit status (255)

Command executed:

  [ ! -f  R62015-053pf_12-1494-115-IR_L2_1.fastq.gz ] && ln -sf R62015-053pf_12-1494-115-IR_L2.R1.fastq.gz R62015-053pf_12-1494-115-IR_L2_1.fastq.gz
  [ ! -f  R62015-053pf_12-1494-115-IR_L2_2.fastq.gz ] && ln -sf null R62015-053pf_12-1494-115-IR_L2_2.fastq.gz
  fastp \
      --in1 R62015-053pf_12-1494-115-IR_L2_1.fastq.gz \
      --in2 R62015-053pf_12-1494-115-IR_L2_2.fastq.gz \
      --out1 R62015-053pf_12-1494-115-IR_L2_1.fastp.fastq.gz \
      --out2 R62015-053pf_12-1494-115-IR_L2_2.fastp.fastq.gz \
      --json R62015-053pf_12-1494-115-IR_L2.fastp.json \
      --html R62015-053pf_12-1494-115-IR_L2.fastp.html \
      --adapter_fasta known_adapters.fa \
       \
       \
      --thread 6 \
      --detect_adapter_for_pe \
      -l 17 --max_len1 25 \
      2> >(tee R62015-053pf_12-1494-115-IR_L2.fastp.log >&2)

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SMRNASEQ:FASTQ_FASTQC_UMITOOLS_FASTP:FASTP":
      fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  ERROR: Failed to open file: R62015-053pf_12-1494-115-IR_L2_2.fastq.gz

The command that I am using is still the same.
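
In the staged command above, the second read file is linked to `null`, so `--in2` points at a symlink whose target presumably does not exist, which would be consistent with fastp's "Failed to open file" message. A small sketch of how such a dangling link can be spotted in a task work directory; `is_dangling_link` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the path is a symlink whose target is missing.
# -L tests the link itself; -e follows the link and fails if the target is absent.
is_dangling_link() {
  [ -L "$1" ] && [ ! -e "$1" ]
}
```

For example, `is_dangling_link R62015-053pf_12-1494-115-IR_L2_2.fastq.gz && echo "broken link"` could be run inside the failing task's work directory.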

atrigila commented 2 months ago

Thank you for your feedback! I will look into this and get back to you soon.

atrigila commented 2 months ago

Hi! Just a quick update. I have found the issue: we need to update the sample meta, as it is still single_end:false. We will address this shortly and let you know.

CC @nschcolnicov
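
Since the warning states that fastq_2 is discarded anyway, one possible stopgap is to give the pipeline a samplesheet without that column. This is an untested assumption, not something proposed in this thread, and it may not change how the sample meta is set. The sketch below assumes a plain CSV with a header row, no quoted commas, and a column named `fastq_2`; `drop_fastq2` is a hypothetical helper name.

```shell
# Hypothetical helper: print a samplesheet with the "fastq_2" column removed.
# Assumes a simple comma-separated file whose first row is the header.
drop_fastq2() {
  awk -F',' '
    BEGIN { drop = 0 }
    # Locate the column to drop by its header name.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "fastq_2") drop = i }
    {
      out = ""
      sep = ""
      for (i = 1; i <= NF; i++) {
        if (i == drop) continue
        out = out sep $i
        sep = ","
      }
      print out
    }
  ' "$1"
}
```

A call such as `drop_fastq2 small_PB.csv > small_PB_se.csv` writes the reduced sheet to a new file (the output name is an assumption) and leaves the original untouched.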

nschcolnicov commented 2 months ago

The PR for this issue was merged. @karlaarz, please pull the latest dev version and let me know if it works for you!

apeltzer commented 2 months ago

And re-open if this is not the case please :)

karlaarz commented 1 month ago

Hello! Thanks for all your help so far. I still get the same warnings, but now I get another error:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:MIRDEEP2:MIRDEEP2_MAPPER (1)'

Caused by:
  Not a valid path value type: java.util.LinkedHashMap ([id:Homo_sapiens.GRCh38.dna.toplevel])

The path to the file is correct. Please let me know if you need extra files or information.

apeltzer commented 1 month ago

We have an open PR for mirdeep2, in case you really want to do novel miRNA prediction. Please open a separate issue for this and test once PR #448 has been merged to dev :-)