nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

endorspy erroring in nf-core/eager workflow #1083

Closed VerbalCant closed 3 weeks ago

VerbalCant commented 3 weeks ago

Check Documentation

I have checked the following places for your error:

Description of the bug

The endorS.py step is failing in a way that seems related to its program arguments. The copy in nf-core/eager's bin/endorS.py, which is the one being called, differs quite a bit from the current release of [endorS.py](https://github.com/aidaanva/endorS.py), but I don't think that matters. Judging by the extra space between the -n and the _flagstat.stats in the command below, my guess is that an empty value is being passed to --name/-n. Looking at the code, it seems the library_id might be empty or null.
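To illustrate that guess, here is a minimal Python sketch. It assumes (this is not taken from the pipeline source) that the script is invoked with the library ID interpolated unquoted after -n, and that the script uses an argparse interface with an optional -n value and one required positional stats file. `build_parser`, `render_command` and `exit_status` are hypothetical helpers, not the real endorS.py code.

```python
import argparse
import shlex


def build_parser():
    # Hypothetical stand-in for the endorS.py interface, NOT the real script:
    # an optional -n/--name value plus one required positional stats file.
    parser = argparse.ArgumentParser(prog="endorS.py")
    parser.add_argument("-o", "--output", choices=["json", "none"], default="none")
    parser.add_argument("-n", "--name", nargs="?", default=None)
    parser.add_argument("stats", metavar="FILE.stats")
    return parser


def render_command(library_id, stats_file):
    # Mimics unquoted, shell-style interpolation of the library ID (assumption).
    return f"endorS.py -o json -n {library_id} {stats_file}"


def exit_status(command):
    # Returns 0 if the arguments parse, otherwise argparse's exit code.
    argv = shlex.split(command)[1:]
    try:
        build_parser().parse_args(argv)
    except SystemExit as exc:
        return exc.code
    return 0


# Empty library ID: -n swallows the stats file, so the positional is missing.
print(exit_status(render_command("", "_flagstat.stats")))
# Non-empty (made-up) library ID: both arguments are filled.
print(exit_status(render_command("Momia3", "_flagstat.stats")))
```

Under these assumptions, the empty ID makes -n consume `_flagstat.stats` as its value, leaving nothing for the required positional. That would give the same "the following arguments are required" message and exit status 2 as in the error below.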

I do also note the earlier warnings about the _rmdup files, where they're specifying the destination off of root, /_rmdup.bam, instead of what I presume should be more like ./_rmdup.bam? I don't have any evidence that they're related.

Steps to reproduce

Steps to reproduce the behaviour:

  1. Command line: NXF_VER=22.10.6 nextflow run nf-core/eager -profile docker --input '*Momia3_S2_L001_R1_001_merged.fastq.gz' --fasta /references/reference_genomes/hg38.analysisSet.fa --max_memory 60GB --max_cpus 16 --max_time 288.h -config nextflow_custom.config --run_genotyping true --save_reference --genotyping_tool 'freebayes' --run_mtnucratio true --run_sexdeterrmine true --run_nuclear_contamination true --single_end true --mapper bowtie2 -resume

It fails with this error the first time it hits endors.py, which (I can confirm) is the one called from the version in nf-core/eager/bin. It fails on subsequent runs with -resume, and if I clean the directory, remove all subdirectories, and start again from scratch.

  2. See error:
    
    -[nf-core/eager] Pipeline completed with errors-
    WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.bam; to: /_rmdup.bam [copy] -- See log file for details
    WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.metrics; to: /_rmdup.metrics [copy] -- See log file for details
    WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.bam.bai; to: /_rmdup.bam.bai [copy] -- See log file for details
    WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
    Error executing process > 'endorSpy ()'

    Caused by: Process endorSpy () terminated with an error exit status (2)

    Command executed:

    endorS.py -o json -n _flagstat.stats

    Command exit status: 2

    Command output: (empty)

    Command error:
    usage: python endorS.py [-h] [--version] [-o json,none] [-n] .stats [.stats]
    endorS.py: error: the following arguments are required: .stats

    Work dir: /working/momia3_2024-08-19/work/12/1bfe7d3c613136f7be70fffcd8377d

    Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line


## Expected behaviour

I expected it to proceed through the endorS.py step and move on to the next step of the workflow. 

## Log files

Have you provided the following extra information/files:

- [x] The command used to run the pipeline
- [x] The `.nextflow.log` file: [nextflow.log](https://github.com/user-attachments/files/16665455/nextflow.log)

- [x] The exact error: 

    Command error:
    usage: python endorS.py [-h] [--version] [-o json,none] [-n] .stats [.stats]
    endorS.py: error: the following arguments are required: .stats


## System

- Hardware: 16-core i7 desktop, 64GB
- Executor: local
- OS: Ubuntu 
- Version: 22.04 LTS

## Nextflow Installation

- Version: 22.10.6

## Container engine

- Engine: docker
- version: Docker version 27.0.3, build 7d4bcd8
- Image tag: <!-- [e.g. nfcore/eager:2.5.2] -->

## Additional context

The reads are from a paired-end (PE) aDNA run that was merged into a single FASTQ file using [NGmerge](https://github.com/jsh58/NGmerge).

I can confirm that the working directory contains the symlink to the _flagstat.stats:

    ╭─ /working/momia3_2024-08-19/work/12/1bfe7d3c613136f7be70fffcd8377d
    ╰─ ls -l
    total 8
    lrwxrwxrwx 1 a a 81 Aug 19 17:05 _flagstat.stats -> /working/momia3_2024-08-19/work/3c/cc266703cd45f183eb1870a02677db/_flagstat.stats
    lrwxrwxrwx 1 a a 69 Aug 19 17:05 nf-core_eager_dummy.txt -> /home/a/.nextflow/assets/nf-core/eager/assets/nf-core_eager_dummy.txt



jfy133 commented 3 weeks ago

Pinging @aidaanva

aidaanva commented 3 weeks ago

Hi @VerbalCant

Looking at the nextflow.log you provided, it seems that nf-core/eager does not correctly parse your input files into a TSV file. The names get shortened to just "_L0" for the initial steps (FastQC, adapter removal), and then, when the BAM files are generated, the name is shortened further to "" (an empty string). That's why you are seeing these warnings:

WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.bam; to: /_rmdup.bam [copy] -- See log file for details
WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.metrics; to: /_rmdup.metrics [copy] -- See log file for details
WARN: Failed to publish file: /working/momia3_2024-08-19/work/7d/667071f6d11007a3023ad1269d2c06/_rmdup.bam.bai; to: /_rmdup.bam.bai [copy] -- See log file for details

Since no files are produced, endorspy cannot be run, and I think this is why the workflow showed you this error.

My recommendation is to provide your input as a TSV file, since this is safer and ensures that the pipeline extracts the correct sample and library names. You can find documentation on how to set up the TSV for nf-core/eager here: https://nf-co.re/eager/2.5.2/docs/usage/#tsv-input-method
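As a rough sketch of what such a TSV could look like, the Python snippet below writes a single-row sheet. The column names are the ones I recall from the nf-core/eager 2.x usage docs, so please check them against the page linked above. Every value in the row (library ID, colour chemistry, strandedness, UDG treatment, the FASTQ path) is a placeholder assumption, not something taken from this run.

```python
import csv
import io

# Column names as recalled from the nf-core/eager 2.x TSV documentation
# (verify against the linked usage page before relying on them).
COLUMNS = [
    "Sample_Name", "Library_ID", "Lane", "Colour_Chemistry", "SeqType",
    "Organism", "Strandedness", "UDG_Treatment", "R1", "R2", "BAM",
]


def make_tsv(rows):
    # Serialise the rows as tab-separated text with a header line.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder row: all values below are illustrative assumptions.
row = {
    "Sample_Name": "Momia3",
    "Library_ID": "Momia3_S2",
    "Lane": 1,
    "Colour_Chemistry": 4,
    "SeqType": "SE",  # pre-merged reads are supplied as single-end
    "Organism": "Homo sapiens",
    "Strandedness": "double",
    "UDG_Treatment": "none",
    "R1": "/path/to/Momia3_S2_L001_R1_001_merged.fastq.gz",  # hypothetical path
    "R2": "NA",
    "BAM": "NA",
}

print(make_tsv([row]), end="")
```

The printed text can be saved to a file and passed to `--input` in place of the wildcard pattern, so that the sample and library names are stated explicitly rather than derived from the file name.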

Let me know whether using the TSV input method solves the issue and if not I will take another look.

VerbalCant commented 3 weeks ago

Hey @aidaanva thanks for the quick reply! I can confirm that using the tsv input method, as described in the docs you linked, resolves this. Closing the issue, and hoping it helps somebody else in the future!