nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Nextflow will not find script placed in bin #1796

Closed: emilio-r closed this issue 3 years ago

emilio-r commented 3 years ago

Hello, I am currently trying to extend a Nextflow bacterial bioinformatics pipeline so that it calls a Python script I have placed in the bin folder. However, when the process runs, the step that calls this script fails.

When I run the pipeline, I receive this error:

    NOTE: Process screen_for_contaminants (192_S54_L001) terminated with an error exit status (127) -- Error is ignored

and .command.log contains:

    .command.sh: line 4: sendsketch_stainer.py: command not found

The process that I am having issues with is this:

    process screen_for_contaminants {
        tag { pair_id }
        publishDir "${params.output_dir}/sendsketch", mode: 'copy'

        input:
        tuple pair_id, file("${pair_id}.contigs.fa") from sendsketch_input

        output:
        file("${pair_id}.sendsketch.txt")
        stdout into gramstain_result

        script:
        """
        sendsketch.sh \
            in=${pair_id}.contigs.fa \
            samplerate=0.1 \
            out=${pair_id}.sendsketch.txt

        sendsketch_stainer.py \
            ${pair_id}.sendsketch.txt \
            "$projectDir/resources/gram_stain.txt"
        """
    }

My repo can be found here: https://github.com/ctmrbio/BACTpipe/tree/3.E

All help is greatly appreciated!
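Exit status 127 is the shell's own "command not found" code (126 means the file was found but could not be executed), so the failure happens before the Python script itself starts. Nextflow runs each task's .command.sh with bash and, according to its documentation, adds the pipeline's bin/ directory to the task's PATH, so a 127 here suggests bash did not find sendsketch_stainer.py on the PATH it saw at that point. The minimal sketch below reproduces the reserved codes with the standard library only; the argument example.sendsketch.txt is a placeholder.

```python
import subprocess

# Exit statuses the shell itself reserves when it cannot run a command.
SHELL_EXIT_MEANINGS = {
    126: "found, but could not be executed (e.g. missing execute permission)",
    127: "command not found on PATH",
}


def run_like_task(command):
    """Run command through bash, as a task's .command.sh is, and return
    (exit status, stderr)."""
    result = subprocess.run(["bash", "-c", command], capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


def explain(status):
    """Explain an exit status; anything unreserved was set by the command."""
    return SHELL_EXIT_MEANINGS.get(status, "set by the command itself")


if __name__ == "__main__":
    # On a machine without sendsketch_stainer.py on PATH this reports 127.
    status, err = run_like_task("sendsketch_stainer.py example.sendsketch.txt")
    print(f"exit {status}: {explain(status)}")
    print(err)
```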

phiweger commented 3 years ago

Did you put #!/usr/bin/env python in the first line of the script? Also, run chmod +x sendsketch_stainer.py to make it executable.
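Both suggestions can be checked in isolation. The sketch below builds a throwaway bin/ directory, prepends it to PATH (roughly what Nextflow does for a pipeline's bin/ folder), and runs a helper with and without its execute bits, plus one that does not exist. The helper names are hypothetical, and the helpers use a /bin/sh shebang only so the demo does not depend on which Python is installed.

```python
import os
import stat
import subprocess
import tempfile
from pathlib import Path

EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def write_helper(bin_dir, name, executable):
    """Create a tiny helper script in bin_dir, with or without execute bits."""
    script = Path(bin_dir) / name
    # A real Python helper would start with '#!/usr/bin/env python3'; /bin/sh
    # is used here only so this demo has no interpreter dependency.
    script.write_text("#!/bin/sh\necho 'hello from bin/'\n")
    mode = stat.S_IMODE(script.stat().st_mode)
    # Equivalent of 'chmod +x' (add the bits) or 'chmod -x' (remove them).
    script.chmod((mode | EXEC_BITS) if executable else (mode & ~EXEC_BITS))
    return script


def run_with_bin(command, bin_dir):
    """Run command in bash with bin_dir prepended to PATH."""
    env = dict(os.environ)
    env["PATH"] = bin_dir + os.pathsep + env.get("PATH", "")
    return subprocess.run(["bash", "-c", command], env=env,
                          capture_output=True, text=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as bin_dir:
        write_helper(bin_dir, "good_helper.py", executable=True)
        write_helper(bin_dir, "bad_helper.py", executable=False)
        for name in ("good_helper.py", "bad_helper.py", "missing_helper.py"):
            result = run_with_bin(name, bin_dir)
            output = result.stdout.strip() or result.stderr.strip()
            print(f"{name}: exit {result.returncode}: {output}")
```

As a non-root user, the non-executable helper typically fails with "Permission denied" (126) and the missing one with "command not found" (127), which is why the execute bit and the PATH are worth checking separately.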

emilio-r commented 3 years ago

Thanks for replying! Yes, the script has a shebang for python3 (#!/usr/bin/env python3) and should also be executable by all (-rwxr-xr-x).
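The two properties confirmed here (a shebang and -rwxr-xr-x, i.e. mode 755) can be read back programmatically. This is a small sketch using only the standard library; the function name check_helper_script is hypothetical. Note that it inspects only the file itself and says nothing about the PATH a task actually sees.

```python
import stat
import sys
from pathlib import Path

EXEC_BITS = (stat.S_IXUSR, stat.S_IXGRP, stat.S_IXOTH)


def check_helper_script(path):
    """Report a script's shebang line and its permission bits, rendered the
    way 'ls -l' shows them."""
    p = Path(path)
    with p.open("rb") as fh:
        first_line = fh.readline().decode("utf-8", errors="replace").rstrip("\r\n")
    mode = p.stat().st_mode
    return {
        "shebang": first_line if first_line.startswith("#!") else None,
        "permissions": stat.filemode(mode),
        # True only if owner, group and others all have the execute bit.
        "executable_by_all": all(mode & bit for bit in EXEC_BITS),
    }


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, check_helper_script(arg))
```

For example, passing bin/sendsketch_stainer.py as an argument would print its shebang and a permissions string such as -rwxr-xr-x.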

abhi18av commented 3 years ago

@emilio-r, could you please share some information about your Nextflow version? The output of nextflow info should be enough.

emilio-r commented 3 years ago

This is the output from nextflow info:

    Version: 20.10.0 build 5430
    Created: 01-11-2020 15:14 UTC (16:14 CEST)
    System: Linux 3.10.0-1127.19.1.el7.x86_64
    Runtime: Groovy 3.0.5 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12
    Encoding: UTF-8 (UTF-8)

abhi18av commented 3 years ago

Hi @emilio-r , I'm trying to run it locally and debug it.

I created a basic test (unrelated to BACTpipe) and the concept works fine, but I wasn't able to run the pipeline with the intended dataset. Could you please provide me with the SRA IDs of a couple of genomes?

You can mail me at abhi18av_at_outlook _dot_com.

emilio-r commented 3 years ago

Hello @abhi18av, and thanks for helping! Unfortunately, the pipe also has an unrelated issue when run locally (not via our cluster), which might explain why you were unable to run it. However, if you are still interested in testing, we recommend using SRS679604 and/or SRS679605.

abhi18av commented 3 years ago

Hi @emilio-r, I've tried to run it with the SRA IDs you shared, but unfortunately it fails for me in the first process:

(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ nextflow run bactpipe.nf --mashscreen_database ./refseq.genomes.k21s1000.msh --reads './*_{1,2}.fastq.gz'
N E X T F L O W  ~  version 20.10.0
Launching `bactpipe.nf` [friendly_swartz] - revision: 46edd98913
============================================================
BACTpipe
Version 2.7.0
Bacterial whole genome analysis pipeline
https://bactpipe.readthedocs.io
============================================================
Running with the following settings:
mashscreen_database: ./refseq.genomes.k21s1000.msh
reads: ./*_{1,2}.fastq.gz
profiles_that_require_project: [rackham]
project:
clusterOptions: false
output_dir: BACTpipe_results
ignore_contamination_screen: false
bbduk_adapters: adapters
bbduk_minlen: 30
bbduk_qtrim: rl
bbduk_trimq: 10
bbduk_ktrim: r
bbduk_k: 30
bbduk_mink: 11
bbduk_hdist: 1
bbduk_trimbyoverlap: trimbyoverlap
bbduk_trimpairsevenly: trimpairsevenly
shovill_depth: 100
shovill_kmers: 31,33,55,77,99,127
shovill_minlen: 500
prokka_evalue: 1e-09
prokka_kingdom: Bacteria
prokka_reference:
prokka_gram_stain:
============================================================
executor >  local (2)
[95/bdb463] process > screen_for_contaminants (SRR1544630) [100%] 2 of 2, failed: 2 ✔
[-        ] process > concatenate_mash_screen_results      -
[-        ] process > bbduk                                -
[-        ] process > fastqc                               -
[-        ] process > shovill                              -
[-        ] process > prokka                               -
[-        ] process > multiqc                              -
============================================================
BACTpipe workflow completed without errors
Check output files in folder:
BACTpipe_results
============================================================
[88/d4cbac] NOTE: Process `screen_for_contaminants (SRR1544631)` failed -- Error is ignored
[95/bdb463] NOTE: Process `screen_for_contaminants (SRR1544630)` failed -- Error is ignored
Completed at: 26-Nov-2020 12:35:12
Duration    : 20m 20s
CPU hours   : 0.7 (100% failed)
Succeeded   : 0
Ignored     : 2
Failed      : 2

And here's the output of .command.log for that process:

nxf-scratch-dir Abhinavs-MacBook-Pro.local:/var/folders/zp/63677vtx23d_b2_nd7mm92040000gn/T//nxf.yEoDZmSlDd
Loading refseq.genomes.k21s1000.msh...
   16160796 distinct hashes.
Streaming from 2 inputs...
abhi18av commented 3 years ago

Also, I noticed that the pipeline is somewhat dated. I'd be happy to help fix these bugs and modernize the pipeline.

emilio-r commented 3 years ago

Hello again @abhi18av. We would all be very grateful for any help you can provide in making the pipe work. From what I can see, you are indeed running an outdated version of the pipe: the current development branch is "BACTpipe-3.E", not master. What's more, a slightly updated version of this development branch has now been migrated to another repo and can be found as branch "3.E" here: https://github.com/ctmrbio/BACTpipe/tree/3.E

Thank you for your assistance!

abhi18av commented 3 years ago

Ahh, I see!

I ran the 3.E branch and the entire pipeline completed successfully.

(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ nextflow run bactpipe.nf --mashscreen_database ./refseq.genomes.k21s1000.msh --reads './*_{1,2}.fastq.gz'
N E X T F L O W  ~  version 20.10.0
Launching `bactpipe.nf` [evil_curie] - revision: 0b785cf0dd
============================================================
BACTpipe
Version 3.E
Bacterial whole genome analysis pipeline
https://bactpipe.readthedocs.io
============================================================
Running with the following settings:
mashscreen_database: ./refseq.genomes.k21s1000.msh
reads: ./*_{1,2}.fastq.gz
profiles_that_require_project: [rackham]
project:
clusterOptions: false
output_dir: BACTpipe_results
shovill_depth: 100
shovill_kmers: 31,33,55,77,99,127
shovill_minlen: 500
prokka_evalue: 1e-09
prokka_kingdom: Bacteria
prokka_reference:
============================================================
[ab/879d6f] process > fastp (SRR1544631)                   [100%] 2 of 2 ✔
[94/1f04cd] process > shovill (SRR1544631)                 [100%] 2 of 2 ✔
[6b/db224a] process > screen_for_contaminants (SRR1544631) [100%] 2 of 2 ✔
[0a/96e06c] process > prokka (SRR1544631)                  [100%] 2 of 2 ✔
[64/47efa8] process > assembly_stats (SRR1544631)          [100%] 2 of 2 ✔
[0a/73ae03] process > multiqc                              [100%] 1 of 1 ✔
============================================================
BACTpipe workflow completed without errors
Check output files in folder:
BACTpipe_results
============================================================

Completed at: 26-Nov-2020 15:28:58
Duration    : 33m 11s
CPU hours   : 0.9
Succeeded   : 11

Here are the contents of the results directory:

(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ tree BACTpipe_results/ -L 2
BACTpipe_results/
├── fastp
│   ├── SRR1544630.json
│   ├── SRR1544630_1.fastp.fq.gz
│   ├── SRR1544630_2.fastp.fq.gz
│   ├── SRR1544631.json
│   ├── SRR1544631_1.fastp.fq.gz
│   └── SRR1544631_2.fastp.fq.gz
├── multiqc
│   └── multiqc_report.html
├── prokka
│   ├── SRR1544630_prokka
│   └── SRR1544631_prokka
├── sendsketch
│   ├── SRR1544630.sendsketch.txt
│   └── SRR1544631.sendsketch.txt
└── shovill
    ├── SRR1544630.assembly_stats.txt
    ├── SRR1544630.contigs.fa
    ├── SRR1544630_shovill
    ├── SRR1544631.assembly_stats.txt
    ├── SRR1544631.contigs.fa
    └── SRR1544631_shovill

Here are the contents of the workDir for screen_for_contaminants:

(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ ls work/6b/db224a551017c44347383ae58cf0f6/
total 40
drwxr-xr-x 11 eklavya staff   352 Nov 26 15:19 .
drwxr-xr-x  3 eklavya staff    96 Nov 26 15:19 ..
-rw-r--r--  1 eklavya staff     0 Nov 26 15:19 .command.begin
-rw-r--r--  1 eklavya staff   665 Nov 26 15:19 .command.err
-rw-r--r--  1 eklavya staff   777 Nov 26 15:19 .command.log
-rw-r--r--  1 eklavya staff     4 Nov 26 15:19 .command.out
-rw-r--r--  1 eklavya staff 10326 Nov 26 15:19 .command.run
-rw-r--r--  1 eklavya staff   267 Nov 26 15:19 .command.sh
-rw-r--r--  1 eklavya staff    91 Nov 26 15:19 .command.trace
-rw-r--r--  1 eklavya staff     1 Nov 26 15:19 .exitcode
-rw-r--r--  1 eklavya staff  2033 Nov 26 15:19 SRR1544631.sendsketch.txt

NOTE: I had already created a bactpipe conda env with the requirements mentioned in the docs https://bactpipe.readthedocs.io/en/latest/installation.html#installing-dependencies-into-the-conda-base-environment
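A task directory like the one listed above keeps the task's exit status in .exitcode and its stderr in .command.err. When several tasks fail with "Error is ignored", a small scan of the work directory can list them together with their last error lines. This is a minimal sketch assuming the default two-level work/<xx>/<hash>/ layout; tasks that were killed or are still running may have no .exitcode and are skipped.

```python
from pathlib import Path


def find_failed_tasks(work_dir="work"):
    """Yield (task_dir, exit_status) for each task directory under a Nextflow
    work dir whose .exitcode file holds a non-zero status."""
    for exitcode_file in sorted(Path(work_dir).glob("*/*/.exitcode")):
        text = exitcode_file.read_text().strip()
        if text.isdigit() and int(text) != 0:
            yield exitcode_file.parent, int(text)


def tail_command_err(task_dir, n=5):
    """Return the last n lines of a task's .command.err, or [] if absent."""
    err_file = Path(task_dir) / ".command.err"
    if not err_file.is_file():
        return []
    return err_file.read_text(errors="replace").splitlines()[-n:]


if __name__ == "__main__":
    for task_dir, status in find_failed_tasks():
        print(f"{task_dir}: exit {status}")
        for line in tail_command_err(task_dir):
            print("    " + line)
```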

thorellk commented 3 years ago

Hi @abhi18av. For some strange reason, this issue seems to have resolved itself when running with the local configuration, without us really changing anything. The installation instructions on readthedocs.io are outdated (we have changed some of the software in the pipeline), but I assume that since the local.config file creates a conda environment for each process, one should not need anything installed beforehand except conda and Nextflow, right?

We are now checking whether the "command not found" issue still occurs when running in our HPC environment.

Also, if you are still willing to bring the pipeline up to date with current Nextflow standards, that would be very much appreciated 👍

abhi18av commented 3 years ago

Hi @thorellk

Definitely, you can expect a PR this weekend :)

thorellk commented 3 years ago

Wow, great, thank you 😃

emilio-r commented 3 years ago

The issue is resolved. It is thought to have arisen from differences between the Python installed by conda and the Python provided by the HPC's environment modules. Changing this process so that it gets Python from a conda environment, instead of loading it as a module, fixed the issue. Thanks everyone for your assistance!
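Which Python a helper runs under depends on its shebang: #!/usr/bin/env python3 executes the first python3 found on PATH, and both conda activation and module load work by changing PATH, so the order in which they are applied decides the interpreter. The sketch below lists the candidates in PATH order using only the standard library; the function names are hypothetical, and it does not reproduce the HPC setup from this thread.

```python
import os
import shutil


def resolve_interpreter(name="python3", path=None):
    """Return what '#!/usr/bin/env <name>' would run: the first executable
    called name on PATH (or on path, if given)."""
    return shutil.which(name, path=path)


def all_candidates(name="python3", path=None):
    """List every executable called name on the search path, in PATH order,
    e.g. to see whether a conda env and an HPC module both provide one."""
    search = path if path is not None else os.environ.get("PATH", "")
    found = []
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if (os.path.isfile(candidate) and os.access(candidate, os.X_OK)
                and candidate not in found):
            found.append(candidate)
    return found


if __name__ == "__main__":
    print("env would run:", resolve_interpreter())
    for candidate in all_candidates():
        print("  candidate:", candidate)
```

Running this inside a task script (or printing the result of which python3 there) is one way to see which interpreter a process actually picks up.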