emilio-r closed this issue 3 years ago
Did you put #!/usr/bin/env python in the first line of the script? Also run chmod +x sendsketch_stainer.py to make it executable.
Thanks for replying! Yes, the script has a shebang for python3 (#!/usr/bin/env python3) and should also be executable by all (-rwxr-xr-x).
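The two checks discussed above (a shebang on the first line and an executable permission bit) can be verified programmatically. The following is a minimal sketch, not part of BACTpipe; the helper name check_script and the throwaway demo file are assumptions made for illustration only.

```python
import os
import stat
import tempfile

def check_script(path):
    """Return (shebang line or None, True if any executable bit is set)."""
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").rstrip("\r\n")
    shebang = first_line if first_line.startswith("#!") else None
    mode = os.stat(path).st_mode
    executable = bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    return shebang, executable

# Demo on a temporary file standing in for bin/sendsketch_stainer.py.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "sendsketch_stainer.py")
    with open(demo, "w") as handle:
        handle.write("#!/usr/bin/env python3\nprint('hello')\n")
    os.chmod(demo, 0o644)          # -rw-r--r--: not executable
    before = check_script(demo)
    os.chmod(demo, 0o755)          # -rwxr-xr-x: the permissions reported in the reply
    after = check_script(demo)

print(before)
print(after)
```

If the second element of the returned tuple is False, the script needs chmod +x before a shell can run it by name.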
@emilio-r, could you please share some info about your Nextflow version? The output of nextflow info should be enough.
This is the output from nextflow info:
Version: 20.10.0 build 5430
Created: 01-11-2020 15:14 UTC (16:14 CEST)
System: Linux 3.10.0-1127.19.1.el7.x86_64
Runtime: Groovy 3.0.5 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12
Encoding: UTF-8 (UTF-8)
Hi @emilio-r , I'm trying to run it locally and debug it.
I created a basic test (unrelated to BACTpipe) and the concept works fine, but I wasn't able to run the pipeline with the intended dataset. Could you please provide me with the SRA IDs of a couple of genomes?
You can mail me at abhi18av_at_outlook _dot_com.
Hello @abhi18av, and thanks for helping! Unfortunately the pipe is also experiencing an unrelated issue when run locally (not via our cluster), which might explain why you were unable to run it. However, if you are still interested in testing, we recommend doing so with SRS679604 and/or SRS679605.
Hi @emilio-r, I've tried to run it with the SRA IDs you shared, but unfortunately it fails for me in the first process:
(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ nextflow run bactpipe.nf --mashscreen_database ./refseq.genomes.k21s1000.msh --reads './*_{1,2}.fastq.gz'
N E X T F L O W ~ version 20.10.0
Launching `bactpipe.nf` [friendly_swartz] - revision: 46edd98913
============================================================
BACTpipe
Version 2.7.0
Bacterial whole genome analysis pipeline
https://bactpipe.readthedocs.io
============================================================
Running with the following settings:
mashscreen_database: ./refseq.genomes.k21s1000.msh
reads: ./*_{1,2}.fastq.gz
profiles_that_require_project: [rackham]
project:
clusterOptions: false
output_dir: BACTpipe_results
ignore_contamination_screen: false
bbduk_adapters: adapters
bbduk_minlen: 30
bbduk_qtrim: rl
bbduk_trimq: 10
bbduk_ktrim: r
bbduk_k: 30
bbduk_mink: 11
bbduk_hdist: 1
bbduk_trimbyoverlap: trimbyoverlap
bbduk_trimpairsevenly: trimpairsevenly
shovill_depth: 100
shovill_kmers: 31,33,55,77,99,127
shovill_minlen: 500
prokka_evalue: 1e-09
prokka_kingdom: Bacteria
prokka_reference:
prokka_gram_stain:
============================================================
executor > local (2)
[95/bdb463] process > screen_for_contaminants (SRR1544630) [100%] 2 of 2, failed: 2 ✔
[- ] process > concatenate_mash_screen_results -
[- ] process > bbduk -
[- ] process > fastqc -
[- ] process > shovill -
[- ] process > prokka -
[- ] process > multiqc -
============================================================
BACTpipe workflow completed without errors
Check output files in folder:
BACTpipe_results
============================================================
[88/d4cbac] NOTE: Process `screen_for_contaminants (SRR1544631)` failed -- Error is ignored
[95/bdb463] NOTE: Process `screen_for_contaminants (SRR1544630)` failed -- Error is ignored
Completed at: 26-Nov-2020 12:35:12
Duration : 20m 20s
CPU hours : 0.7 (100% failed)
Succeeded : 0
Ignored : 2
Failed : 2
And here's the output of .command.log for the process:
nxf-scratch-dir Abhinavs-MacBook-Pro.local:/var/folders/zp/63677vtx23d_b2_nd7mm92040000gn/T//nxf.yEoDZmSlDd
Loading refseq.genomes.k21s1000.msh...
16160796 distinct hashes.
Streaming from 2 inputs...
Also, I noticed that the pipeline is somewhat dated. I'd be happy to connect and help fix these bugs and modernize the pipeline.
Hello again @abhi18av. We would all be very grateful for any help you can provide in making the pipe work. From what I can see, you are indeed running an outdated version of the pipe: the current development branch/version is "BACTpipe-3.E", not Master. What's more, a slightly updated version of this development branch has now been migrated to another repo and can be found as branch "3.E" here: https://github.com/ctmrbio/BACTpipe/tree/3.E Thank you for your assistance!
Ahh, I see!
I ran the 3.E branch and the entire process completed nicely.
(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ nextflow run bactpipe.nf --mashscreen_database ./refseq.genomes.k21s1000.msh --reads './*_{1,2}.fastq.gz'
N E X T F L O W ~ version 20.10.0
Launching `bactpipe.nf` [evil_curie] - revision: 0b785cf0dd
============================================================
BACTpipe
Version 3.E
Bacterial whole genome analysis pipeline
https://bactpipe.readthedocs.io
============================================================
Running with the following settings:
mashscreen_database: ./refseq.genomes.k21s1000.msh
reads: ./*_{1,2}.fastq.gz
profiles_that_require_project: [rackham]
project:
clusterOptions: false
output_dir: BACTpipe_results
shovill_depth: 100
shovill_kmers: 31,33,55,77,99,127
shovill_minlen: 500
prokka_evalue: 1e-09
prokka_kingdom: Bacteria
prokka_reference:
============================================================
[ab/879d6f] process > fastp (SRR1544631) [100%] 2 of 2 ✔
[94/1f04cd] process > shovill (SRR1544631) [100%] 2 of 2 ✔
[6b/db224a] process > screen_for_contaminants (SRR1544631) [100%] 2 of 2 ✔
[0a/96e06c] process > prokka (SRR1544631) [100%] 2 of 2 ✔
[64/47efa8] process > assembly_stats (SRR1544631) [100%] 2 of 2 ✔
[0a/73ae03] process > multiqc [100%] 1 of 1 ✔
============================================================
BACTpipe workflow completed without errors
Check output files in folder:
BACTpipe_results
============================================================
Completed at: 26-Nov-2020 15:28:58
Duration : 33m 11s
CPU hours : 0.9
Succeeded : 11
Here are the contents for the results dir
(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ tree BACTpipe_results/ -L 2
BACTpipe_results/
├── fastp
│ ├── SRR1544630.json
│ ├── SRR1544630_1.fastp.fq.gz
│ ├── SRR1544630_2.fastp.fq.gz
│ ├── SRR1544631.json
│ ├── SRR1544631_1.fastp.fq.gz
│ └── SRR1544631_2.fastp.fq.gz
├── multiqc
│ └── multiqc_report.html
├── prokka
│ ├── SRR1544630_prokka
│ └── SRR1544631_prokka
├── sendsketch
│ ├── SRR1544630.sendsketch.txt
│ └── SRR1544631.sendsketch.txt
└── shovill
├── SRR1544630.assembly_stats.txt
├── SRR1544630.contigs.fa
├── SRR1544630_shovill
├── SRR1544631.assembly_stats.txt
├── SRR1544631.contigs.fa
└── SRR1544631_shovill
Here are the contents of the workDir for screen_for_contaminants:
(bactpipe) Abhinavs-MacBook-Pro:BACTpipe eklavya$ ls work/6b/db224a551017c44347383ae58cf0f6/
total 40
drwxr-xr-x 11 eklavya staff 352 Nov 26 15:19 .
drwxr-xr-x 3 eklavya staff 96 Nov 26 15:19 ..
-rw-r--r-- 1 eklavya staff 0 Nov 26 15:19 .command.begin
-rw-r--r-- 1 eklavya staff 665 Nov 26 15:19 .command.err
-rw-r--r-- 1 eklavya staff 777 Nov 26 15:19 .command.log
-rw-r--r-- 1 eklavya staff 4 Nov 26 15:19 .command.out
-rw-r--r-- 1 eklavya staff 10326 Nov 26 15:19 .command.run
-rw-r--r-- 1 eklavya staff 267 Nov 26 15:19 .command.sh
-rw-r--r-- 1 eklavya staff 91 Nov 26 15:19 .command.trace
-rw-r--r-- 1 eklavya staff 1 Nov 26 15:19 .exitcode
-rw-r--r-- 1 eklavya staff 2033 Nov 26 15:19 SRR1544631.sendsketch.txt
NOTE: I had already created a bactpipe conda env with the requirements mentioned in the docs: https://bactpipe.readthedocs.io/en/latest/installation.html#installing-dependencies-into-the-conda-base-environment
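The task directory listed above holds the files that are most useful when a process fails: .exitcode, .command.err and .command.sh. As a rough sketch of how one might collect them for a quick look, the helper below reads those three files from a work directory. The function name summarize_task and the demo file contents are assumptions for illustration; only the file names come from the listing.

```python
import os
import tempfile

def summarize_task(work_dir):
    """Collect the exit code, stderr and script text from a task directory."""
    def read(name):
        path = os.path.join(work_dir, name)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8", errors="replace") as handle:
            return handle.read()

    exit_text = read(".exitcode")
    has_code = exit_text is not None and exit_text.strip() != ""
    return {
        "exit_code": int(exit_text.strip()) if has_code else None,
        "stderr": read(".command.err"),
        "script": read(".command.sh"),
    }

# Demo with made-up contents in a temporary directory (not real pipeline output).
with tempfile.TemporaryDirectory() as work_dir:
    demo_files = {
        ".exitcode": "127",
        ".command.err": ".command.sh: line 4: sendsketch_stainer.py: command not found\n",
        ".command.sh": "echo placeholder\n",
    }
    for name, text in demo_files.items():
        with open(os.path.join(work_dir, name), "w") as handle:
            handle.write(text)
    summary = summarize_task(work_dir)

print(summary["exit_code"])
print(summary["stderr"])
```

In practice the argument would be a path such as work/6b/db224a551017c44347383ae58cf0f6/ from the listing above.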
Hi @abhi18av. For some strange reason this issue seems to have resolved itself when running with the local configuration, without us really changing anything. The readthedocs.io installation info is deprecated (we have changed some of the software in the pipeline), but I assume that since the local.config file creates conda environments for each process, one should not need to have anything installed beforehand (except for conda and nextflow), right?
We are now checking if the "file not found" issue still remains when running it in our HPC environment.
Also, if you are still willing to bring the pipeline up to date with current Nextflow standards, it would be much appreciated 👍
Hi @thorellk
Definitely, you can expect a PR this weekend :)
Wow, great, thank you 😃
Issue resolved. We think it arose from differences between the Python installed by conda and the Python provided by the HPC module system. Changing this process so that it gets Python from conda, instead of loading it as a module, fixed the issue. Thanks everyone for your assistance!
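A shebang of the form #!/usr/bin/env python3 runs whichever python3 comes first on PATH, which is one way a conda install and an HPC module can end up giving different interpreters. The sketch below illustrates that ordering effect only; it uses shutil.which as an approximation of the PATH lookup, and the two temporary directories standing in for a conda bin directory and a module bin directory are assumptions, not the actual cluster layout.

```python
import os
import shutil
import tempfile

def make_fake_python(directory):
    """Create an executable placeholder named python3 in the given directory."""
    path = os.path.join(directory, "python3")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, 0o755)
    return path

with tempfile.TemporaryDirectory() as conda_bin, tempfile.TemporaryDirectory() as module_bin:
    conda_python = make_fake_python(conda_bin)
    module_python = make_fake_python(module_bin)

    # The directory listed first in the search path wins the lookup.
    conda_first = shutil.which("python3", path=os.pathsep.join([conda_bin, module_bin]))
    module_first = shutil.which("python3", path=os.pathsep.join([module_bin, conda_bin]))

    picked_conda = conda_first == conda_python
    picked_module = module_first == module_python

print(picked_conda, picked_module)
```

Running shutil.which("python3") without the path argument inside the failing environment shows which interpreter the shebang would actually select there.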
Hello, I am currently trying to add to a Nextflow bacterial bioinformatics pipeline, in which I want to call a Python script that I have placed in the bin folder. However, when running the process with this script called, the rest of the process fails.
When running the pipe I receive this error:
NOTE: Process screen_for_contaminants (192_S54_L001) terminated with an error exit status (127) -- Error is ignored
and .command.log contains:
.command.sh: line 4: sendsketch_stainer.py: command not found
The process that I am having issues with is this:
process screen_for_contaminants {
    tag { pair_id }
    publishDir "${params.output_dir}/sendsketch", mode: 'copy'
}
My repo can be found here: https://github.com/ctmrbio/BACTpipe/tree/3.E
All help is greatly appreciated!
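The exit status 127 reported in this post is the conventional shell status for a command that could not be found, while 126 conventionally means the file was found but could not be executed. The sketch below reproduces both cases with sh on a POSIX system; the helper name shell_exit_status, the deliberately nonexistent command name and the temporary demo script are all hypothetical and unrelated to the pipeline itself.

```python
import os
import shlex
import subprocess
import tempfile

def shell_exit_status(command):
    """Run a command line through sh and return its exit status."""
    return subprocess.run(["sh", "-c", command], capture_output=True).returncode

# Case 1: a command name that is not on PATH (hypothetical name).
missing_status = shell_exit_status("hypothetical_tool_that_does_not_exist_12345")

# Case 2: a script that exists but has no executable bit set.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "demo_script.py")
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python3\nprint('ok')\n")
    os.chmod(script, 0o644)
    not_executable_status = shell_exit_status(shlex.quote(script))

print(missing_status, not_executable_status)
```

Telling the two statuses apart helps narrow down whether the problem lies in the search path or in the file permissions.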