polio-nanopore / piranha

Error running wg analysis mode: cannot find WG snakefile (piranha_wg.smk) #239

Open eduardoscopel opened 3 days ago

General info

I've been trying to run piranha on Linux (on an HPC cluster) but am unable to run it in wg analysis mode. VP1 mode (the default) runs fine, but whenever I try wg mode I get an error saying the snakefile piranha_wg.smk cannot be found (more details below).

I'm running piranha version 1.2.5. I've tried running it from both a conda environment and a singularity container. Given the permissions set on the HPC I'm working on, I'm unable to use Docker, and when I try installing from source, I get several incompatibility errors (the same as reported in issue #223).

Running from a Conda environment

The conda environment was built with the following commands:

mamba create --name piranha-polio
conda activate piranha-polio
mamba install piranha-polio -c bioconda -c conda-forge

After successfully installing piranha, I run the following commands to activate the environment and run the pipeline:

conda activate piranha-polio
piranha -i ~/xwe3/workdir/PolioONTbenchmark/data/JamesSamples/forPiranha/fastq_pass/ -b JSsamplesheet.csv  -m wg -o JamesSamplesVP1_62724_wg --no-temp

Instantly, I get the following error:

Error: cannot find Snakefile at /scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/site-packages/piranha/scripts/piranha_wg.smk
Check installation or specify another analysis mode

When I check the folder /scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/site-packages/piranha/scripts/ the snakefile piranha_wg.smk is not there:

~/xwe3/workdir/PolioONTbenchmark/piranha/JamesSamples$ ll /scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/site-packages/piranha/scripts/
total 39
-rw-------. 2 xwe3 users    0 Jun 19 12:21 __init__.py
drwx--S---. 2 xwe3 users    0 Jun 20 13:18 __pycache__
-rw-------. 2 xwe3 users 7206 Jun 19 12:21 piranha_consensus.smk
-rw-------. 2 xwe3 users 6611 Jun 19 12:21 piranha_curate.smk
-rw-------. 2 xwe3 users 6802 Jun 19 12:21 piranha_haplotype.smk
-rw-------. 2 xwe3 users 2488 Jun 19 12:21 piranha_phylo.smk
-rw-------. 2 xwe3 users 4575 Jun 19 12:21 piranha_preprocessing.smk
-rw-------. 2 xwe3 users 2504 Jun 19 12:21 piranha_variation.smk
-rw-------. 2 xwe3 users 8509 Jun 19 12:21 piranha_vp1.smk
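
The same check can be scripted so it is easy to repeat in other environments. This is a minimal sketch, not part of piranha: the helper names `find_package_dir` and `missing_snakefiles` are hypothetical, and the `scripts` subdirectory and the expected file names are taken from the listing and error message above.

```python
import importlib.util
from pathlib import Path


def find_package_dir(package_name):
    """Return the install directory of a package, or None if it is not importable."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def missing_snakefiles(scripts_dir, expected):
    """Return the expected snakefile names that are absent from scripts_dir."""
    present = {p.name for p in Path(scripts_dir).glob("*.smk")}
    return sorted(set(expected) - present)


if __name__ == "__main__":
    pkg_dir = find_package_dir("piranha")
    if pkg_dir is None:
        print("piranha is not importable in this environment")
    else:
        # File names assumed from the directory listing and the error message.
        expected = ["piranha_vp1.smk", "piranha_wg.smk"]
        print("missing:", missing_snakefiles(pkg_dir / "scripts", expected))
```

Run inside the activated conda environment (or inside the container), this should report `piranha_wg.smk` as missing if the installation matches the listing above.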

I also tried downloading the piranha_wg.smk file from the GitHub repo into this folder (not ideal, but I thought it was worth trying before opening an issue) and ran the same command again. After the analysis runs for a short while, I get the following error:

Error in rule gather_consensus_sequences:
    jobid: 0
    input: /scicomp/scratch/xwe3/workdir/PolioONTbenchmark/piranha/JamesSamples/JamesSamplesVP1_62724_wg/sample_composition.csv
    output: /scicomp/scratch/xwe3/workdir/PolioONTbenchmark/piranha/JamesSamples/JamesSamplesVP1_62724_wg/published_data/vp1_sequences.fasta

RuleException:
TypeError in file /scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/site-packages/piranha/scripts/piranha_wg.smk, line 98:
gather_fasta_files() missing 2 required positional arguments: 'publish_dir' and 'config'
  File "/scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/site-packages/piranha/scripts/piranha_wg.smk", line 98, in __rule_gather_consensus_sequences
  File "/scicomp/home-pure/xwe3/.conda/envs/piranha-polio/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-28T092052.455429.snakemake.log
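
For context, this TypeError is what Python raises when a caller passes fewer positional arguments than the function definition requires. That would be consistent with the downloaded snakefile and the installed package coming from different versions, although it does not prove it. The sketch below reproduces the pattern with a hypothetical stand-in: only the function name and the parameters `publish_dir` and `config` come from the traceback; the other two parameter names and the helper functions are assumptions made for illustration.

```python
import inspect


def gather_fasta_files(input_csv, output_fasta, publish_dir, config):
    """Hypothetical stand-in; the real piranha signature may differ."""
    return output_fasta


def required_positional(func):
    """Names of positional parameters that have no default value."""
    return [
        name
        for name, p in inspect.signature(func).parameters.items()
        if p.default is inspect.Parameter.empty
        and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]


def call_with_too_few_arguments():
    """Call the function the way a mismatched snakefile might, with two arguments."""
    try:
        gather_fasta_files("sample_composition.csv", "vp1_sequences.fasta")
    except TypeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(required_positional(gather_fasta_files))
    print(call_with_too_few_arguments())
```

Calling `required_positional` on the real function in the installed package, and comparing the result with the call on line 98 of the snakefile, would show whether the two disagree.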

The complete log file is not very informative (see below):

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                           count
--------------------------  -------
all                               1
gather_consensus_sequences        1
total                             2

Select jobs to execute...
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-06-28T092052.455429.snakemake.log

I thought this could be a problem with the conda environment I created, so I tried running it from a singularity container (see below).

Running from a Singularity container

To run the pipeline in a singularity container, I downloaded piranha's image from Galaxy and ran it with the following command:

singularity exec ~/singularityIMG/piranha-polio%3A1.2.5--pyhdfd78af_0 piranha -i ~/xwe3/workdir/PolioONTbenchmark/data/JamesSamples/forPiranha/fastq_pass/ -b JSsamplesheet.csv -m wg -o JamesSamplesVP1_62724_wg --no-temp

Almost immediately, I get the same error as with the conda environment:

Error: cannot find Snakefile at /usr/local/lib/python3.9/site-packages/piranha/scripts/piranha_wg.smk
 Check installation or specify another analysis mode

When I check the contents of that folder, piranha_wg.smk is not there. Next, I opened a shell inside the container with singularity shell ~/singularityIMG/piranha-polio%3A1.2.5--pyhdfd78af_0. When I check the contents of the container, the following options show up (still no piranha_wg.smk):

Singularity> piranha
piranha                    piranha_curate.smk         piranha_phylo.smk          piranha_variation.smk
piranha_consensus.smk      piranha_haplotype.smk      piranha_preprocessing.smk  piranha_vp1.smk

Concluding remarks

The HPC support team reproduced the same errors, and I was also able to reproduce them on my local computer, which suggests this is not an issue with the environments I'm using. I also tried the steps above with a different version of piranha (1.2.2) and got the same results. Is it possible that the piranha_wg.smk snakefile is not included in the singularity container available on Galaxy or in the conda install files? Again, I can run the analysis in VP1 mode just fine with both singularity and conda.
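
One way to test that question without browsing directories is to read the file manifest that the installer recorded for the distribution. This is a rough sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `packaged_snakefiles` is hypothetical, and the distribution name `piranha-polio` is an assumption based on the bioconda package name; the name recorded in the environment's metadata may differ.

```python
from importlib import metadata


def packaged_snakefiles(dist_name):
    """List .smk file names recorded in an installed distribution's manifest.

    Returns an empty list if the distribution is not installed or if it
    carries no file manifest.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return []
    if files is None:  # no RECORD or equivalent metadata available
        return []
    return sorted(f.name for f in files if f.name.endswith(".smk"))


if __name__ == "__main__":
    # "piranha-polio" is an assumed distribution name, not confirmed by the source.
    print(packaged_snakefiles("piranha-polio"))
```

If `piranha_wg.smk` is absent from this list while the other snakefiles are present, that would point to the file being left out when the package was built rather than to a problem with the local environment.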

Any help would be greatly appreciated! Thanks, Eduardo