polio-nanopore / piranha

GNU General Public License v3.0

Running pipeline on custom reference (whole genome) #122

Open ammaraziz opened 1 year ago

ammaraziz commented 1 year ago

I am trying to run the pipeline with a custom reference (whole genome). I am unsure what parameters to specify so the pipeline successfully runs.

There is a --analysis-mode flag, which accepts (from reading the code) vp1 and wg_2tile. I've tried using both and have received errors for both.

When running with the default vp1 mode, I receive this error:

RuleException:
CalledProcessError in file /home/aaziz/miniconda3/envs/piranha/lib/python3.9/site-packages/piranha/scripts/piranha_vp1.smk, line 73:
Command 'set -euo pipefail;  snakemake --nolock --snakefile /home/aaziz/miniconda3/envs/piranha/lib/python3.9/site-packages/piranha/scripts/variation.smk --forceall --rerun-incomplete --quiet --log-handler-script /home/aaziz/miniconda3/envs/piranha/lib/python3.9/site-packages/piranha/utils/log_handler_handle.py  --configfile /tmp/tmpedrylpu8/preprocessing_config.yaml --config barcode=barcode25 outdir=/home/aaziz/projects/entero/gridion_run1/analysis/_3/barcode25 tempdir=/tmp/tmpedrylpu8/barcode25 sample='EVGS0077' --cores 150 &> /tmp/tmpedrylpu8/logs/barcode25_variation.smk.log' returned non-zero exit status 1.
  File "/home/aaziz/miniconda3/envs/piranha/lib/python3.9/site-packages/piranha/scripts/piranha_vp1.smk", line 73, in __rule_generate_variation_info
  File "/home/aaziz/miniconda3/envs/piranha/lib/python3.9/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-02-28T171601.634750.snakemake.log

From digging through the logs, the failing step seems to have encountered too many values (it expects 2 but receives more). I am rerunning the pipeline with a temp directory specified to better understand the error.
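For anyone else chasing a failure like this, the per-barcode logs named in the error (for example `barcode25_variation.smk.log` under the temp directory's `logs` folder) can be scanned for the lines that matter. This is only a minimal sketch: the function name and the list of marker strings are my own assumptions, not part of piranha.

```python
from pathlib import Path

# Substrings that usually flag the interesting lines in a Snakemake or Python log.
# This list is an assumption for illustration; piranha does not define it.
ERROR_MARKERS = ("Error", "Exception", "Traceback")


def find_error_lines(log_dir, pattern="*.log"):
    """Return (file name, line number, text) for each log line containing a marker."""
    hits = []
    for log_file in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(log_file, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(marker in line for marker in ERROR_MARKERS):
                    hits.append((log_file.name, number, line.rstrip()))
    return hits
```

Pointing `find_error_lines` at the `logs` directory inside the temp directory you passed to the pipeline gives a short list of candidate lines to read first, rather than paging through every log by hand.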

For analysis mode wg_2tile, I receive this error:

Select jobs to execute...
Complete log: .snakemake/log/2023-03-06T094748.900415.snakemake.log
Building DAG of jobs...
MissingInputException in rule all in file /home/aaziz/miniconda3/envs/piranha/lib/python3.9/site-packages/piranha/scripts/piranha_wg_2tile.smk, line 13:
Missing input files for rule all:
    affected files:
        /home/aaziz/projects/entero/gridion_run1/analysis_wgtile/barcode_reports/barcode05_report.html
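The message above only names the first report that Snakemake could not find. A quick way to see which barcodes are affected is to check the output directory directly. This is a rough sketch that assumes every report follows the `barcode_reports/<barcode>_report.html` pattern seen in the error; the function name is hypothetical.

```python
from pathlib import Path


def missing_barcode_reports(outdir, barcodes):
    """Return the barcodes that have no HTML report under <outdir>/barcode_reports."""
    report_dir = Path(outdir) / "barcode_reports"
    # File naming is inferred from the MissingInputException path, not from piranha docs.
    return [
        barcode
        for barcode in barcodes
        if not (report_dir / f"{barcode}_report.html").is_file()
    ]
```

If every barcode comes back as missing, the report-generating step probably never ran in that mode, which is a different problem from one sample failing.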

I guess the ultimate question is: does the pipeline support other amplicon schemes? The amplicon I am using spans the entire capsid region and extends partly into the non-structural genes (2A).

Thank you for all your help.

Ammar

ammaraziz commented 1 year ago

> From digging through the logs, the failing step seems to have encountered too many values (it expects 2 but receives more). I am rerunning the pipeline with a temp directory specified to better understand the error.

Regarding this error: it was fixed in version 1.0.8 (I was running 1.0.6). Closing as resolved.

I'll rerun the pipeline with version 1.0.8 to test a custom reference.

aineniamh commented 1 year ago

Hi @ammaraziz, I have this on my to-do list to make it easier for people to run whole genome options. I'd say the best thing to do currently would be to just run in regular mode, but with the options configured for whole genome. There's some extra QC I'd like to add for whole genome (like masking a given amplicon if its coverage is too low), but up until recently my priority for piranha was testing out the vp1 protocol options!

ammaraziz commented 1 year ago

What options do you recommend setting for whole genome? I have set the min/max read length and the min read depth. Are there any other options you recommend?

Also, happy to confirm that updating to the latest version solved all my issues and that running a whole genome sample works relatively well!

Great pipeline.
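For reference, the approach discussed in this thread (regular mode, a custom whole-genome reference, adjusted read-length and depth thresholds) could be assembled roughly as below. This is a sketch under assumptions: the long flag names are what I would expect from the CLI but are not confirmed anywhere in this thread, so check them against `piranha --help` for your installed version. The helper itself is hypothetical, and the right threshold values depend on your amplicon scheme.

```python
import shlex


def build_piranha_command(readdir, barcodes_csv, reference_fasta,
                          min_read_length, max_read_length,
                          min_read_depth, outdir):
    """Assemble a piranha command line as a list of arguments.

    All flag names below are assumptions; verify them with `piranha --help`.
    --analysis-mode is left out so the default (vp1) mode is used.
    """
    return [
        "piranha",
        "--readdir", str(readdir),
        "--barcodes-csv", str(barcodes_csv),
        "--reference-sequences", str(reference_fasta),
        "--min-read-length", str(min_read_length),
        "--max-read-length", str(max_read_length),
        "--min-read-depth", str(min_read_depth),
        "--outdir", str(outdir),
    ]


def as_shell_string(command):
    """Quote the argument list so it can be pasted into a terminal."""
    return shlex.join(command)
```

Building the command as a list keeps paths with spaces intact, and `as_shell_string` gives a copy-pasteable line to record alongside the run.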