phac-nml / snvphyl-galaxy-cli

A command line interface for the SNVPhyl Galaxy pipeline.
Apache License 2.0

Ploidy Discrepancy #7

Open jvhagey opened 2 years ago

jvhagey commented 2 years ago

Hi,

I am using snvphyl-galaxy-cli/1.3.0 and noticed a possible discrepancy in the ploidy argument for freebayes. I run the pipeline with:

snvphyl.py --galaxy-url http://snvphyl-galaxy.location.gov --galaxy-api-key random_stuff --fastq-dir ./FASTQs --reference-file "reference.fasta" --output-dir ./output --relative-snv-abundance 0.75 --min-coverage 10 --min-mean-mapping 30 --filter-density-threshold 2 --filter-density-window 11 --workflow-id "f2db41e1fa331b3e"

I changed the API key and URL for obvious reasons, but this is the general way we run it. When I review the results and look at the FreeBayes job details, I see this:

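[Screenshot: Galaxy job details for the FreeBayes step, with the ploidy parameter and the executed command line outlined in red]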

If you look at the red boxes, it looks like the workflow has ploidy set to 1, but the argument --ploidy 1 does not appear in the command line that was executed. Since FreeBayes defaults to a ploidy of 2, this seems like an issue. Have you seen this before?
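
One way to check which ploidy FreeBayes actually received is to inspect the command line Galaxy reports for the job. The Python sketch below is a hypothetical helper (the function names and sample command lines are made up for illustration, not taken from the screenshot): it isolates the freebayes invocation, looks for -p/--ploidy, and otherwise falls back to the FreeBayes default of 2.

```python
import shlex

# FreeBayes' documented default for -p/--ploidy when the option is omitted.
FREEBAYES_DEFAULT_PLOIDY = 2

# Shell operators that end one command and start another.
COMMAND_SEPARATORS = {"&&", "||", ";", "|"}


def freebayes_args(command_line: str) -> list:
    """Return the arguments passed to the first freebayes call in a shell command line."""
    # punctuation_chars makes shlex split off operators such as '&&' and ';'.
    lexer = shlex.shlex(command_line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    tokens = list(lexer)
    for start, token in enumerate(tokens):
        # Match "freebayes" or a full path ending in "/freebayes".
        if token.rsplit("/", 1)[-1] == "freebayes":
            args = []
            for arg in tokens[start + 1:]:
                if arg in COMMAND_SEPARATORS:
                    break
                args.append(arg)
            return args
    raise ValueError("no freebayes invocation found in the command line")


def effective_ploidy(command_line: str) -> int:
    """Ploidy FreeBayes will use: the last -p/--ploidy value, or the default."""
    args = freebayes_args(command_line)
    ploidy = FREEBAYES_DEFAULT_PLOIDY
    for i, arg in enumerate(args):
        if arg in ("-p", "--ploidy") and i + 1 < len(args):
            ploidy = int(args[i + 1])
        elif arg.startswith("--ploidy="):
            ploidy = int(arg.split("=", 1)[1])
    return ploidy
```

Pasting the command line from the FreeBayes job details page into effective_ploidy() and getting 2 back would confirm that the default is being used. Scoping the search to the freebayes call avoids false matches such as `mkdir -p` earlier in the same Galaxy command.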

apetkau commented 2 years ago

Hello @jvhagey

Thanks so much for pointing this out. I had never noticed this before (I never compared the parameters in Galaxy to the command line being run). I think you are right that, without the --ploidy 1 parameter, FreeBayes must be defaulting to a ploidy of 2.

SNVPhyl itself has gone through a number of updates over the years, and I have unfortunately fallen behind in updating snvphyl-galaxy-cli and the SNVPhyl documentation. snvphyl-galaxy-cli works by running a Docker container that has Galaxy, a version of all the SNVPhyl Galaxy tools, and a workflow installed. The SNVPhyl workflow version it uses is given as 1.0.1, but the latest version of the SNVPhyl workflow is 1.2.3, and I have not updated snvphyl-galaxy-cli to reflect this.

The biggest change between SNVPhyl 1.0.1 and 1.2.3 is the migration of all tools from the IRIDA version of the Galaxy toolshed (on irida.corefacility.ca, which no longer exists) to the main Galaxy toolshed (https://toolshed.g2.bx.psu.edu/view/nml/suite_snvphyl_1_2_3/bc72925159fc). As part of this migration, I also updated many of SNVPhyl's dependency tools, including freebayes.

So, if I examine what the latest version of SNVPhyl does in freebayes (freebayes version 1.3.1, https://toolshed.g2.bx.psu.edu/view/devteam/freebayes/ef2c525bd8cd), I see the following:

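[Screenshot: Galaxy job details for the freebayes 1.3.1 tool used by SNVPhyl 1.2.3, showing --ploidy 1 in the command line]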

In other words, it looks like the newer freebayes Galaxy tool properly sets --ploidy 1 on the command line.

Unfortunately, the main location I have SNVPhyl 1.2.3 documented is in the IRIDA-related documentation: https://phac-nml.github.io/irida-documentation/administrator/galaxy/pipelines/phylogenomics/

However, there is a Docker image with both Galaxy and the latest SNVPhyl tools available at https://hub.docker.com/r/phacnml/galaxy-irida-20.09/tags. To use it, you can run the following (similar to the instructions at https://github.com/bgruening/docker-galaxy-stable, but with a different Docker image):

docker run -d -p 8080:80 -p 8021:21 -p 8022:22 phacnml/galaxy-irida-20.09:latest

This would start Galaxy at http://localhost:8080.
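
Galaxy may not answer immediately after the container starts. As a rough sketch, assuming the 8080 port mapping from the docker run command and that Galaxy's /api/version endpoint is reachable without an API key, a script could poll until Galaxy responds. wait_for_galaxy and fetch_json are hypothetical helpers; the fetch parameter exists only so the network call can be swapped out.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def wait_for_galaxy(
    base_url: str = "http://localhost:8080",
    timeout: float = 600.0,
    poll_interval: float = 10.0,
    fetch: Callable[[str], dict] = fetch_json,
) -> Optional[dict]:
    """Poll Galaxy's /api/version until it answers; return its JSON, or None on timeout."""
    url = base_url.rstrip("/") + "/api/version"
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fetch(url)
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and refused or reset connections
            # while the container is starting; ValueError covers a non-JSON body.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

Calling wait_for_galaxy() with the defaults waits up to ten minutes and returns the version JSON once Galaxy is serving requests.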

You could then upload version 1.2.3 of the SNVPhyl Galaxy workflow, found here: https://github.com/phac-nml/irida/blob/development/src/main/resources/ca/corefacility/bioinformatics/irida/model/workflow/analysis/type/workflows/SNVPhyl/1.2.3/irida_workflow_structure.ga
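
If you would rather script the upload than use the Galaxy web interface, something like the following could work. It is a sketch that assumes BioBlend is installed (GalaxyInstance and workflows.import_workflow_dict are real BioBlend calls) and that you have an API key for the local Galaxy; github_raw_url, import_snvphyl_workflow, and SNVPHYL_1_2_3_WORKFLOW are hypothetical names.

```python
import json
import urllib.request

# The SNVPhyl 1.2.3 workflow URL from the issue comment (IRIDA repo, development branch).
SNVPHYL_1_2_3_WORKFLOW = (
    "https://github.com/phac-nml/irida/blob/development/src/main/resources/ca/"
    "corefacility/bioinformatics/irida/model/workflow/analysis/type/workflows/"
    "SNVPhyl/1.2.3/irida_workflow_structure.ga"
)


def github_raw_url(blob_url: str) -> str:
    """Convert https://github.com/<owner>/<repo>/blob/<ref>/<path> to its raw-file URL."""
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix) or "/blob/" not in blob_url:
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner_repo, ref_and_path = blob_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{ref_and_path}"


def import_snvphyl_workflow(blob_url: str, galaxy_url: str, api_key: str) -> dict:
    """Download a Galaxy workflow (.ga JSON) from GitHub and import it via BioBlend."""
    # Imported lazily so the URL helper works without BioBlend (pip install bioblend).
    from bioblend.galaxy import GalaxyInstance

    with urllib.request.urlopen(github_raw_url(blob_url), timeout=30) as response:
        workflow = json.load(response)
    gi = GalaxyInstance(url=galaxy_url, key=api_key)
    # Galaxy returns a description of the newly imported workflow.
    return gi.workflows.import_workflow_dict(workflow)
```

For example: import_snvphyl_workflow(SNVPHYL_1_2_3_WORKFLOW, "http://localhost:8080", "<your API key>"). Because the URL points at the development branch, the file fetched may change over time.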

However, realistically, I should probably update snvphyl-galaxy-cli to just automatically do all this instead.

Thanks again for pointing out this issue.