theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

Make Plasmidfinder optional, on by default #280

Closed by michellescribner 5 months ago

michellescribner commented 8 months ago

:bug:

A collaborator reported repeated TheiaProk failures for Enterococcus samples at the plasmidfinder task: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/6d7a5682-0e30-44b7-8d66-8b9eac45bbf0

The cause of the issue is not clear to me. The same assembly ran successfully through the PlasmidFinder web tool and plasmids were detected, but the job took several days to finish.

kapsakcj commented 8 months ago

Never seen this before. Looks like an issue with plasmidfinder code itself. Here's the relevant error:

Traceback (most recent call last):
  File "/plasmidfinder/plasmidfinder.py", line 360, in <module>
    min_cov, threshold, method_path, cut_off=False)
  File "/usr/local/lib/python3.5/dist-packages/cgecore/blaster/blaster.py", line 242, in __init__
    hit)
  File "/usr/local/lib/python3.5/dist-packages/cgecore/blaster/blaster.py", line 508, in calculate_new_length
    new_start = int(gene_results[split]['sbjct_start'])
KeyError: 'contig00053 len=17204 cov=50.7 corr=0 origname=Contig_45_50.6781_pilon sw=shovill-skesa/1.1.0 date=20231218:7898..8044:rep1_9_repE(pKL0018)_AB290882:9.456740'
No plasmids detected in database


Perhaps you could try switching to spades as the assembly algorithm? FASTA headers will be different, so maybe that will help? 🤷
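
If re-assembling is inconvenient, another way to test the header idea might be to trim each FASTA header down to just the contig ID before running plasmidfinder. This is only a rough sketch: the function name `simplify_fasta_headers` is made up for illustration, the example record is shortened, and there is no guarantee that shorter headers avoid the KeyError.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA records from src to dst, keeping only the first
    whitespace-delimited token of each header line.

    Hypothetical helper for testing the header hypothesis; it is not
    part of plasmidfinder or TheiaProk.
    """
    for line in src:
        if line.startswith(">"):
            fields = line[1:].split()
            # Keep the contig ID; fall back to an empty ID if the header is blank.
            seq_id = fields[0] if fields else ""
            dst.write(">" + seq_id + "\n")
        else:
            # Sequence lines are copied through unchanged.
            dst.write(line)


# Shortened example record modeled on the shovill-style header in the error.
example = io.StringIO(
    ">contig00053 len=17204 cov=50.7 corr=0\n"
    "ACGTACGT\n"
)
out = io.StringIO()
simplify_fasta_headers(example, out)
print(out.getvalue(), end="")
```

With real files, the same function can be called on two open file handles (input assembly and a new output FASTA), and the output can then be passed to plasmidfinder in place of the original assembly.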

michellescribner commented 8 months ago

Good call, I just gave that a whirl: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/7a0e8786-bb73-4874-94ac-017acdaea0f3

I was thinking that the plasmidfinder web tool would also fail if this were a FASTA header issue, but I'm not sure whether the web tool uses exactly the same software/db as our implementation.

kapsakcj commented 8 months ago

FYI I think it's safe to assume that the version of plasmidfinder on the webtool is more recent than the one used in our docker image.

If this is an old bug that has already been addressed in the plasmidfinder code, then it would be a good idea to update the docker image to the most recent version and subsequently update the WDL to use the new one.

michellescribner commented 8 months ago

In the absence of a more recent version of plasmidfinder, we can add a boolean input to TheiaProk in the meantime to turn plasmidfinder off.

For example: call_plasmidfinder = true