theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

upgrade pasty to v1.0.3 #379

Closed kapsakcj closed 3 months ago

kapsakcj commented 4 months ago

This PR closes #232

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

Upgrade to pasty v1.0.3 to avoid rare errors like those shown in issue #232

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

pasty is run as a Pseudomonas aeruginosa-specific typing tool and is part of the Merlin_magic subworkflow. Since this is a task-level change, testing one TheiaProk workflow (ILMN PE) is sufficient, although in principle all TheiaProk workflows are impacted because any of them may run the pasty task.

This will affect the behavior of the workflow(s) even if users don't change any workflow inputs relative to the last version : No

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

The only change here is the upgraded docker image

Docker/software or software versions changed: pasty v1.0.2 ➡️ v1.0.3

Databases or database versions changed: none

Data processing/commands changed: none

File processing changed: none

Compute resources changed: none

➡️ Inputs

Upgraded to pasty v1.0.3 - using StaPH-B docker image

⬅️ Outputs

None

:test_tube: Testing

Test Dataset

Testing with one P. aeruginosa sample that failed with pasty v1.0.2.

Commandline Testing with MiniWDL or Cromwell (optional)

Toggle to see successful miniwdl test:

```
$ miniwdl run -v tasks/species_typing/pseudomonas/task_pasty.wdl assembly=pseudomonas_aeruginosa_SRR22827329_contigs.fasta samplename=SRR28827329
2024-03-12 09:42:29.684 miniwdl-run read configuration file :: path: "/home/curtis_kapsak/.config/miniwdl.cfg"
2024-03-12 09:42:30.147 wdl.t:pasty task setup :: name: "pasty", source: "tasks/species_typing/pseudomonas/task_pasty.wdl", line: 3, column: 1, dir: "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty", thread: 140529256838976
2024-03-12 09:42:30.149 miniwdl-run.CallCache call cache miss :: cache_file: "/home/curtis_kapsak/.cache/miniwdl/pasty/lfajdeuari2pjfqvvfhdlbchtu74swyf/gf2gcgf7arli2zpjon2dxjjmtastbtzh.json"
2024-03-12 09:42:30.299 wdl.t:pasty docker swarm resources :: workers: 1, max_cpus: 8, max_mem_bytes: 33647976448, total_cpus: 8, total_mem_bytes: 33647976448
2024-03-12 09:42:30.300 wdl.t:pasty input :: name: "samplename", value: "SRR28827329"
2024-03-12 09:42:30.300 wdl.t:pasty input :: name: "assembly", value: "/mnt/miniwdl_task_container/work/_miniwdl_inputs/0/pseudomonas_aeruginosa_SRR22827329_contigs.fasta"
2024-03-12 09:42:30.301 wdl.t:pasty eval :: name: "min_pident", value: 95
2024-03-12 09:42:30.301 wdl.t:pasty eval :: name: "memory", value: 4
2024-03-12 09:42:30.301 wdl.t:pasty eval :: name: "docker", value: "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3"
2024-03-12 09:42:30.301 wdl.t:pasty eval :: name: "disk_size", value: 100
2024-03-12 09:42:30.301 wdl.t:pasty eval :: name: "min_coverage", value: 95
2024-03-12 09:42:30.302 wdl.t:pasty eval :: name: "cpu", value: 2
2024-03-12 09:42:30.303 wdl.t:pasty effective runtime :: docker: "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3", cpu: 2, memory_reservation: 4000000000, maxRetries: 3, preemptible: 0
2024-03-12 09:42:30.303 wdl.t:pasty ignored runtime settings :: keys: ["disks", "disk"]
2024-03-12 09:42:30.317 wdl.t:pasty docker image :: tag: "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3", id: "sha256:03d9c4812bde8c4f6233885f946bfd0f4fb29b322ef0b146b564f915548fff82", RepoDigest: "staphb/pasty@sha256:d439d1bea5464489d104009eabff9c05530fb0947ae1671ebb3c86bf0c121810"
2024-03-12 09:42:31.087 wdl.t:pasty docker task running :: service: "excnyl29lh", task: "kjmagy2xmw", node: "is3ehuh1yr", message: "started"
2024-03-12 09:42:34.781 wdl.t:pasty docker task complete :: service: "excnyl29lh", task: "kjmagy2xmw", node: "is3ehuh1yr", message: "finished"
2024-03-12 09:42:34.781 wdl.t:pasty docker task exit :: state: "complete", exit_code: 0
2024-03-12 09:42:35.285 wdl.t:pasty command stdout unused; consider output `File cmd_out = stdout()` or redirect command to stderr log >&2 :: stdout_file: "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty/stdout.txt"
2024-03-12 09:42:35.286 wdl.t:pasty output :: name: "pasty_serogroup", value: "O12"
2024-03-12 09:42:35.287 wdl.t:pasty output :: name: "pasty_serogroup_coverage", value: 99.88
2024-03-12 09:42:35.287 wdl.t:pasty output :: name: "pasty_serogroup_fragments", value: 1
2024-03-12 09:42:35.288 wdl.t:pasty output :: name: "pasty_summary_tsv", value: "SRR28827329.tsv"
2024-03-12 09:42:35.289 wdl.t:pasty output :: name: "pasty_blast_hits", value: "SRR28827329.blastn.tsv"
2024-03-12 09:42:35.289 wdl.t:pasty output :: name: "pasty_all_serogroups", value: "SRR28827329.details.tsv"
2024-03-12 09:42:35.290 wdl.t:pasty output :: name: "pasty_version", value: "1.0.3"
2024-03-12 09:42:35.291 wdl.t:pasty output :: name: "pasty_pipeline_date", value: "Tue Mar 12 13:42:30 UTC 2024"
2024-03-12 09:42:35.291 wdl.t:pasty output :: name: "pasty_docker", value: "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3"
2024-03-12 09:42:35.291 wdl.t:pasty output :: name: "pasty_comment", value: ""
2024-03-12 09:42:35.297 wdl.t:pasty done
2024-03-12 09:42:35.297 miniwdl-run.CallCache call cache insert :: cache_file: "/home/curtis_kapsak/.cache/miniwdl/pasty/lfajdeuari2pjfqvvfhdlbchtu74swyf/gf2gcgf7arli2zpjon2dxjjmtastbtzh.json"
{
  "dir": "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty",
  "outputs": {
    "pasty.pasty_all_serogroups": "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty/out/pasty_all_serogroups/SRR28827329.details.tsv",
    "pasty.pasty_blast_hits": "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty/out/pasty_blast_hits/SRR28827329.blastn.tsv",
    "pasty.pasty_comment": "",
    "pasty.pasty_docker": "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3",
    "pasty.pasty_pipeline_date": "Tue Mar 12 13:42:30 UTC 2024",
    "pasty.pasty_serogroup": "O12",
    "pasty.pasty_serogroup_coverage": 99.88,
    "pasty.pasty_serogroup_fragments": 1,
    "pasty.pasty_summary_tsv": "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty/out/pasty_summary_tsv/SRR28827329.tsv",
    "pasty.pasty_version": "1.0.3"
  }
}
```
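For reviewers who want to check a local run without reading the whole log, the JSON that miniwdl prints at the end of the run above can be inspected programmatically. The following is only a rough sketch and not part of this PR: the helper name `check_pasty_outputs` and the specific checks are assumptions made for illustration, and the JSON is a trimmed copy of the output shown above.

```python
import json

# Trimmed copy of the JSON printed at the end of the miniwdl run above.
RUN_RESULT = """
{
  "dir": "/home/curtis_kapsak/github/public_health_bioinformatics/20240312_094230_pasty",
  "outputs": {
    "pasty.pasty_docker": "us-docker.pkg.dev/general-theiagen/staphb/pasty:1.0.3",
    "pasty.pasty_serogroup": "O12",
    "pasty.pasty_serogroup_coverage": 99.88,
    "pasty.pasty_serogroup_fragments": 1,
    "pasty.pasty_version": "1.0.3"
  }
}
"""


def check_pasty_outputs(result, expected_version):
    """Hypothetical helper: list problems in a miniwdl result dict (empty list means OK)."""
    outputs = result["outputs"]
    problems = []

    # The version reported by the tool should match the upgraded release.
    version = outputs.get("pasty.pasty_version")
    if version != expected_version:
        problems.append(f"pasty_version is {version!r}, expected {expected_version!r}")

    # The docker image tag should end with the same version.
    docker = outputs.get("pasty.pasty_docker", "")
    if not docker.endswith(f":{expected_version}"):
        problems.append(f"docker image {docker!r} is not tagged {expected_version!r}")

    # A successful run should report a non-empty serogroup call.
    if not outputs.get("pasty.pasty_serogroup"):
        problems.append("pasty_serogroup is empty")

    return problems


problems = check_pasty_outputs(json.loads(RUN_RESULT), "1.0.3")
print(problems or "all checks passed")
```

Passing `"1.0.2"` as the expected version instead would flag both the version and the docker tag, which is a quick way to confirm that the upgraded image was actually used.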

Terra Testing

Terra workflow failure ❌ here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/2f2e277b-2952-436c-8268-9faee93e8d34

Successful Terra workflow with same sample after upgrade ✅ : https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/a265c1dd-1e8c-4414-bd58-d6bea4138bfc

Suggested Scenarios for Reviewer to Test

Test with as many Pseudomonas aeruginosa samples as you have access to, as this is specific to this species

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

kapsakcj commented 4 months ago

Testing with 4 more P. aeruginosa samples here: https://app.terra.bio/#workspaces/theiagen-validations/curtis-sandbox-theiagen-validations/job_history/eb66f372-ff4e-49b8-8e54-7aaf06e2c79b

I believe these succeeded with pasty v1.0.2, but it doesn't hurt to run them through again.

EDIT: These all succeeded ✅

michellescribner commented 3 months ago

Testing TheiaProk_FASTA_PHB on 10 Pa eyedrop cluster samples: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/18ad148a-fc02-49aa-a8c3-4401be09867d

Testing TheiaProk_FASTA_PHB on 120 ATCC Pseudomonas samples: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/283add0a-d622-4fa2-abdf-ce2ae6a5c443

Run of TheiaProk_FASTA_PHB v1.3.0 on 120 ATCC Pseudomonas samples used as comparison: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/a1869544-a906-45bb-a78d-38ae7b4487f2

michellescribner commented 3 months ago

All function tests above were successful. For the 78 ATCC strains identified as Pseudomonas aeruginosa, all serogroup predictions matched between PHB v1.3.0 and this dev branch.
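A comparison like the one described above could be scripted roughly as follows. This is a sketch under stated assumptions, not the method actually used for this review: it assumes each run's results are available as tab-separated text with a sample identifier column (called `sample_id` here, a hypothetical name) and the `pasty_serogroup` output column, and the sample names and serogroup values in the example are made up.

```python
import csv
import io


def load_serogroups(tsv_text, id_column="sample_id", serogroup_column="pasty_serogroup"):
    """Map each sample ID to its serogroup call from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[id_column]: row[serogroup_column] for row in reader}


def compare_serogroups(baseline, candidate):
    """Return {sample: (baseline_call, candidate_call)} for every disagreement.

    A sample missing from one side is reported with None on that side.
    """
    mismatches = {}
    for sample in sorted(set(baseline) | set(candidate)):
        old = baseline.get(sample)
        new = candidate.get(sample)
        if old != new:
            mismatches[sample] = (old, new)
    return mismatches


# Made-up example tables standing in for the release run and the dev-branch run.
BASELINE_TSV = "sample_id\tpasty_serogroup\nsample_A\tO11\nsample_B\tO6\n"
CANDIDATE_TSV = "sample_id\tpasty_serogroup\nsample_A\tO11\nsample_B\tO6\n"

mismatches = compare_serogroups(
    load_serogroups(BASELINE_TSV),
    load_serogroups(CANDIDATE_TSV),
)
print(mismatches or "all serogroup predictions match")
```

Reporting missing samples as `None` rather than skipping them means a sample that failed in only one of the two runs also shows up as a difference.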

Launching one more function test using TheiaProk_Illumina_PE_PHB for good measure, then I will approve: https://app.terra.bio/#workspaces/theiagen-validations/Theiagen_Scribner_Sandbox/job_history/a2e20ff8-6008-4547-85a0-55f346bb39b4