Closed sage-wright closed 3 months ago
🥇 for adding tests and tweaks to better adhere to the style-guide!
Testing MPOX and SC2 here:
The two failures in the ONT data are unrelated to the changes in this PR (and should be removed from the validation dataset!): https://app.terra.bio/#workspaces/cdc-terra-resources/Theiagen_Wright_SC2_Sandbox/job_history/22c272cd-9d35-4d00-bf2d-b98f5928f882
The task runs successfully but the depth coverage is not being captured correctly:
https://job-manager.dsde-prod.broadinstitute.org/jobs/9fc8388d-4c18-4147-b5e2-55183de96ee3
@cimendes Issue resolved!
All fixed! :D
Thank you @sage-wright!
This PR closes #325 and closes #312
🗑️ This dev branch should be deleted after merging to main.
~Currently waiting on the organism parameter PR merging before adding this since it will use that workflow.~
:brain: Aim, Context and Functionality
As TheiaCoV expands to include more organisms, having a WDL task that is hard-coded for a single organism is inefficient if we want to mimic the behavior for other organisms. This PR changes the breadth-of-coverage calculation so that it is no longer hard-coded and is organism-agnostic. This requires the organism_parameter logic subworkflow and also lets the user specify the particular regions they want listed in the output file by overriding the default.
Default bed files are currently only provided for mpox and SC2.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
The same calculations are used but now require a bed file as input to determine the regions of interest. This input bed file is looped through and `samtools depth` is used to determine the percentage of the sites that are above the specified minimum depth.
Docker/software or software versions changed:
Databases or database versions changed:
Data processing/commands changed:
File processing changed:
Compute resources changed:
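The per-region calculation described under Data Processing can be illustrated with a minimal Python sketch. This is not the code in the WDL task (which shells out to `samtools depth`); it only mirrors the logic on already-parsed text. The function names `parse_bed`, `parse_depth`, and `percent_coverage`, the example region names, and the `>=` comparison against the minimum depth are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int, str]  # (chrom, start, end, name)


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Parse BED lines. BED coordinates are 0-based and half-open."""
    regions: List[Region] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Fall back to a coordinate label when the BED has no name column.
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        regions.append((chrom, start, end, name))
    return regions


def parse_depth(lines: Iterable[str]) -> Dict[Tuple[str, int], int]:
    """Parse `samtools depth` text output: chrom, 1-based position, depth."""
    depth: Dict[Tuple[str, int], int] = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, pos, value = line.rstrip("\n").split("\t")[:3]
        depth[(chrom, int(pos))] = int(value)
    return depth


def percent_coverage(
    regions: List[Region],
    depth: Dict[Tuple[str, int], int],
    min_depth: int,
) -> Dict[str, float]:
    """Percent of sites per region with depth >= min_depth (assumed cutoff)."""
    result: Dict[str, float] = {}
    for chrom, start, end, name in regions:
        length = end - start
        if length <= 0:
            result[name] = 0.0
            continue
        # Convert the 0-based half-open BED interval to 1-based positions.
        # Positions missing from the depth output are treated as depth 0.
        covered = sum(
            1
            for pos in range(start + 1, end + 1)
            if depth.get((chrom, pos), 0) >= min_depth
        )
        result[name] = round(100.0 * covered / length, 2)
    return result


# Hypothetical example data: two regions on a 10-base reference.
example_bed = ["ref\t0\t4\tgeneA", "ref\t4\t10\tgeneB"]
example_depths = [20, 20, 5, 20, 0, 0, 0, 15, 15, 15]
example_depth_lines = [f"ref\t{i + 1}\t{d}" for i, d in enumerate(example_depths)]

coverage = percent_coverage(
    parse_bed(example_bed), parse_depth(example_depth_lines), min_depth=10
)
print(coverage)  # {'geneA': 75.0, 'geneB': 50.0}
```

Keeping the BED parsing separate from the depth lookup means the same loop works for any organism: only the BED file changes, which is the point of making the task organism-agnostic.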
➡️ Inputs
New input: `reference_gene_locations_bed`, which indicates that the gene locations should correspond to the same reference file that was used for alignment. By default, this file is provided for SC2 and mpox. The user can use this input file to overwrite the defaults.
⬅️ Outputs
The `sc2_all_genes_percent_coverage` file is now `est_percent_gene_coverage_tsv` as it is no longer SC2 specific.
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)