This PR closes #230.
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
This PR checks the abundance of the organism input in the TheiaCoV workflows using Kraken2. The organism input is passed on to the readQC_trim workflows, which call the Kraken2 task. Because the readQC_trim workflows are also called by the TheiaProk workflows, the organism input is optional for the readQC_trim workflows and the Kraken2 task.
For the TheiaProk workflows that do not have an organism input, the workflows will report the most abundant organism from the Kraken report.
Organism abundance is estimated both before and after human read removal (dehosting). The pre-dehosting abundance could be useful for assessing target organism amplification and contamination, while the post-dehosting abundance could be useful for understanding the efficiency and impact of the human read removal process.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes. There will be new outputs.
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
Docker/software or software versions changed: N/A
Databases or database versions changed: N/A
Data processing/commands changed: Yes. Additional steps in the Kraken2 task to extract organism abundance estimates.
File processing changed: N/A
Compute resources changed: N/A
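As a rough illustration of the kind of extraction step added to the Kraken2 task, the sketch below picks the most abundant organism from a standard Kraken2 report (six tab-separated columns: percent, clade reads, direct reads, rank code, taxid, name). This is not the code in this PR: the function names, the choice of species rank ("S"), and the example rows with their read counts are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class ReportRow(NamedTuple):
    percent: float  # percentage of reads in the clade rooted at this taxon
    rank: str       # rank code, e.g. "S" for species, "U" for unclassified
    name: str       # scientific name with the report's indentation stripped


def parse_kraken2_report(text: str) -> List[ReportRow]:
    """Parse the standard six-column, tab-separated Kraken2 report."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        rows.append(ReportRow(float(fields[0]), fields[3].strip(), fields[5].strip()))
    return rows


def most_abundant_organism(text: str, rank: str = "S") -> Optional[Tuple[str, float]]:
    """Return (name, percent) of the top taxon at the given rank, or None."""
    # Assumption: "most abundant" means highest clade percentage at species rank.
    candidates = [row for row in parse_kraken2_report(text) if row.rank == rank]
    if not candidates:
        return None
    top = max(candidates, key=lambda row: row.percent)
    return top.name, top.percent


# Made-up report rows, for illustration only.
EXAMPLE_REPORT = (
    " 10.00\t100\t100\tU\t0\tunclassified\n"
    " 90.00\t900\t0\tR\t1\troot\n"
    " 60.00\t600\t600\tS\t562\t      Escherichia coli\n"
    " 30.00\t300\t300\tS\t9606\t      Homo sapiens\n"
)

print(most_abundant_organism(EXAMPLE_REPORT))
```

Running the same function on the raw and the dehosted reports would give the paired raw and dehosted values described above.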
➡️ Inputs
⬅️ Outputs
Kraken2 task:
  most_abundant_organism
  percent_most_abundant_organism
TheiaCoV_illumina_PE:
  kraken_most_abundant_organism_raw
  kraken_percent_most_abundant_organism_raw
  kraken_most_abundant_organism_dehosted
  kraken_percent_most_abundant_organism_dehosted
TheiaCoV_ONT:
  kraken_most_abundant_organism_raw
  kraken_percent_most_abundant_organism_raw
  kraken_most_abundant_organism_dehosted
  kraken_percent_most_abundant_organism_dehosted
readQC_trim_PE:
  kraken_most_abundant_organism_raw
  kraken_percent_most_abundant_organism_raw
  kraken_most_abundant_organism_dehosted
  kraken_percent_most_abundant_organism_dehosted
readQC_trim_ONT:
  kraken_most_abundant_organism_raw
  kraken_percent_most_abundant_organism_raw
  kraken_most_abundant_organism_dehosted
  kraken_percent_most_abundant_organism_dehosted
TheiaProk_illumina_PE:
  kraken_most_abundant_organism_raw
  kraken_percent_most_abundant_organism_raw
  kraken_most_abundant_organism_dehosted
  kraken_percent_most_abundant_organism_dehosted
:test_tube: Testing
Test Dataset
Commandline Testing with MiniWDL or Cromwell (optional)
Terra Testing
TheiaProk_illumina_PE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/87c7c951-9f58-43ee-95a4-4e65b90deaf6
TheiaCoV_illumina_PE: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/9d4da382-51eb-4209-b177-b514445ff2b0
TheiaCoV_ONT: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/8ca43601-f9b6-4d43-9e0e-8823acb0902c
Additional testing with kraken_target_org input:
sars-cov-2 and target organism is sars-cov-2, with the latter not being a proper taxonomic name; expected to fail
sars-cov-2 and target organism is influenza, for an all human reads sample
sars-cov-2 and target organism is Severe acute respiratory syndrome coronavirus 2, input reads all SARS-CoV-2; expected to pass
WNV and target organism is Influenza, input reads are a mixed sample of flu, WNV, HIV and SARS-CoV-2; expected to pass
https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/3787e23b-2b52-4b94-9a2d-b0d90f3e5a1d
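To make the target-organism scenarios above easier to reason about, here is a minimal sketch of a name lookup against a Kraken2 report. It assumes, without confirmation from the task code, that the target is matched as a case-sensitive substring of the report's name column and that the highest matching percentage is reported. The function name and the example rows are hypothetical.

```python
from typing import Optional


def target_organism_percent(report_text: str, target: str) -> Optional[float]:
    """Return the highest clade percentage among rows whose name contains target."""
    # Assumption: case-sensitive substring match on the sixth (name) column.
    matches = []
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 6 and target in fields[5]:
            matches.append(float(fields[0]))
    return max(matches) if matches else None  # None: name not found in the report


# Made-up report rows, for illustration only.
EXAMPLE_REPORT = (
    " 70.00\t700\t700\tS\t11320\t      Influenza A virus\n"
    " 20.00\t200\t200\tS\t9606\t      Homo sapiens\n"
)

print(target_organism_percent(EXAMPLE_REPORT, "Influenza"))
print(target_organism_percent(EXAMPLE_REPORT, "sars-cov-2"))
```

Under these assumptions, a shorthand such as sars-cov-2 finds no row because Kraken2 reports use full taxonomic names, while Influenza matches any row whose name contains that word.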
Suggested Scenarios for Reviewer to Test
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)