Closed kapsakcj closed 2 weeks ago
OK PR is ready for review.
I'll update the documentation and check the above box when I'm finished.
@sage-wright FYI there was one additional commit made after you forked your branch; you may want to merge that in if you are working on the theiacov_ont workflow
docs have been updated 👍 (mostly updates to theiacov inputs and outputs table)
I relaunched Sage's tests since they were run prior to the last 2 commits. Same samples, call caching off, both run on the `cjk-irma-diskcheck` branch
⚠️ Need to review outputs, esp. the sample that previously had a `-` in the output HA segment FASTA file and all-segment FASTA file
Code changes look solid! 🏅
New outputs present as expected:
Launching a new set of tests as I don't have access to Sage's workspace:
My tests were successful and the workflow is working as expected. @jrotieno you're the main reviewer here but you got my okay!
Starting a draft while doing testing in Terra. Will update this message periodically.
This PR closes #412, closes #437, and closes #457
🗑️ This dev branch should be deleted after merging to main.
:brain: Aim, Context and Functionality
The aim of this PR is to resolve several issues/bugs and make general improvements & upgrades to the TheiaCoV workflows for Flu analysis. Many of these changes impact the IRMA task, used in the TheiaCoV_Illumina_PE workflow, but some impact other workflows used for Flu analysis, such as TheiaCoV_ONT.
:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made
This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version: Yes
Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing: No
:clipboard: Workflow/Task Step Changes
🔄 Data Processing
- `DEBUG` statements throughout
- `irma_config.sh` config file by setting `$TMP` used by IRMA to the present working directory. This impacted samples where FASTQ files were large and is no longer an issue
Docker/software or software versions changed:
- `cdcgov/irma:v1.1.3` ➡️ `"us-docker.pkg.dev/general-theiagen/cdcgov/irma:v1.1.5"`
Databases or database versions changed: N/A
Data processing/commands changed: lots of changes to IRMA task & data processing
- `mafft --thread` flag and updated how version is captured
- `--external-config irma_config.sh` to `IRMA` command to use custom config file created in beginning of task.
File processing changed: lots of changes to output files & file renaming & FASTA header replacement
- FASTA with all segments now contains headers with `~{samplename}`
- Fix for empty HA segment for Flu B (Issue #437) on line 103. Adjusted `sed` command to modify file in place instead of redirecting into file.
Compute resources changed:
- `4`
- `irma_config.sh` file created in task to include 2 variables for multi-threading IRMA
➡️ Inputs
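As a rough illustration of the `irma_config.sh` step described under Data Processing above, the sketch below writes a small config file that points `TMP` at the current working directory and sets two thread-related variables. The names `SINGLE_LOCAL_PROC` and `DOUBLE_LOCAL_PROC`, the `cpu` shell variable, and the halving of the CPU count are assumptions made for illustration; this description only states that `$TMP` is set and that 2 multi-threading variables are included.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the task's CPU setting. 4 is the value listed
# under "Compute resources changed"; that it means CPU count is an assumption.
cpu=4

# Work in a scratch directory so the sketch does not touch real files.
workdir="$(mktemp -d)"
cd "$workdir"

# Write the custom config. TMP is set to the present working directory, as
# described in this PR. The two *_LOCAL_PROC names are assumed, not confirmed.
{
  echo "SINGLE_LOCAL_PROC=${cpu}"
  echo "DOUBLE_LOCAL_PROC=$(( cpu / 2 ))"
  echo "TMP=$(pwd)"
} > irma_config.sh

cat irma_config.sh

# The task then passes the file to IRMA, per this PR:
#   IRMA ... --external-config irma_config.sh
```

Pointing `TMP` at the working directory keeps IRMA's temporary files on the disk provisioned for the task, which is presumably why large FASTQ inputs no longer fail.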
⬅️ Outputs
New IRMA task outputs:
- `seg_np_assembly` & `seg_ns_assembly`, which are the FASTA files corresponding with the NP (nucleoprotein) and NS (nonstructural) segments
- `String irma_docker`
- `String irma_subtype_notes`, which is either `IRMA does not differentiate Victoria and Yamagata Flu B lineages. See abricate_flu_subtype output column` for Flu B samples or blank/empty for non-Flu B samples
- `String irma_subtype`, which reports the subtype for Flu A (for example `H1N1`) and `No subtype predicted by IRMA` for Flu B
- have been replaced with `N`'s.
New TheiaCoV_Illumina_PE & ONT outputs:
String irma_docker = irma.irma_docker
String? irma_subtype_notes = irma.irma_subtype_notes
File? irma_ha_segment_fasta = irma.seg_ha_assembly
File? irma_na_segment_fasta = irma.seg_na_assembly
File? irma_pa_segment_fasta = irma.seg_pa_assembly
File? irma_pb1_segment_fasta = irma.seg_pb1_assembly
File? irma_pb2_segment_fasta = irma.seg_pb2_assembly
File? irma_mp_segment_fasta = irma.seg_mp_assembly
File? irma_np_segment_fasta = irma.seg_np_assembly
File? irma_ns_segment_fasta = irma.seg_ns_assembly
⚠️ NOTE: I have opted to NOT output the padded FASTA files to the workflow level (i.e. Terra output column) as it would add lots of clutter and likely confuse the user. They are saved as intermediate files during the IRMA task and can be retrieved from the execution directory if necessary.
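For reviewers, one plausible reading of the Issue #437 fix listed under File processing is the classic shell pitfall where redirecting `sed` output into the file it is reading empties that file. The sketch below reproduces that behavior and contrasts it with an in-place edit. The file names, the FASTA header, and the substitution pattern are hypothetical, and whether the original command redirected into the same file is an assumption rather than something stated here.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical HA segment FASTA; header and sequence are made up.
printf '>B_HA\nATGAAGGCAATA\n' > sample_HA.fasta
cp sample_HA.fasta redirected.fasta
cp sample_HA.fasta inplace.fasta

# Redirecting into the input file: the shell truncates redirected.fasta
# before sed opens it, so sed reads nothing and the file ends up empty.
sed 's/^>B_HA/>sample_HA/' redirected.fasta > redirected.fasta

# In-place edit: sed writes the result elsewhere and then replaces the input.
# The -i.bak form (keeps a backup copy) is accepted by both GNU and BSD sed.
sed -i.bak 's/^>B_HA/>sample_HA/' inplace.fasta

# Compare sizes: the redirected copy is empty, the in-place copy is not.
wc -c redirected.fasta inplace.fasta
```

An empty segment file produced this way would match the symptom reported for the Flu B HA segment, though the actual cause in the task may differ.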
:test_tube: Testing
Test Dataset
Described below.
Commandline Testing with MiniWDL or Cromwell (optional)
Tested extensively with miniwdl locally, but the output was not saved.
Terra Testing
⚠️ Need to review outputs, especially the all-segment FASTA files & the HA segment FASTA in particular
Suggested Scenarios for Reviewer to Test
Would be good to test as many types/subtypes as possible and even test with poor quality data to see how the workflow behaves when IRMA cannot produce an assembly.
Need to test ONT as I've primarily been testing the IRMA WDL task updates with Illumina data. I tested 5 ONT samples, but would be good to do more if data is available
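To support the output review requested above, a small helper along these lines could flag segment FASTA files that are empty or that still contain `-` in their sequence lines. `check_fasta` and the sample file names are hypothetical and not part of the workflow; this is a sketch, not a validated QC step.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_fasta FILE
# Prints "empty" if the file is missing or zero bytes, "gaps" if any
# non-header line contains '-', and "ok" otherwise. Hypothetical helper.
check_fasta() {
  local fasta="$1"
  if [ ! -s "$fasta" ]; then
    echo "empty"
  elif awk '!/^>/ && /-/ { found = 1 } END { exit !found }' "$fasta"; then
    echo "gaps"
  else
    echo "ok"
  fi
}

# Made-up example files standing in for downloaded workflow outputs.
workdir="$(mktemp -d)"
printf '>sample_HA\nATG-CC\n' > "$workdir/gappy.fasta"
printf '>sample-1_NA\nATGNCC\n' > "$workdir/clean.fasta"
: > "$workdir/empty.fasta"

for f in "$workdir"/*.fasta; do
  printf '%s\t%s\n' "$(basename "$f")" "$(check_fasta "$f")"
done
```

Header lines are skipped on purpose, so a sample name containing `-` is not reported as a gap.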
Theiagen Version Release Testing (optional)
:microscope: Final Developer Checklist
🎯 Reviewer Checklist
🗂️ Associated Documentation (to be completed by Theiagen developer)