theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0
33 stars 15 forks source link

Fix Empty Mean Coverage for Flu HA and NA Assemblies #372

Closed jrotieno closed 4 months ago

jrotieno commented 4 months ago

This PR closes #361.

🗑️ This dev branch should be deleted after merging to main.

:brain: Aim, Context and Functionality

This PR fixes an issue with empty mean coverage estimates for influenza HA and Na assemblies.

:hammer_and_wrench: Impacted Workflows/Tasks & Changes Being Made

This will affect the behavior of the workflow(s) even if users don’t change any workflow inputs relative to the last version : Yes

Running this workflow on different occasions could result in different results, e.g. due to use of a live database, "latest" docker image, or stochastic data processing : Yes, but not due to this change

:clipboard: Workflow/Task Step Changes

🔄 Data Processing

Docker/software or software versions changed: No

Databases or database versions changed: No

Data processing/commands changed: Yes, to correctly capture BAM files used to calculate assembly coverage

File processing changed: No

Compute resources changed: No

➡️ Inputs

No changes

⬅️ Outputs

No changes

:test_tube: Testing

Test Dataset

Tested using a set of 11 influenza sample reads, including one that is simulated data.

Commandline Testing with MiniWDL or Cromwell (optional)

Tested the irma task locally and worked as expected.

Terra Testing

https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/ffe75b13-4dde-4c51-af39-bd5215058b0e

Suggested Scenarios for Reviewer to Test

Theiagen Version Release Testing (optional)

:microscope: Final Developer Checklist

🎯 Reviewer Checklist

🗂️ Associated Documentation (to be completed by Theiagen developer)

cimendes commented 4 months ago

@jrotieno Two samples failed on your test dataset that I believe are related to this PR! The irma error is as follows:

Sample_01_HA.fasta Sample_01_NA.fasta mv: 'Sample_01_HA.bam' and 'Sample_01_HA.bam' are the same file mv: 'Sample_01_NA.bam' and 'Sample_01_NA.bam' are the same file

Can you take a look?

I've launched a second test at https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_PR_310/job_history/7e6d16c7-bb81-4a6c-aa02-408d1335427f on 21 FLU samples

cimendes commented 4 months ago

Good news! And bad news!

In some samples, the outputs look as expected: (sample_id, assembly_length, assembly_mean_coverage)

image

In others, only one of the fragments was recovered:

image

And in others, despite the assembly having content and the fragments being recovered by IRMA, the output is still empty:

image image
jrotieno commented 4 months ago

@cimendes the samples without assembly mean coverage were not part of the 21 you ran in https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_PR_310/job_history/7e6d16c7-bb81-4a6c-aa02-408d1335427f. I am running them, and only selecting those with previous assemblies: https://app.terra.bio/#workspaces/theiagen-validations/PHB_Validation_PR_310/job_history/b440f1c8-f6a5-4f1c-9716-756e579cad97

cimendes commented 4 months ago

Status update: Waiting on feedback for the two failures in https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/ffe75b13-4dde-4c51-af39-bd5215058b0e

jrotieno commented 4 months ago

Good to go @cimendes

cimendes commented 4 months ago

Relaunching test with Flu B here: https://app.terra.bio/#workspaces/cdph-terrabio-taborda-manual/Global_tree_testing/job_history/b2ed1f54-b0ae-4f9d-88e7-69801c76d457

cimendes commented 4 months ago

Code changes look good! Will approve and merge as soon as the test is successful :)