theiagen / public_health_bioinformatics

Bioinformatics workflows for genomic characterization, submission preparation, and genomic epidemiology of pathogens of public health concern.
GNU General Public License v3.0

add `percentage_mapped_reads` output to TheiaCoV wfs #507

Open kapsakcj opened 3 months ago

kapsakcj commented 3 months ago

:cool:

:pushpin: Explain the Request

A user requested that the `percentage_mapped_reads` output, which appears in TheiaMeta...

TheiaMeta workflow output: https://github.com/theiagen/public_health_bioinformatics/blob/74c12d3bbced7177638fdfdc50c56f8bc40a0f5a/workflows/theiameta/wf_theiameta_illumina_pe.wdl#L266

TheiaMeta task that generates this output: https://github.com/theiagen/public_health_bioinformatics/blob/74c12d3bbced7177638fdfdc50c56f8bc40a0f5a/tasks/utilities/data_handling/task_parse_mapping.wdl#L206

...be added to the TheiaCoV workflows. I think we could add this to the iVar-based workflows (theiacov_illumina_pe, theiacov_illumina_se) fairly easily by parsing it out of the `samtools flagstat` file: https://github.com/theiagen/public_health_bioinformatics/blob/74c12d3bbced7177638fdfdc50c56f8bc40a0f5a/tasks/quality_control/basic_statistics/task_assembly_metrics.wdl#L20

However, this approach would limit the new output to the iVar-based workflows ⚠️
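For discussion, here is a minimal Python sketch of what "parsing this information out of the samtools flagstat file" could look like. It is not the TheiaMeta task's logic and not necessarily how the value should be computed in the WDL task. The function name `percentage_mapped_reads` and the sample text are hypothetical, and the sketch assumes the default text output of `samtools flagstat`, in which the `in total` and `mapped (` lines start with the QC-passed read count. Note that the flagstat `mapped` count includes secondary and supplementary alignments, so the result may differ from a value derived another way.

```python
import re

# Hypothetical flagstat text with made-up counts, for illustration only.
SAMPLE_FLAGSTAT = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
950 + 0 mapped (95.00% : N/A)
1000 + 0 paired in sequencing
"""


def percentage_mapped_reads(flagstat_text: str) -> float:
    """Return mapped / total * 100 for QC-passed reads, rounded to 2 places."""
    # Anchoring at line start keeps a "primary mapped" line (emitted by newer
    # samtools versions) from matching the "mapped" pattern.
    total = re.search(r"^(\d+) \+ \d+ in total", flagstat_text, re.MULTILINE)
    mapped = re.search(r"^(\d+) \+ \d+ mapped \(", flagstat_text, re.MULTILINE)
    if total is None or mapped is None:
        raise ValueError("text does not look like samtools flagstat output")

    total_reads = int(total.group(1))
    if total_reads == 0:
        # Avoid division by zero for empty BAM files.
        return 0.0
    return round(100 * int(mapped.group(1)) / total_reads, 2)


if __name__ == "__main__":
    print(percentage_mapped_reads(SAMPLE_FLAGSTAT))
```

In a workflow, the same arithmetic could live in the task's command block, as long as the output is exposed as a float so it lands in the results table as a number.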

If we wanted to add this output to other workflows, such as TheiaCoV on the flu track or TheiaCoV_ONT, we would need to use the assembled_reads_percent WDL task from TheiaMeta and pass in BAM files.

My first suggestion would take less time to develop and would satisfy the request, but the downside is that users of TheiaCoV_ONT, TheiaCoV_illumina_pe + Flu, and TheiaCoV_Clearlabs would lack this output.

Maybe there are other solutions I have not thought of 🤔?

We should discuss as a team and decide how to proceed. We also need to consider whether we want this in the next release or a later one.

:books: Context

Request from CDPH

:chart_with_upwards_trend: Desired Behavior

Add the `percentage_mapped_reads` output column to the TheiaCoV workflows

:information_source: Additional Information