theiagen / public_health_bacterial_genomics

GNU Affero General Public License v3.0

ideas for improving read screen workflow (pre-assembly) #199

Closed: kapsakcj closed this issue 1 year ago

kapsakcj commented 1 year ago

Just wanted to jot down some ideas for improving the read screen (read_screen) step of the TheiaProk_Illumina_PE and TheiaProk_Illumina_SE workflows.

  1. Expose the estimated coverage calculated by mash. The failure message should read:
FAIL; the estimated coverage ${estimated_coverage} is less than the minimum of ~{min_coverage}x

The current message does not include the calculated value, so the user cannot tell what the estimated coverage was: https://github.com/theiagen/public_health_bacterial_genomics/blob/main/tasks/quality_control/task_screen.wdl#L119
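A rough sketch of suggestion 1 as a standalone bash function. This is hypothetical and not the actual task_screen.wdl code: the name `check_coverage` is made up, `~{min_coverage}` is replaced by a plain argument so the snippet runs outside WDL, and the comparison keeps the `-le` operator described for line 118 so that only the message changes.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of task_screen.wdl): builds the read_screen
# flag and includes the calculated coverage in the FAIL message.
# Assumes the estimate is already an integer, since bash's -le only accepts integers.
check_coverage() {
  local estimated_coverage="$1"   # integer coverage estimate
  local min_coverage="$2"         # stands in for ~{min_coverage} in the WDL task
  # Same -le comparison as the current line 118; only the message differs
  if [ "${estimated_coverage}" -le "${min_coverage}" ]; then
    echo "FAIL; the estimated coverage ${estimated_coverage} is less than the minimum of ${min_coverage}x"
  else
    echo "PASS"
  fi
}

check_coverage 7 10    # prints: FAIL; the estimated coverage 7 is less than the minimum of 10x
check_coverage 25 10   # prints: PASS
```

With the value in the message, a user can see at a glance whether a sample missed the threshold by a wide margin or only barely.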

  2. Additionally, we should change line 118 to use -lt instead of -le to prevent rounding from causing failures. With -le, a sample whose estimated coverage rounds to exactly the minimum still fails.

I observed 2 samples with an estimated coverage of 10.115; even though min_coverage was set to 10, both samples failed the read_screen task. I would have expected them to pass in this situation.

https://github.com/theiagen/public_health_bacterial_genomics/blob/870ae7f6ccfa3bfa541ecace5bd26231f0358bac/tasks/quality_control/task_screen.wdl#L118
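A minimal sketch of why `-le` fails the 10.115 samples while `-lt` passes them. It assumes (hypothetically; the exact rounding step in the task may differ) that the raw estimate is rounded to the nearest integer before the test, because bash's `[ ... -le ... ]` and `[ ... -lt ... ]` only accept integers. The function name `screen_coverage` and its `operator` argument are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical comparison of the current (-le) and proposed (-lt) tests.
# Assumes the raw coverage is rounded to an integer before comparison.
screen_coverage() {
  local raw_coverage="$1"   # e.g. 10.115, as observed above
  local min_coverage="$2"   # stands in for ~{min_coverage}
  local operator="$3"       # "-le" (current) or "-lt" (proposed)
  local estimated_coverage
  # Round to the nearest integer in the C locale; 10.115 becomes 10
  estimated_coverage=$(LC_ALL=C printf '%.0f' "${raw_coverage}")
  if [ "${estimated_coverage}" "${operator}" "${min_coverage}" ]; then
    echo "FAIL"
  else
    echo "PASS"
  fi
}

screen_coverage 10.115 10 -le   # prints: FAIL (10 <= 10 is true)
screen_coverage 10.115 10 -lt   # prints: PASS (10 < 10 is false)
screen_coverage 9.4 10 -lt      # prints: FAIL (rounds to 9)
```

Switching to `-lt` only changes the outcome when the rounded estimate equals the minimum exactly; samples that are clearly below the threshold still fail.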

Hopefully these small changes will make it clearer why a sample fails read_screen, and will better handle samples whose estimated coverage is close to the defined threshold.