mbhall88 closed this issue 3 years ago
People will have easy access to the properly filtered data: they can easily map to H37Rv, and they will want to remove human reads. I.e., go ahead with the filtered reads.
I had forgotten that this information is produced by the QC pipeline in https://github.com/mbhall88/head_to_head_pipeline/commit/6e05dc588c9dc1d31e744ef98434632471c83aba
A key component of the drug resistance prediction will be investigating what part (if any) coverage plays in Nanopore prediction ability.
Given we are working with reads we know map to H37Rv only, just using the theoretical coverage (i.e. number of bases divided by genome size) should be sufficient. This value is actually already present in the `rasusa` log files from subsampling in the QC pipeline.

One thought though is whether we want to use these "filtered" reads, or whether we want to use the "real" data, as this is probably what people will commonly have. I am happy to use the filtered stuff; I just wanted to make sure we addressed this.
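
For reference, the theoretical coverage mentioned above (number of bases divided by genome size) could be computed directly from a FASTQ file, roughly as in the sketch below. This is only an illustration and not how the QC pipeline or `rasusa` does it. The function names are hypothetical, the parser assumes plain four-line FASTQ records, and the default genome size of 4,411,532 bp is my assumption for the H37Rv reference (NC_000962.3).

```python
from typing import Iterable, Iterator

# Assumption: length in bp of the H37Rv reference (NC_000962.3).
H37RV_GENOME_SIZE = 4_411_532


def fastq_read_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each read from plain four-line FASTQ records."""
    for i, line in enumerate(lines):
        # The sequence is the second line of every four-line record.
        if i % 4 == 1:
            yield len(line.strip())


def theoretical_coverage(
    read_lengths: Iterable[int], genome_size: int = H37RV_GENOME_SIZE
) -> float:
    """Return total number of bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be a positive integer")
    return sum(read_lengths) / genome_size


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python coverage.py reads.fastq
    with open(sys.argv[1]) as handle:
        cov = theoretical_coverage(fastq_read_lengths(handle))
    print(f"theoretical coverage: {cov:.1f}x")
```

Because the reads are already known to map to H37Rv, this avoids running an alignment just to estimate depth; on unfiltered ("real") data the same number would overestimate coverage of the genome by whatever fraction of bases is contamination.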