wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Mean coverage calculation #209

ramiroricardo closed this issue 5 months ago

ramiroricardo commented 5 months ago

Dear all,

Thanks for a great tool. I have a basic question regarding the mean coverage calculation. The example states:

"The two reads have 10 and 9 bases aligned exactly, averaged over 1000-2*75 bp (length of contig minus 75bp from each end)."
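The quoted arithmetic can be reproduced with a minimal Python sketch. It assumes that per-base read depths for a single contig are already available as a list, which is a hypothetical input format chosen for illustration. This is not CoverM's code. The function name `trimmed_mean_coverage` and its `end_exclusion` parameter are also hypothetical, and the default of 75 simply mirrors the example. The read positions used below are invented so that both reads fall inside the retained central region.

```python
def trimmed_mean_coverage(depths, end_exclusion=75):
    """Mean per-base depth after dropping `end_exclusion` bases from each end.

    `depths` holds the read depth at every position of one contig
    (a hypothetical input format used only for this sketch).
    """
    n = len(depths)
    if n <= 2 * end_exclusion:
        # Assumption: a contig with nothing left after trimming gets 0 coverage.
        return 0.0
    core = depths[end_exclusion:n - end_exclusion]
    return sum(core) / len(core)


# Hypothetical layout: a 1000 bp contig with two reads aligning 10 and 9 bases
# at made-up positions well inside the retained region.
depths = [0] * 1000
for pos in range(100, 110):  # read 1: 10 aligned bases
    depths[pos] += 1
for pos in range(200, 209):  # read 2: 9 aligned bases
    depths[pos] += 1

# 19 aligned bases / (1000 - 2*75) = 850 retained positions
print(trimmed_mean_coverage(depths))
```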

What is the reason for removing 75bp from each end of the contig?

Best,

Ricardo Ramiro

wwood commented 5 months ago

Hi,

The idea is to avoid coverage artefacts. Reads are harder to map at the ends of contigs. In the extreme case, a read that overlaps only the last base pair of a contig won't be mapped at all, so coverage there will be artificially reduced.
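The edge effect can be illustrated with a small Python sketch under a simplifying assumption: one read starts at every position where it fits entirely inside the contig, and any read that would overhang an end is never mapped. This is a toy model rather than how any real aligner behaves. The helper names `tiled_depths` and `mean` are hypothetical.

```python
def tiled_depths(contig_len, read_len):
    """Per-base depth when one read starts at every position where it fits fully.

    Toy model (assumption): reads overhanging either contig end are not mapped.
    """
    depths = [0] * contig_len
    for start in range(contig_len - read_len + 1):
        for pos in range(start, start + read_len):
            depths[pos] += 1
    return depths


def mean(values):
    return sum(values) / len(values)


contig_len, read_len, trim = 1000, 150, 75
depths = tiled_depths(contig_len, read_len)
full = mean(depths)                                # average over the whole contig
trimmed = mean(depths[trim:contig_len - trim])     # exclude `trim` bp at each end

print(depths[0], depths[contig_len // 2])  # depth at the first base vs. mid-contig
print(round(full, 2), round(trimmed, 2))
```

In this toy setup, the first base has depth 1 while interior bases have depth 150. The whole-contig mean (127.65) therefore underestimates the interior depth, and excluding 75 bp from each end raises it to about 143.47. Because the depth ramp here spans 149 bp, a 75 bp exclusion reduces the bias but does not remove it entirely.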

There may also be repeats at the ends of contigs, which would inappropriately inflate coverage.

This idea was taken from the MetaBAT coverage calculation script, called jgi_.. (sorry, I forget the exact name). HTH, Ben

ramiroricardo commented 5 months ago

Great, thanks for your quick reply.