clarity on min-covered-fraction

itsmisterbrown commented 1 year ago

hi Ben,

I'm sure there's a simple answer here, but I just want to confirm that the behavior I imagine is consistent with the actual behavior. Regarding the --min-covered-fraction flag for the contig and genome subcommands, this acts on a per-sample basis, correct?

For instance, does the example below illustrate the correct behavior?

Consider coverage matrix x, with taxa a, b, and c, and with --min-covered-fraction=0:

| | a | b | c |
|---|---|---|---|
| sample 1 | 1.2 | 0.5 | 0.08 |
| sample 2 | 0.09 | 0.11 | 0.5 |
| sample 3 | 0.05 | 1.2 | 7.1 |

But when applying the default of --min-covered-fraction=10, this would yield:

| | a | b | c |
|---|---|---|---|
| sample 1 | 1.2 | 0.5 | 0 |
| sample 2 | 0 | 0.11 | 0.5 |
| sample 3 | 0 | 1.2 | 7.1 |

where the coverage for taxon c in sample 1 has been set to 0 and the coverage for taxon a in samples 2 and 3 has also been set to 0.

If this is correct, would anything using a length-based coverage method (e.g. RPKM, TPM) also have those values reported as zero?

thanks very much!

wwood commented 1 year ago

Hi,

That all sounds about right, with a few clarifications.

Yes, it is per-sample (though the default value differs for contigs vs. genomes).

But regarding your example: min-covered-fraction refers to the percentage of bases covered by at least one read, not a mean coverage above 0.1. There isn't enough information in your first table to work out what will be filtered when --min-covered-fraction=10 is applied. But in spirit, I think you get it.
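To make the distinction concrete, here is a minimal Python sketch of a per-sample covered-fraction filter. It is an illustration under stated assumptions, not CoverM's implementation: the functions `covered_percent`, `filtered_mean` and `filter_matrix` are hypothetical, the threshold is taken as a percentage (as with `--min-covered-fraction=10`), entries strictly below it are assumed to be zeroed, and the per-base depths are made-up toy data.

```python
# Hypothetical sketch of a per-sample covered-fraction filter; not CoverM's code.

def covered_percent(depths):
    """Percentage of positions covered by at least one read (depth > 0)."""
    covered = sum(1 for d in depths if d > 0)
    return 100.0 * covered / len(depths)


def filtered_mean(depths, min_covered_percent=10.0):
    """Mean depth, or 0.0 if too few positions are covered (assumed strict '<')."""
    if covered_percent(depths) < min_covered_percent:
        return 0.0
    return sum(depths) / len(depths)


def filter_matrix(samples, min_covered_percent=10.0):
    """Apply the filter independently to each (sample, taxon) pair."""
    return {
        sample: {taxon: filtered_mean(d, min_covered_percent) for taxon, d in taxa.items()}
        for sample, taxa in samples.items()
    }


# Toy per-base depths for two 100 bp taxa (made-up numbers).
samples = {
    "sample1": {"a": [0] * 95 + [40] * 5, "b": [1] * 50 + [0] * 50},
    "sample2": {"a": [3] * 100, "b": [0] * 100},
}
print(filter_matrix(samples))
# {'sample1': {'a': 0.0, 'b': 0.5}, 'sample2': {'a': 3.0, 'b': 0.0}}
```

Taxon a in sample1 has a mean depth of 2.0 but only 5% of its bases covered, so it is zeroed, while b in sample1 keeps its mean of 0.5 because half its bases are covered. This is why a table of mean coverages alone cannot tell you what the filter will remove.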

Yes, RPKM and TPM are treated the same way as mean and relative_abundance.
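As a rough illustration of how a zeroed entry carries through to TPM, here is a minimal Python sketch using the standard TPM definition (reads per base, rescaled so each sample sums to one million). It assumes a filtered entry's read count is set to zero before normalization; the thread does not say in which order CoverM applies the filter and the normalization, so treat that as an assumption. The function `tpm` and its parameters are hypothetical names for illustration.

```python
# Hypothetical sketch: how zeroing a filtered entry propagates into TPM.
# Assumes filtered entries get a read count of 0 *before* normalization;
# the order CoverM actually uses is not stated in this thread.

def tpm(read_counts, lengths, passes_filter):
    """Transcripts per million from read counts and lengths, after zeroing filtered entries."""
    counts = [c if ok else 0 for c, ok in zip(read_counts, passes_filter)]
    rates = [c / length for c, length in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


# Three contigs in one sample; the second fails the covered-fraction filter.
values = tpm(read_counts=[100, 50, 200], lengths=[1000, 500, 2000],
             passes_filter=[True, False, True])
print(values)  # [500000.0, 0.0, 500000.0]
```

Under this assumption the filtered contig reports 0 TPM and the surviving contigs share the full million.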

HTH

itsmisterbrown commented 1 year ago

super, thanks for the clarification!

wwood / CoverM

clarity on min-covered-fraction #178