wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

--contig-end-exclusion doesn't work with -m not set to mean #163

Open alexcritschristoph opened 1 year ago

alexcritschristoph commented 1 year ago

Hi Ben - big fan / user of CoverM here. I recently ran into this issue with v0.6.1:

When I run

coverm contig --contig-end-exclusion 1000 --bam-files ./test/*.bam --output-format sparse -o test1.tsv --no-zeros -m mean

vs

coverm contig --contig-end-exclusion 0 --bam-files ./test/*.bam --output-format sparse -o test1.tsv --no-zeros -m mean

I get different results, which is consistent with the --contig-end-exclusion parameter working.

But when I run:

coverm contig --contig-end-exclusion 0 --bam-files ./test/*.bam --output-format sparse -o test1.tsv --no-zeros -m count

vs

coverm contig --contig-end-exclusion 1000 --bam-files ./test/*.bam --output-format sparse -o test1.tsv --no-zeros -m count

I get the exact same results, indicating to me that the contig end exclusion parameter is not working. The same is true when -m is set to covered_bases or covered_fraction. I think this is a bug.
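To make the expectation concrete, here is a toy sketch of why trimming the contig ends should change covered_bases and covered_fraction as well as mean. It is not CoverM's code: the per-base depth list and the helper names (`trimmed`, `mean_depth`, `covered_bases`, `covered_fraction`) are hypothetical, and it assumes the exclusion simply drops that many positions from each end before the statistic is computed.

```python
def trimmed(depths, end_exclusion):
    """Drop `end_exclusion` positions from each end of a per-base depth list."""
    if end_exclusion == 0:
        return list(depths)
    if 2 * end_exclusion >= len(depths):
        return []  # nothing left once both ends are excluded
    return list(depths[end_exclusion:-end_exclusion])


def mean_depth(depths, end_exclusion):
    kept = trimmed(depths, end_exclusion)
    return sum(kept) / len(kept) if kept else 0.0


def covered_bases(depths, end_exclusion):
    # Number of retained positions with at least one read over them.
    return sum(1 for d in trimmed(depths, end_exclusion) if d > 0)


def covered_fraction(depths, end_exclusion):
    kept = trimmed(depths, end_exclusion)
    return covered_bases(depths, end_exclusion) / len(kept) if kept else 0.0


# Toy 10 bp contig with most of its coverage at the two ends.
depths = [3, 3, 0, 0, 1, 1, 0, 0, 2, 2]

for exclusion in (0, 2):
    print(
        exclusion,
        mean_depth(depths, exclusion),
        covered_bases(depths, exclusion),
        covered_fraction(depths, exclusion),
    )
```

On this toy contig, covered_bases drops from 6 to 2 when two positions are excluded from each end, so under this assumption identical output for different exclusion values would be unexpected whenever any reads map near the contig ends.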

wwood commented 1 year ago

Thanks Alex, much appreciated, both the bug report and the kind words.

I think this is really an issue with count specifically, rather than with all non-mean methods. Agree?

What would be a good definition that accounts for reads that cross the boundary? Perhaps the read's starting position for the start of the contig, and its end position for the end of the contig?

alexcritschristoph commented 1 year ago

Hi Ben, I see this issue with -m covered_bases and -m covered_fraction in my data as well. Do you see the same thing?

I'm not sure I follow your second question. My guess is that it would be best for the parameter to be a hard cutoff, so that any read that crosses the boundary at all (e.g. the 100 bp from the edge by default) is not counted.
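As a rough illustration of the hard-cutoff rule described above, a sketch might look like the following. This is not how CoverM implements count: the function name `count_reads_hard_cutoff` is hypothetical, and reads are assumed to be given as 0-based, half-open (start, end) alignment intervals on the contig.

```python
def count_reads_hard_cutoff(reads, contig_length, end_exclusion):
    """Count reads lying entirely inside the retained region of the contig.

    A read is rejected if any part of it falls within `end_exclusion` bp
    of either contig end, i.e. if it touches or crosses a boundary.
    """
    lo = end_exclusion                  # first retained position
    hi = contig_length - end_exclusion  # one past the last retained position
    return sum(1 for start, end in reads if start >= lo and end <= hi)


# Toy example: a 1000 bp contig and 150 bp reads.
reads = [
    (0, 150),    # starts inside the excluded start region
    (90, 240),   # crosses the start boundary at 100
    (100, 250),  # begins exactly at the boundary, so it is kept
    (500, 650),  # well inside the contig
    (750, 900),  # ends exactly at the boundary, so it is kept
    (760, 910),  # crosses the end boundary at 900
]

print(count_reads_hard_cutoff(reads, 1000, 0))    # 6
print(count_reads_hard_cutoff(reads, 1000, 100))  # 3
```

With this definition, count changes whenever any read overlaps an excluded end, which is the behaviour the two count commands above were expected to show.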