jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Having zero variances in the metabat2 depth file leads to fewer bins #625

Closed. fpusan closed this issue 1 year ago

fpusan commented 1 year ago

For each sample, the depth file has two columns named $dataset.bam and $dataset.bam-var.
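A quick way to check whether a given depth file is affected is to look for samples whose variance column is zero on every row. The sketch below assumes the depth file is tab-separated with a header line and that variance columns end in ".bam-var", as described above; the function name and the example contents (including the leading contigName, contigLen and totalAvgDepth columns) are illustrative assumptions, not taken from SqueezeMeta.

```python
import csv
import io


def zero_variance_samples(depth_text):
    """Return sample names whose '<sample>.bam-var' column is zero on every row.

    Assumes a tab-separated depth table with a header line (an assumption
    about the file layout, not something verified against metabat2).
    """
    reader = csv.DictReader(io.StringIO(depth_text), delimiter="\t")
    var_columns = [name for name in (reader.fieldnames or [])
                   if name.endswith(".bam-var")]
    all_zero = {name: True for name in var_columns}
    for row in reader:
        for name in var_columns:
            if float(row[name]) != 0.0:
                all_zero[name] = False
    # Strip the '.bam-var' suffix to recover the sample (dataset) name.
    return [name[:-len(".bam-var")] for name in var_columns if all_zero[name]]


# Made-up example: sample A has zero variance everywhere, sample B does not.
example = (
    "contigName\tcontigLen\ttotalAvgDepth\tA.bam\tA.bam-var\tB.bam\tB.bam-var\n"
    "c1\t1000\t12.5\t10.0\t0\t2.5\t1.7\n"
    "c2\t2000\t8.0\t8.0\t0\t0.0\t0\n"
)
print(zero_variance_samples(example))
```

With the made-up table above, only sample A is reported, because B has a non-zero variance for contig c1.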

I think the $dataset.bam-var column represents the "variance from mean depth along the contig". Judging from their source code, it seems to be the variance of the per-position coverage (ignoring the start and end of each contig, and possibly with other adjustments).
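To make that interpretation concrete, here is a minimal sketch of a per-contig mean and variance computed from per-position depths. It is not metabat2's actual code: the function name, the trim parameter (how many bases to skip at each contig end) and the choice of the sample variance (dividing by n - 1) are all assumptions made for illustration.

```python
def depth_mean_and_variance(depths, trim=0):
    """Mean and sample variance of per-base depth for one contig.

    `trim` bases are skipped at each end of the contig. Both the trimming
    width and the use of the n - 1 denominator are illustrative assumptions.
    """
    core = depths[trim:len(depths) - trim] if trim else depths
    n = len(core)
    if n == 0:
        return 0.0, 0.0
    mean = sum(core) / n
    if n < 2:
        # A single position has no spread to measure.
        return mean, 0.0
    variance = sum((d - mean) ** 2 for d in core) / (n - 1)
    return mean, variance


# Made-up per-position depths for a short contig; edges have low coverage.
depths = [0, 1, 4, 6, 5, 5, 1, 0]
mean, variance = depth_mean_and_variance(depths, trim=2)
print(mean, variance)
```

Something along these lines could in principle run wherever per-position coverage is already available, so that real variances are written instead of zeros.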

In bin_metabat2.pl we simply set the variance to zero.

We just did the following test:

We found that metabat2 retrieved four times fewer bins when given zero variances. This was only one example, but I have also noticed in other cases that metabat2 produced fewer bins than maxbin and concoct.

metabat2 also provides the option to pass a file with no coverage variances at all (as opposed to passing the normal file with zero variances). This is at least a cleaner way of doing things, but it still results in fewer bins.

The impact of this shouldn't be too large, since by default we use other binners besides metabat2 and DAS Tool should largely mask the issue, but it is still something to fix.

Any chance we could calculate the variances during step 10 so we can pass them to metabat2 later?

fpusan commented 1 year ago

Fixed in 15aadbb