For each sample, the depth file has two columns named $dataset.bam and $dataset.bam-var. The $dataset.bam-var column appears to represent the "variance from mean depth along the contig": judging from their source code, it is the variance of the coverage over every position (ignoring the start and end of the contigs, and possibly other corrections). In bin_metabat2.pl we currently just set the variance to zero.
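For illustration, here is a minimal sketch of how such a depth file could be produced from per-position coverage. This is an assumption-laden toy, not the real tool: the column layout is inferred from jgi_summarize_bam_contig_depths output, the function names are hypothetical, and the edge trimming the real tool applies at contig ends is not reproduced.

```python
# Sketch: per-contig mean depth and depth variance from per-position
# coverage, written in the column layout that jgi_summarize_bam_contig_depths
# appears to emit (one "<bam>" mean column and one "<bam>-var" variance
# column per sample). The real tool's trimming of contig ends and other
# corrections are NOT reproduced here.

def contig_depth_stats(coverage):
    """Mean and (population) variance of per-position coverage."""
    n = len(coverage)
    mean = sum(coverage) / n
    var = sum((c - mean) ** 2 for c in coverage) / n
    return mean, var

def write_depth_file(path, contigs, bam_name="dataset.bam"):
    """contigs: dict mapping contig name -> list of per-position coverages."""
    with open(path, "w") as out:
        out.write("contigName\tcontigLen\ttotalAvgDepth\t"
                  f"{bam_name}\t{bam_name}-var\n")
        for name, cov in contigs.items():
            mean, var = contig_depth_stats(cov)
            out.write(f"{name}\t{len(cov)}\t{mean:.4f}\t"
                      f"{mean:.4f}\t{var:.4f}\n")

write_depth_file("depth.txt", {"contig_1": [10, 12, 8, 10],
                               "contig_2": [0, 0, 4, 4]})
```

With a single BAM, totalAvgDepth and the per-sample mean column coincide; with several samples there would be one mean/variance column pair per BAM.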
We just did the following test:

- Running metabat2 including the variances (as calculated by jgi_summarize_bam_contig_depths, following their ReadMe)
- Running metabat2 using zero variances
We found that metabat2 retrieved 4 times fewer bins when given zero variances. This was only one example, but I have also noticed in other cases that metabat2 produced fewer bins than MaxBin and CONCOCT.
metabat2 also provides the option to pass a depth file that has no coverage-variance columns at all (as opposed to passing the normal file with the variances set to zero). That is at least a cleaner way of doing things, but it still results in fewer bins.
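To make the two variants concrete, here is a small sketch that takes a full depth file and produces both: one with the -var columns zeroed out (roughly what bin_metabat2.pl effectively does today) and one with the -var columns dropped entirely (a no-variance layout). The function name is hypothetical and the column naming convention (-var suffix) is assumed from the jgi_summarize_bam_contig_depths output:

```python
import csv

def strip_variances(in_path, zeroed_path, dropped_path):
    """Read a jgi-style depth file and write two variants:
    zeroed_path  - same columns, but every *-var column set to 0
    dropped_path - the *-var columns removed entirely
    """
    with open(in_path) as f:
        rows = list(csv.reader(f, delimiter="\t"))
    header = rows[0]
    # Columns are identified purely by the assumed "-var" name suffix.
    var_cols = {i for i, name in enumerate(header) if name.endswith("-var")}

    with open(zeroed_path, "w") as f:
        w = csv.writer(f, delimiter="\t", lineterminator="\n")
        w.writerow(header)
        for row in rows[1:]:
            w.writerow(["0" if i in var_cols else v
                        for i, v in enumerate(row)])

    with open(dropped_path, "w") as f:
        w = csv.writer(f, delimiter="\t", lineterminator="\n")
        for row in rows:
            w.writerow([v for i, v in enumerate(row)
                        if i not in var_cols])

# Tiny example input (single sample S1.bam, one contig).
with open("depth_full.txt", "w") as f:
    f.write("contigName\tcontigLen\ttotalAvgDepth\tS1.bam\tS1.bam-var\n"
            "contig_1\t1000\t10.0\t10.0\t2.5\n")
strip_variances("depth_full.txt", "depth_zeroed.txt", "depth_novar.txt")
```

Whether metabat2 accepts the dropped-column layout directly, or needs to be told the file carries no variances, should be checked against its own documentation.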
The impact of this shouldn't be too large, since by default we use other binners besides metabat2, and DAS Tool should largely mask the issue, but it is still something to be fixed.
Any chance we could calculate the variances during step 10 so we can pass them to metabat2 later?