Ground truth is only a guess

Our current pipeline requires mapping contigs back to the reference genomes. For hard clustering we choose a single winning genome for each contig based on alignment extent, which is ultimately a guess.
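To make the problem concrete, here is a minimal sketch of winner-takes-all assignment by alignment extent. It is not the pipeline's actual code: the function name `hard_assign`, the `(contig, genome, aligned_bases)` tuple layout, and the example numbers are all made up for illustration.

```python
from collections import defaultdict

def hard_assign(alignments):
    """Pick one genome per contig: the one with the most aligned bases.

    alignments: iterable of (contig, genome, aligned_bases) tuples
    (a hypothetical layout, not the pipeline's real format).
    """
    extent = defaultdict(lambda: defaultdict(int))
    for contig, genome, aligned_bases in alignments:
        extent[contig][genome] += aligned_bases
    # Winner-takes-all: any information about runner-up genomes is discarded.
    return {
        contig: max(per_genome, key=per_genome.get)
        for contig, per_genome in extent.items()
    }

# Illustrative values only.
alignments = [
    ("contig_1", "genome_A", 9500),
    ("contig_1", "genome_B", 9400),  # nearly as good as genome_A
    ("contig_2", "genome_B", 4000),
]
hard_truth = hard_assign(alignments)
```

Here `contig_1` is assigned to `genome_A` on the strength of a 100-base difference, and the near-equal match to `genome_B` disappears from the ground truth.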

For communities with low phylogenetic distance, this guess is poor, and therefore metrics that compare solutions against the ground truth are unreliable.

Soft clustering, in which a contig may belong to more than one genome, is the approach required.
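One possible shape for a soft ground truth is sketched below, assuming we keep every genome whose alignment extent is close to the best one for that contig. The function name `soft_assign`, the `min_fraction` threshold and its default of 0.9, and the input layout are assumptions for illustration, not something the pipeline currently implements.

```python
from collections import defaultdict

def soft_assign(alignments, min_fraction=0.9):
    """Keep every genome whose aligned extent is within min_fraction of the best.

    alignments: iterable of (contig, genome, aligned_bases) tuples
    (hypothetical layout). min_fraction is an assumed tuning parameter.
    """
    extent = defaultdict(lambda: defaultdict(int))
    for contig, genome, aligned_bases in alignments:
        extent[contig][genome] += aligned_bases
    truth = {}
    for contig, per_genome in extent.items():
        best = max(per_genome.values())
        # A contig may map to several genomes when the extents are comparable.
        truth[contig] = {
            genome for genome, n in per_genome.items()
            if n >= min_fraction * best
        }
    return truth

# Illustrative values only.
example = [
    ("contig_1", "genome_A", 9500),
    ("contig_1", "genome_B", 9400),
    ("contig_2", "genome_B", 4000),
]
soft_truth = soft_assign(example)
```

With this input `contig_1` is kept in both `genome_A` and `genome_B`, so a metric scored against this truth would need to accept overlapping cluster membership.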
