Our current pipeline requires mapping contigs back to the reference genomes. For hard clustering we choose a winner based on alignment extent, which is ultimately a guess.
For communities with low phylogenetic distance, this guess is poor and therefore metrics which compare solutions against the ground truth are unreliable.
Soft clustering, in which a contig may be assigned to more than one genome, is the approach these cases require.
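
To make the contrast concrete, here is a minimal sketch of the two assignment styles. It assumes each contig comes with a per-reference count of aligned bases. The function names, the genome labels and the numbers are hypothetical and are not taken from the pipeline. Normalising by aligned extent is only one possible weighting for a soft assignment.

```python
from typing import Dict


def hard_assign(extents: Dict[str, int]) -> str:
    """Pick the single reference with the largest aligned extent.

    Ties are broken by dictionary order, which is exactly the kind of
    arbitrary choice that makes the hard assignment a guess.
    """
    return max(extents, key=extents.get)


def soft_assign(extents: Dict[str, int]) -> Dict[str, float]:
    """Spread the contig over all references, weighted by aligned extent."""
    total = sum(extents.values())
    if total == 0:
        return {}  # contig did not align anywhere
    return {ref: bp / total for ref, bp in extents.items()}


# Hypothetical contig aligning almost equally well to two close relatives.
extents = {"genome_A": 4800, "genome_B": 4700, "genome_C": 500}

winner = hard_assign(extents)    # one label; the near-tie is discarded
weights = soft_assign(extents)   # keeps the ambiguity between A and B
```

In this made-up case the hard assignment reports only `genome_A`, while the soft assignment keeps nearly equal weight on `genome_A` and `genome_B`, so a downstream metric can still see that the contig is ambiguous.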