Figure out a way to run clustering algorithms on the data that includes the non-unique sequences (optimization problem)

The code is too slow when the non-unique data is included: computing the distance matrix takes too long.

Consider writing a function for computing the distance matrix that takes advantage of the fact that identical sequences have identical distances to all other sequences, so each distance only needs to be computed once per pair of unique sequences.
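
A rough sketch of that idea, in Python for illustration only (not the package's own code). The names `hamming` and `distance_matrix_with_duplicates` are hypothetical, the sequences are assumed to be aligned strings of equal length, and the distance function is assumed to return 0 for identical sequences. Distances are computed only between unique sequences, then the full matrix is filled in by lookup.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix_with_duplicates(seqs, dist=hamming):
    """Full n x n distance matrix, calling dist only on pairs of unique sequences."""
    # Distinct sequences in first-seen order, plus a lookup from sequence to its index.
    unique = list(dict.fromkeys(seqs))
    index_of = {s: i for i, s in enumerate(unique)}
    u = len(unique)

    # Small u x u matrix; the diagonal stays 0 (assumes dist(x, x) == 0).
    small = [[0] * u for _ in range(u)]
    for i, j in combinations(range(u), 2):
        d = dist(unique[i], unique[j])
        small[i][j] = small[j][i] = d

    # Expand to the full matrix by looking up each original sequence's unique index.
    idx = [index_of[s] for s in seqs]
    return [[small[i][j] for j in idx] for i in idx]
```

With `u` unique sequences out of `n` total, this makes `u * (u - 1) / 2` distance calls instead of `n * (n - 1) / 2`. The expanded matrix is still `n` by `n`, so memory may remain a constraint. Clustering the unique sequences with their counts as weights would avoid the expansion step, if the chosen algorithm supports weights.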

Alternatively, search for a clustering approach or phylogenetic technique that scales to larger datasets.

philliplab / ViralHaplotyper

Figure out a way to run clustering algorithms on the data that includes the non-unique sequences (optimization problem) #12