philliplab / ViralHaplotyper


Figure out a way to run clustering algorithms on the data that includes the non-unique sequences (optimization problem) #12

Open philliplab opened 9 years ago

philliplab commented 9 years ago

The code is too slow when the non-unique data is included, because computing the distance matrix takes too long.

Consider writing a smart function for computing the distance matrix that takes advantage of the fact that identical sequences will have identical distances to other sequences.

Or search for a clustering approach or phylogenetic technique that can be used on larger datasets.
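
A rough sketch of the first idea (reusing distances for identical sequences), written in Python for illustration only. The function names `collapsed_distance_matrix` and `full_distance` are hypothetical and not part of ViralHaplotyper, and Hamming distance on aligned sequences is an assumption standing in for whatever distance the package actually uses. The distance is computed once per pair of distinct sequences, and an index maps every original read back to its unique representative.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatched positions between two aligned sequences.

    Assumption: stands in for the real distance function.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def collapsed_distance_matrix(seqs, dist=hamming):
    """Compute distances between distinct sequences only.

    Returns (unique, counts, index, d_unique):
      unique   - distinct sequences in first-seen order
      counts   - counts[i] is the number of copies of unique[i]
      index    - index[k] is the position in `unique` of seqs[k]
      d_unique - symmetric matrix of distances between distinct sequences
    """
    counter = Counter(seqs)
    unique = list(counter)  # insertion order is preserved (Python 3.7+)
    counts = [counter[s] for s in unique]
    position = {s: i for i, s in enumerate(unique)}
    index = [position[s] for s in seqs]

    u = len(unique)
    d_unique = [[0] * u for _ in range(u)]
    for i in range(u):
        for j in range(i + 1, u):
            # One computation fills both symmetric entries.
            d_unique[i][j] = d_unique[j][i] = dist(unique[i], unique[j])
    return unique, counts, index, d_unique


def full_distance(index, d_unique, k, l):
    """Distance between original reads k and l, looked up, not recomputed."""
    return d_unique[index[k]][index[l]]
```

With n reads of which u are distinct, this performs u(u-1)/2 distance computations instead of n(n-1)/2. The `counts` list could be passed as weights to a clustering method that accepts them, so the full n-by-n matrix never has to be materialized; whether that fits the clustering step used here is untested.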