rnajena / viralclust

Small pipeline to cluster viral genomes based on their k-mer content. WiP
GNU General Public License v3.0

empty sequences after umap&hdbscan for k=5 #1

Closed (klamkiew closed this issue 3 years ago)

klamkiew commented 4 years ago

When the pipeline is applied to short sequences (length ~200 nt, roughly 2,500 sequences) with k=5 for UMAP and HDBSCAN, the centroid sequences in the output are empty.
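For context, here is a minimal sketch of what a k-mer profile looks like for a sequence of this length. It is not taken from the viralclust code base: the function name `kmer_profile` and the toy sequence are hypothetical, and the normalisation (relative frequencies over the ACGT alphabet) is an assumption for illustration only.

```python
from itertools import product


def kmer_profile(seq, k):
    """Return relative k-mer frequencies over the ACGT alphabet.

    Hypothetical helper for illustration; not viralclust's implementation.
    """
    # Enumerate all 4**k possible k-mers in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with N or other ambiguity codes
            counts[window] += 1
    total = sum(counts.values())
    # Guard against sequences shorter than k (no windows at all).
    return [counts[km] / total if total else 0.0 for km in kmers]


seq = "ACGT" * 50  # toy 200 nt sequence, not real data
profile = kmer_profile(seq, 5)
nonzero = sum(1 for x in profile if x > 0)
print(len(profile), nonzero)
```

A 200 nt sequence yields at most 196 windows for k=5, while there are 4^5 = 1024 possible 5-mers, so profiles of sequences this short are necessarily sparse. Whether that sparsity is connected to the empty centroids is not established in this report.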

klamkiew commented 4 years ago

Same behaviour for k=7 (the default) and k=9.