rnajena / viralclust

Small pipeline to cluster viral genomes based on their k-mer content. WiP
GNU General Public License v3.0

RAM vs Diskspace trade off #21

Closed by klamkiew 1 year ago

klamkiew commented 1 year ago

Using k-mer frequencies is very memory-intensive in the HDBSCAN module. I am thinking of re-implementing this with a (possibly pickled) data dump on the hard drive that is read into memory on demand, line by line. More I/O, less RAM...
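
A minimal sketch of the idea, not the actual viralclust code: each k-mer frequency profile is pickled to a file as its own record and later read back one record at a time, so only a single profile is held in memory during reading. The names `dump_profiles` and `iter_profiles` and the `(sequence_id, frequencies)` record layout are assumptions made for illustration.

```python
import pickle
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (sequence ID, k-mer frequency vector).
Profile = Tuple[str, List[float]]


def dump_profiles(profiles: Iterable[Profile], path: str) -> int:
    """Append each profile to `path` as a separate pickle record.

    `profiles` may be a generator, so the full data set never has to
    exist in memory at once. Returns the number of records written.
    """
    count = 0
    with open(path, "wb") as handle:
        for record in profiles:
            pickle.dump(record, handle, protocol=pickle.HIGHEST_PROTOCOL)
            count += 1
    return count


def iter_profiles(path: str) -> Iterator[Profile]:
    """Yield profiles from `path` one at a time, in the order written."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                # pickle.load raises EOFError once no records remain.
                return
```

Calling `dump_profiles(generator, "profiles.pkl")` once and then looping over `iter_profiles("profiles.pkl")` trades repeated disk reads for a smaller memory footprint. This sketch only covers the storage side; whether peak RAM actually drops depends on whether the clustering step can consume the records incrementally instead of needing them all at once.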