I'm trying to sketch the RVDB, the Reference Viral Database. The clustered file is ~600 MB.
Sketching it with manysketch took about 5 minutes.
sourmash sketch dna -p k=21 C-RVDBvCurrent.fasta.gz -o C-RVDBvCurrent.sig.zip --singleton
didn't finish in 24 hours.
What's the reason!? By my understanding, manysketch isn't multithreaded when reading single FASTA files, so the difference isn't down to multithreading. Presumably it's just the Python for-loop penalty and/or using screed!? Wow.
On a mostly unrelated note, the sig.zip file is larger than the FASTA file. So that sucks.
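
A rough way to test the for-loop guess is to time a bare Python loop over the records with no sketching at all. The sketch below is only that: it uses a small stdlib FASTA reader as a stand-in for screed (so it says nothing about screed's own overhead), and `iter_fasta` and `time_record_loop` are made-up helper names, not part of sourmash.

```python
import gzip
import time


def iter_fasta(path):
    """Yield (name, sequence) pairs from a plain or gzipped FASTA file.

    Stand-in for screed: a minimal stdlib parser, assumed good enough
    for a timing check, not a replacement for a real FASTA library.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    name, chunks = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    # Emit the final record once the file is exhausted.
    if name is not None:
        yield name, "".join(chunks)


def time_record_loop(path):
    """Return (record count, total bases, seconds) for a bare loop over records."""
    start = time.perf_counter()
    n_records = 0
    n_bases = 0
    for _name, seq in iter_fasta(path):
        n_records += 1
        n_bases += len(seq)
    return n_records, n_bases, time.perf_counter() - start
```

Calling `time_record_loop("C-RVDBvCurrent.fasta.gz")` gives a baseline for reading and looping alone. If that baseline is small next to 24 hours, the plain Python loop over records is probably not the whole story, and the per-record sketching or signature-writing work under --singleton would be the next thing to look at.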