shahab-sarmashghi / Skmer

Fast and accurate tool for estimating genomic distances between genome-skims
https://shahab-sarmashghi.github.io/Skmer/

Parallel sketching #17

Closed nikostr closed 2 years ago

nikostr commented 5 years ago

According to the documentation for mash sketch, the -p flag appears to allow parallelizing beyond one thread per sample. Would this be desirable? I could probably help implement it in the coming days if that would help. I guess it would basically just mean updating the sketch call to pass a thread count, the way the call to jellyfish does?
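A minimal sketch of what that change could look like, assuming the sketch command is assembled as an argument list before being handed to a subprocess. The function name `build_sketch_cmd` and its parameters are hypothetical and are not taken from Skmer's code; `-k`, `-s`, `-p` and `-o` are existing `mash sketch` options. Whether `-p` actually speeds up sketching of a single file is the open question here.

```python
def build_sketch_cmd(input_path, output_prefix, kmer_size, sketch_size, threads=1):
    """Assemble a `mash sketch` argument list (hypothetical helper, not Skmer's own)."""
    return [
        "mash", "sketch",
        "-k", str(kmer_size),    # k-mer size
        "-s", str(sketch_size),  # number of hashes kept in the sketch
        "-p", str(threads),      # parallelism: number of threads mash may spawn
        "-o", output_prefix,     # prefix of the output sketch file
        input_path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so no mash install is needed.
    print(" ".join(build_sketch_cmd("sample.fastq", "sample", 31, 100000, threads=4)))
```

The list could then be passed to `subprocess.run(cmd, check=True)`; the file name and parameter values above are placeholders.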

shahab-sarmashghi commented 5 years ago

As far as I remember, that option only helps when mash is run on multiple files, so that each file is processed in its own thread (which Skmer already does); it does not multithread the work on a single file. Can you double-check that? If they have added that feature, we should definitely add it to Skmer. Otherwise, it would require re-implementing the whole hashing part to make parallelization possible, which is a standalone (and, for me, not easy) problem.
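For illustration, per-file parallelism of the kind described above can be sketched as a worker pool with one task per sample. This is not Skmer's actual implementation: `sketch_samples`, `sketch_command` and the `runner` parameter are hypothetical names, and the command layout is only an assumption.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def sketch_command(sample_path):
    # Assumed command layout; each invocation handles exactly one file.
    return ["mash", "sketch", "-o", sample_path, sample_path]


def sketch_samples(sample_paths, workers=4, runner=None):
    """Run one sketch job per sample, at most `workers` jobs at a time."""
    if runner is None:
        # Default: actually invoke the external tool (requires mash on PATH).
        def runner(cmd):
            return subprocess.run(cmd, check=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as sample_paths.
        return list(pool.map(lambda p: runner(sketch_command(p)), sample_paths))
```

With this layout, throughput scales with the number of samples, but a single large sample still occupies only one worker, which is the limitation under discussion.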