2x slower with k=18

karel-brinda commented 6 months ago

I've run a first, somewhat complex benchmark on one of my DBs for resistance diagnostics (~661 executions of ProphAsm).

ProphAsm2 is really fast!!!

However, I noticed that, for some reason, it's approx. 2x slower with k=18 than with k=31. Is there any reason for this? I didn't observe this with ProphAsm 1.

ProphAsm, k18

real    7m54.141s
user    43m45.501s
sys     1m28.739s

ProphAsm, k31

real    8m29.515s
user    48m24.024s
sys     1m25.909s

ProphAsm 2, k18

real    4m35.686s
user    21m11.361s
sys     1m44.798s

ProphAsm 2, k31

real    2m51.831s
user    11m32.846s
sys     1m25.608s
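To put a number on the slowdown, the timings above can be converted to seconds and divided. This is only a small sketch over the figures quoted in this comment; the helper names `to_seconds` and `slowdown` are made up for illustration. On these numbers, ProphAsm 2 comes out roughly 1.6x slower in wall-clock time and roughly 1.8x slower in CPU time at k=18, while ProphAsm 1 is slightly faster at k=18.

```python
import re

def to_seconds(t: str) -> float:
    """Convert a bash `time` value such as '4m35.686s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time format: {t!r}")
    return int(m.group(1)) * 60 + float(m.group(2))

def slowdown(k18: str, k31: str) -> float:
    """Ratio of the k=18 time to the k=31 time (>1 means k=18 is slower)."""
    return to_seconds(k18) / to_seconds(k31)

# Timings copied from the report: (k=18, k=31).
timings = {
    ("ProphAsm", "real"): ("7m54.141s", "8m29.515s"),
    ("ProphAsm", "user"): ("43m45.501s", "48m24.024s"),
    ("ProphAsm 2", "real"): ("4m35.686s", "2m51.831s"),
    ("ProphAsm 2", "user"): ("21m11.361s", "11m32.846s"),
}

ratios = {key: slowdown(*pair) for key, pair in timings.items()}
for (tool, kind), ratio in ratios.items():
    print(f"{tool:10s} {kind}: k=18 / k=31 = {ratio:.2f}")
```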

PavelVesely commented 6 months ago

However, I noticed that, for some reason, it's approx. 2x slower with k=18 than with k=31. Is there any reason for this?

Interesting. My guess is that this is because of the dBG structure (much more branching / disconnected for $k=18$). What dataset are you actually running it on? You mention it's a DB for diagnostics of resistance, but is it more like a pangenome of a single species, or an even broader collection of genomes?

PavelVesely commented 6 months ago

Would be interesting to see the number of simplitigs for $k=18$ and $k=31$. Not sure what else could cause such a big difference...
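One way to get these counts is to tally the records in each output file. The sketch below assumes the output is a FASTA file with one record per simplitig; the function name `count_simplitigs` and the file names in the usage note are hypothetical.

```python
import io

def count_simplitigs(handle):
    """Return (number of FASTA records, total number of bases) for an open text handle."""
    n_records = 0
    total_bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            n_records += 1            # each header starts one simplitig
        else:
            total_bases += len(line)  # sequence may span several lines
    return n_records, total_bases

def count_kmers(n_records, total_bases, k):
    """A simplitig of length L spells L - k + 1 k-mers."""
    return total_bases - n_records * (k - 1)

# Tiny in-memory example: two records, 6 + 3 bases.
example = io.StringIO(">s1\nACGT\nAC\n>s2\nGGG\n")
n, bases = count_simplitigs(example)
print(n, bases, count_kmers(n, bases, k=3))
```

On real data this could be called as `count_simplitigs(open("out_k18.fa"))` and `count_simplitigs(open("out_k31.fa"))`, with the file names replaced by whatever the benchmark actually wrote.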

karel-brinda commented 6 months ago

That doesn't explain it: with ProphAsm 1 there was no such effect. Also, we don't construct the dBG explicitly, so the dBG topology should play no role.

Maybe it's an effect of deletions in khash? In theory, that could explain it (if lookups for previously deleted k-mers are slow).
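The mechanism behind this hypothesis can be illustrated with a toy open-addressing table that marks deleted slots with tombstones instead of clearing them. This is not khash's code and says nothing about whether it is the actual cause here; `ToyTable` and everything in it is a hypothetical sketch that uses linear probing for simplicity. It compares lookup cost in a fresh table against a table that held the same keys and then had all of them deleted.

```python
class ToyTable:
    """Open-addressing set with linear probing and tombstones (illustrative only)."""
    EMPTY, DELETED = object(), object()

    def __init__(self, capacity):
        self.slots = [self.EMPTY] * capacity

    def _probe(self, key):
        """Yield (slot index, number of slots inspected so far)."""
        i = hash(key) % len(self.slots)
        for steps in range(1, len(self.slots) + 1):
            yield i, steps
            i = (i + 1) % len(self.slots)

    def add(self, key):
        first_free = None
        for i, _ in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:
                if first_free is None:
                    first_free = i
                break
            if slot is self.DELETED:
                if first_free is None:
                    first_free = i  # reuse the tombstone unless key turns up later
            elif slot == key:
                return
        if first_free is None:
            raise RuntimeError("table full")
        self.slots[first_free] = key

    def remove(self, key):
        for i, _ in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:
                return
            if slot is not self.DELETED and slot == key:
                self.slots[i] = self.DELETED  # leave a tombstone behind
                return

    def lookup_cost(self, key):
        """Number of slots inspected to decide whether key is present."""
        for i, steps in self._probe(key):
            slot = self.slots[i]
            if slot is self.EMPTY:
                return steps
            if slot is not self.DELETED and slot == key:
                return steps
        return len(self.slots)

capacity = 1024
keys = [(i * 2654435761) % 2**32 for i in range(700)]  # scattered integer keys

fresh = ToyTable(capacity)
churned = ToyTable(capacity)
for key in keys:
    churned.add(key)
for key in keys:
    churned.remove(key)  # logically empty again, but full of tombstones

def mean_cost(table):
    return sum(table.lookup_cost(key) for key in keys) / len(keys)

fresh_cost = mean_cost(fresh)      # every probe stops at the first slot
churned_cost = mean_cost(churned)  # probes must walk past tombstones
print(fresh_cost, churned_cost)
```

Both tables contain no keys, yet a negative lookup in the churned one has to skip over tombstones until it reaches a truly empty slot.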

It should be reproducible with two pneumococcal genomes, when computing the intersection and both sym diffs.
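For reference, the set operations in such a reproduction can be sketched on plain Python sets. This is only an illustration of the k-mer set semantics, not of how ProphAsm computes them. It reads "both sym diffs" as the two one-sided differences (k-mers only in the first genome and k-mers only in the second), which is an assumption, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic min of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers

def compare(seq_a, seq_b, k):
    """Return (intersection, only in A, only in B) of the two canonical k-mer sets."""
    a = canonical_kmers(seq_a, k)
    b = canonical_kmers(seq_b, k)
    return a & b, a - b, b - a

shared, only_a, only_b = compare("AAAC", "AACC", k=3)
print(shared, only_a, only_b)
```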

On Thu, Mar 14, 2024 at 5:36 PM Pavel Vesely @.***> wrote:

Would be interesting to see the number of simplitigs for $k=18$ and $k=31$. Not sure what else could cause such a big difference...


prophyle / prophasm2

2x slower with k=18 #12