ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Start indexing while still reading the reference #436

Open marcelm opened 2 months ago

marcelm commented 2 months ago

I have incorporated strobealign into a pipeline and was looking at this log output:

This is strobealign 0.13.0
Estimated read length: 151 bp
Time reading reference: 8.06 s
Reference size: 3099.92 Mbp (195 contigs; largest: 248.96 Mbp)
Indexing ...
  Time counting seeds: 6.70 s
  Time generating seeds: 15.56 s
  Time sorting seeds: 13.83 s
  Time generating hash table index: 7.88 s
Total time indexing: 43.98 s

Reading the reference was quite slow (8.06 s in the log above), probably because this ran on a freshly allocated cluster node where the reference was not yet in the filesystem cache. In cases like this, we could save a few seconds by starting to index the reference while it is still being read.
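
To make the idea concrete, here is a minimal sketch of overlapping the two phases with a producer/consumer queue. It is not strobealign's actual code: `Contig`, `ContigQueue`, `read_fasta`, `count_seed_positions` and `read_and_count` are hypothetical names, and the "indexing" work is reduced to a stand-in that only counts possible seed start positions. The assumption is that at least the first indexing pass can process one contig at a time, so it can start as soon as the first contig has been parsed.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <istream>
#include <mutex>
#include <optional>
#include <queue>
#include <sstream>  // only needed for in-memory input such as std::istringstream
#include <string>
#include <thread>
#include <utility>

// Hypothetical container for one reference sequence (not a strobealign type).
struct Contig {
    std::string name;
    std::string sequence;
};

// Minimal thread-safe queue: the reader pushes contigs, the indexer pops them.
class ContigQueue {
public:
    void push(Contig contig) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);  // pushing after close() is a usage error
            queue_.push(std::move(contig));
        }
        ready_.notify_one();
    }

    // Signal that no further contigs will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a contig is available; returns std::nullopt once the
    // queue has been closed and drained.
    std::optional<Contig> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        Contig contig = std::move(queue_.front());
        queue_.pop();
        return contig;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Contig> queue_;
    bool closed_ = false;
};

// Producer: parse FASTA records and hand over each contig once it is complete.
void read_fasta(std::istream& in, ContigQueue& out) {
    Contig current;
    bool have_record = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;
        if (line[0] == '>') {
            if (have_record) out.push(std::move(current));
            current = Contig{line.substr(1), ""};
            have_record = true;
        } else if (have_record) {
            current.sequence += line;
        }
    }
    if (have_record) out.push(std::move(current));
    out.close();
}

// Stand-in for the "counting seeds" step: number of positions at which a
// seed of length k could start. The real seed logic is more involved.
std::size_t count_seed_positions(const std::string& sequence, std::size_t k) {
    return sequence.size() >= k ? sequence.size() - k + 1 : 0;
}

// Consumer: process contigs as they arrive while the reader thread keeps reading.
std::size_t read_and_count(std::istream& in, std::size_t k) {
    ContigQueue queue;
    std::thread reader([&] { read_fasta(in, queue); });
    std::size_t total = 0;
    while (auto contig = queue.pop()) {
        total += count_seed_positions(contig->sequence, k);
    }
    reader.join();
    return total;
}
```

With this structure, the time spent waiting on a cold filesystem cache overlaps with per-contig work instead of preceding it. Steps that need all seeds at once, such as sorting, would still have to wait until the reader has finished, so only part of the read time can be hidden this way.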