ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Parameter for hardmasked (removed) seeds #355

Open ksahlin opened 8 months ago

ksahlin commented 8 months ago

Something we brought up in today's discussion:

  1. A parameter (-H, default 1000) that sets the threshold for ignoring seeds.
  2. Remove all seeds above the threshold from the sorted seed vector (before the index vector is built) without increasing peak memory.
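A minimal C++ sketch of step 2. It assumes the seed vector is sorted by hash, so that identical seeds form contiguous runs, and that "above the threshold" means a hash occurring more than `threshold` times. `Seed` and `remove_repetitive_seeds` are hypothetical names for illustration, not strobealign's actual types. Kept runs are shifted left in a single pass and the vector is then truncated, so no second buffer is allocated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical seed type for illustration; strobealign's real index entries differ.
struct Seed {
    uint64_t hash;
    uint32_t position;
};

// Remove, in place, every seed whose hash occurs more than `threshold` times.
// Precondition: `seeds` is sorted by hash, so equal hashes form contiguous runs.
// No second buffer is allocated: surviving runs are shifted left, then the
// vector is truncated.
void remove_repetitive_seeds(std::vector<Seed>& seeds, std::size_t threshold) {
    assert(std::is_sorted(seeds.begin(), seeds.end(),
                          [](const Seed& a, const Seed& b) { return a.hash < b.hash; }));
    std::size_t write = 0;
    std::size_t i = 0;
    while (i < seeds.size()) {
        // Find the end of the run of seeds sharing seeds[i].hash.
        std::size_t j = i + 1;
        while (j < seeds.size() && seeds[j].hash == seeds[i].hash) {
            ++j;
        }
        if (j - i <= threshold) {
            // Keep this run: copy it down to the write position (write <= k always).
            for (std::size_t k = i; k < j; ++k) {
                seeds[write++] = seeds[k];
            }
        }
        i = j;
    }
    // size() shrinks; capacity() (the allocated memory) does not.
    seeds.erase(seeds.begin() + static_cast<std::ptrdiff_t>(write), seeds.end());
}
```

For example, with `threshold = 2`, a vector whose hashes are `1, 7, 7, 7, 9, 9` keeps only the seeds with hashes `1, 9, 9`.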

I guess 2 could be done either by writing the vector to a file and dropping those seeds as we read it back in, or by iterating over the sorted vector once more and assigning seeds above the threshold a specific value that places them at the end of the vector when it is re-sorted. Then remove those elements from the vector (can this also free up the memory of those slots?).
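The mark-and-re-sort idea could look roughly like the sketch below. It is not strobealign code: `Seed`, `kRemovedHash` and `mark_sort_and_erase` are hypothetical, and it assumes that the maximum 64-bit value never occurs as a real hash and that the vector was originally sorted by (hash, position), so re-sorting with that key restores the same order for the kept seeds. `std::sort` works in place, so the re-sort itself needs no second buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <tuple>
#include <vector>

// Hypothetical seed type for illustration; strobealign's real index entries differ.
struct Seed {
    uint64_t hash;
    uint32_t position;
};

// Assumed sentinel: a hash value that no real seed is expected to take.
constexpr uint64_t kRemovedHash = std::numeric_limits<uint64_t>::max();

// Mark seeds whose hash occurs more than `threshold` times with the sentinel,
// re-sort so the sentinels move to the end, then erase that tail.
// Precondition: `seeds` is sorted by hash.
void mark_sort_and_erase(std::vector<Seed>& seeds, std::size_t threshold) {
    std::size_t i = 0;
    while (i < seeds.size()) {
        std::size_t j = i + 1;
        while (j < seeds.size() && seeds[j].hash == seeds[i].hash) {
            ++j;
        }
        if (j - i > threshold) {
            for (std::size_t k = i; k < j; ++k) {
                seeds[k].hash = kRemovedHash;  // mark the whole run for removal
            }
        }
        i = j;
    }
    // Sort by (hash, position); sentinel seeds end up contiguous at the back.
    std::sort(seeds.begin(), seeds.end(), [](const Seed& a, const Seed& b) {
        return std::tie(a.hash, a.position) < std::tie(b.hash, b.position);
    });
    // Find the first sentinel seed and drop everything from there on.
    auto first_removed = std::lower_bound(
        seeds.begin(), seeds.end(), kRemovedHash,
        [](const Seed& s, uint64_t h) { return s.hash < h; });
    seeds.erase(first_removed, seeds.end());
}
```

On freeing the slots: `erase` (like `resize`) reduces `size()` but leaves `capacity()` unchanged, so the memory stays allocated. `shrink_to_fit()` is only a non-binding request to release it, and in common implementations it allocates a new, smaller buffer and copies the elements over, so both buffers are briefly alive at the same time and peak memory temporarily goes up rather than down.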