ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Fix index construction statistics #434

Closed Itolstoganov closed 1 month ago

Itolstoganov commented 2 months ago

Fixed several issues in the index statistics.

ksahlin commented 2 months ago

Hi Ivan,

Thanks - and great that you found this bug!

I approve the PR, but I will let Marcel make the final call.

@marcel, note that auto count = get_count(find(get_hash(it))); is somewhat redundant, since it involves two searches. A faster way would be to skip over all seeds with the same hash and increment the counters differently:

            tot_seed_count += count;
            tot_seed_count_sq += count * count;  // note: ^ is bitwise XOR in C++, not exponentiation

However, this part of the code is only used for printing index statistics, so optimising it is not crucial.

Hence I approve.
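
For illustration, the single-pass idea suggested above could look roughly like the sketch below. It is not the code from this PR: `Seed`, `SeedCountStats` and `count_seed_runs` are hypothetical names, and the sketch assumes the seeds are stored in a vector that is already sorted by hash, so that all seeds sharing a hash are adjacent and no lookup is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an index entry; only the hash matters here.
struct Seed {
    std::uint64_t hash;
    std::uint32_t position;
};

// Hypothetical container for the counters mentioned in the comment above.
struct SeedCountStats {
    std::uint64_t distinct_hashes = 0;
    std::uint64_t tot_seed_count = 0;
    std::uint64_t tot_seed_count_sq = 0;
};

// Walks the seeds once, treating each run of equal hashes as one group.
// Assumption: `seeds` is sorted by hash.
SeedCountStats count_seed_runs(const std::vector<Seed>& seeds) {
    assert(std::is_sorted(seeds.begin(), seeds.end(),
                          [](const Seed& a, const Seed& b) { return a.hash < b.hash; }));
    SeedCountStats stats;
    std::size_t i = 0;
    while (i < seeds.size()) {
        // Advance j to the first seed whose hash differs from seeds[i].
        std::size_t j = i + 1;
        while (j < seeds.size() && seeds[j].hash == seeds[i].hash) {
            ++j;
        }
        const std::uint64_t count = j - i;  // length of the run
        stats.distinct_hashes += 1;
        stats.tot_seed_count += count;
        stats.tot_seed_count_sq += count * count;  // square, not count ^ 2 (XOR)
        i = j;  // skip the whole run
    }
    return stats;
}
```

Each seed is visited exactly once, so the cost is linear in the number of seeds and no hash search is performed per group.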

marcelm commented 1 month ago

I’ll merge this so that it can be part of the next release. To be honest, I’ve never used print_diagnostics, so it being inefficient doesn’t affect me that much, and I guess it’s rarely used in practice anyway.