ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Log index statistics in a nicer way #300

Closed. marcelm closed this 1 year ago

marcelm commented 1 year ago

This is also part of #298, but I took the opportunity to make the logging of index statistics a bit nicer, which I had wanted to do for a while. Before:

Unique strobemers: 23324540
Total strobemers count: 26446802
Total strobemers occur once: 22560298
Fraction Unique: 0.97
Total strobemers highly abundant > 100: 1562
Total strobemers mid abundance (between 2-100): 762681
Total distinct strobemers stored: 23324540

After:

Index statistics
  Total strobemers:          26446802
  Unique strobemers:         23324540
    1 occurrence:            22560298 ( 96.72%)
    2..100 occurrences:        762681 (  3.27%)
    >100 occurrences:            1562 (  0.01%)

marcelm commented 1 year ago

I thought about adding a hard-coded (100%) after "Unique strobemers" to clarify what the other percentages refer to. What do you think? Like this:

Index statistics
  Total strobemers:          26446802
  Unique strobemers:         23324540 (100.00%)
    1 occurrence:            22560298 ( 96.72%)
    2..100 occurrences:        762681 (  3.27%)
    >100 occurrences:            1562 (  0.01%)

But maybe it’s confusing that this is always 100%?

ksahlin commented 1 year ago

I like the clarification.

> I thought about adding a hard-coded (100%) after "Unique strobemers"

Okay, but in that case perhaps we should change to Distinct strobemers or Total distinct strobemer hash values instead of Unique?

marcelm commented 1 year ago

> Okay, but in that case perhaps we should change to Distinct strobemers or Total distinct strobemer hash values instead of Unique?

Do you suggest this because "unique" could also refer to the strobemers with one occurrence? That’s a good point. I have changed it to "distinct" now. The longer version doesn’t fit well and the index statistics are only shown when -v is used, so I think it’s fine. (We could at some point add a section to the documentation explaining what the numbers mean and how to interpret them.)

ksahlin commented 1 year ago

> Do you suggest this because "unique" could also refer to the strobemers with one occurrence?

Yes, that is what I was referring to.

OK, great. Good to merge.