ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License

Adding smaller biological datasets for mini-benchmarks #341

Open ksahlin opened 9 months ago

ksahlin commented 9 months ago

We currently let our simulated reads (dros, maize, chm, and rye) guide development. Only at the release stage do we typically run an evaluation on 2-3 biological datasets.

What about including a few smaller biological datasets already at the development stage? For example, small test sets of 75 nt, 150 nt, and 250 nt paired-end (PE) reads from human, such as the 150 nt and 250 nt datasets found here. As there is no ground truth, perhaps a slow aligner in a very sensitive mode could serve as the reference (BLA(S)T?). We would also get feedback on how rescue etc. changes for biological reads.
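A rough sketch of how such a comparison against a slow aligner could be scored, assuming the placements from both aligners have already been parsed into per-read dictionaries. The function name `agreement`, the 50-base tolerance, and the input layout are hypothetical choices for illustration, not existing strobealign code:

```python
from typing import Dict, Optional, Tuple

# (reference name, leftmost position); None means the read was left unaligned.
Placement = Optional[Tuple[str, int]]


def agreement(candidate: Dict[str, Placement],
              baseline: Dict[str, Placement],
              tolerance: int = 50) -> float:
    """Fraction of baseline-aligned reads that the candidate places on the
    same reference within `tolerance` bases of the baseline position.

    `baseline` holds placements from the slow, sensitive aligner and is
    treated as pseudo-truth; it is not a real ground truth.
    """
    compared = 0
    agreeing = 0
    for read_name, base in baseline.items():
        if base is None:
            continue  # the baseline offers no pseudo-truth for this read
        compared += 1
        cand = candidate.get(read_name)
        if cand is None:
            continue  # missing or unaligned in the candidate: counts as a miss
        if cand[0] == base[0] and abs(cand[1] - base[1]) <= tolerance:
            agreeing += 1
    return agreeing / compared if compared else 0.0


if __name__ == "__main__":
    baseline = {"r1": ("chr1", 100), "r2": ("chr2", 500), "r3": None}
    candidate = {"r1": ("chr1", 120), "r2": ("chr3", 500), "r3": ("chr1", 5)}
    print(f"agreement with baseline: {agreement(candidate, baseline):.2f}")
```

Reads that the baseline aligner leaves unaligned are skipped rather than counted against the candidate, since the baseline says nothing about them. Tracking this number per read length (75 nt, 150 nt, 250 nt) between commits would give the kind of development-stage feedback described above.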