ksahlin / strobealign

Aligns short reads using dynamic seed size with strobemers
MIT License
144 stars 17 forks source link

Strobealign as reusable library? #39

Open ksahlin opened 2 years ago

ksahlin commented 2 years ago

Adding the request "Please make it a reusable library" from here. Not sure what it entails or mean.

marcelm commented 2 years ago

The idea is likely that other people would not only be able to use StrobeAlign as a command-line tool, but that they can also write their own (C++) programs that use the StrobeAlign algorithms. This would be similar to how we rely on external libraries such as args, doctest, kseq++, StripedSmithWaterman etc.

This requires a couple of changes and may sound like extra work, but the great thing is that nearly all of these changes are needed anyway to make the code more readable, more testable and generally more maintainable.

For example, I would like be able to write a unit test like this for aligning a paired-end read against a reference:

    auto refs = References::from_fasta("tests/phix.fasta");
    auto index = StrobemerIndex(refs);
    auto r1 = KSeq("read", "ACGT", "AAAA");
    auto r2 = KSeq("read", "CCAA", "BBBB");

    StrobemerMapper mapper(index);
    auto mapped = mapper.map_paired(r1, r2);
    auto aligned = mapped.align();
    CHECK(aligned[0].ref_position == 270);
    CHECK(aligned[1].is_unmapped);

Many of the changes I’ve made recently are intended to make the above work (and we’re getting there).

So while my motivation for the moment is to enable testing, the required refactoring forces us to come up with a good API that can then also be used by others. Not sure if we’ll go that far, but StrobeAlign could essentially become the frontend to a libstrobealign. The frontend would only parse the command-line and perhaps set up threads or so, and then it delegates index creation and mapping to the backend (API). This would mirror the relationship between samtools and htslib.

Here are a couple of things that would need to be done. (We can turn them into issues when the time comes.)

These are just some points that come to mind. All of them (except possibly for the very last one), are things I’d recommend doing anyway, so even if we just aim for StrobeAlign to become a reusable library (but never get there), it would be beneficial because we would get better code in the end.

ksahlin commented 2 years ago

Thanks for the detailed answer! Yes, I think this sounds great and worth spending some time on since it is beneficial for the code base regardless.