smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2
Other
162 stars 36 forks source link

About how wavefront offsets are saved #94

Closed shenwei356 closed 5 months ago

shenwei356 commented 5 months ago

Hi @smarco ,

I'm trying to understand the source code about how wavefront offsets are saved.

It looks like they are saved in a list,

https://github.com/smarco/WFA2-lib/blob/main/wavefront/wavefront.h#L62

  wf_offset_t* offsets;                // Offsets (k-centered)

which is accessed by the k as the index.

https://github.com/smarco/WFA2-lib/blob/main/wavefront/wavefront_backtrace.c#L73

mwavefront->offsets[k]

However, the k might be negative. How is this possible for a negative list index?

My guess is all k values are added by the length of the query sequence to make them >=0. But I can't find the code and this would occupy a lot of space, especially for long sequences.

In my implementation, to support negative k values, I use the layout below. But it requires bound checking in every reading/writing.

  index: 0,  1,  2,  3,  4,  5,  6
  k:     0, -1,  1, -2,  2, -3,  3

Best, Wei.

shenwei356 commented 5 months ago

My guess is all k values are added by the length of the query sequence to make them >=0

It seems to be true. https://github.com/smarco/WFA2-lib/blob/main/wavefront/wavefront.c#L89

RagnarGrootKoerkamp commented 5 months ago

Yes indeed, my understanding is that mwavefront->offsets is simply a pointer to the middle of a sufficiently large array so it can be indexed by the (possibly negative) diagonal directly.

shenwei356 commented 5 months ago

Thank you Ragnar. I just searched and found C++ does support a negative index !!!! https://stackoverflow.com/questions/47170740/c-negative-array-index

But Go don't allow that. So I'll still use the previous layout but grow the the array less frequently.

smarco commented 5 months ago

Hi,

Yes. I know it can be a little bit weird at first. But, usually, the convention is to index the main diagonal as $k=0$, diagonals above as $k>0$, and below $k<0$. To be able to index negative $k$, I have to shift the mwavefront->offsets to point to the middle of the allocated region of memory.

But this is just a convention, and you may as well map the wavefront's indexes (i.e., diagonals) to any positive range.

Do not hesitate to contact me if you have more questions.

shenwei356 commented 5 months ago

I see, thank you every much!