smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2
Other
162 stars 36 forks source link

[WFA::Compute] Maximum allocated wavefronts reached #39

Closed RolandFaure closed 1 year ago

RolandFaure commented 1 year ago

Hi,

Thank you for developing WFA, it is great! I am using the C++ version in my program, and I am stumbling upon an error: [WFA::Compute] Maximum allocated wavefronts reached This error only arrives after a high number of alignments, and does not arrive at all in some instances. Here is how I am using WFA in the code: wfa::WFAlignerGapAffine aligner(8,6,2, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh); aligner.alignEnd2End(cons, re); This code snippet is included in a loop, where cons and re change. The error arrives in a deterministic fashion when going through the loop for the ~720th time.

The error does not arrive when I reuse the same WFAlignerGapAffine object for all alignments. I know this would be the recommended way to go, but I found it uses a lot of RAM (in my case >30G while <1G when resetting the object every time).

Any idea on how to solve this ?

Many tanks, Roland

smarco commented 1 year ago

Hi,

Thanks for using WFA. This clearly looks like a bug. Is it possible to have (1) the file of sequences and (2) .cpp code that causes this issue (i.e., to reproduce the problem)?

Thanks,

RolandFaure commented 1 year ago

Hi, I've been trying to reproduce the issue for one hour, but today it just seems to work :sweat_smile:. I've dumbly not kept the dataset where WFA bugged, because I thought the bug would be more reproducible. So I propose to close the issue for now, and as soon as it starts bugging again, I'll carefully keep the dataset and reopen the issue. Thanks and sorry for the bother,

smarco commented 1 year ago

No problem at all. Perhaps you can share the few lines of code that allocate/loop-align/free (just to double-check). In any case, let me know.

RolandFaure commented 1 year ago

Here is how it is used:

wfa::WFAlignerGapAffine aligner(4,6,2, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);
aligner.alignEnd2End(cons, re);
alignment = aligner.getAlignmentCigar();

Then the loop finishes, so I did not explicitely free the memory (maybe I should ?)

smarco commented 1 year ago

Can I have a look at the surrounding loop (I want to see the loop). Thanks!

RolandFaure commented 1 year ago
for (auto n = 0 ; n < polishingReads.size() ; n++){

        string alignment;
        if (CIGARs.size() != polishingReads.size()){ //compute the exact alignment if it is not already computed
            string cons = consensus.substr(positionOfReads[n].first, positionOfReads[n].second-positionOfReads[n].first).c_str();
            string re = polishingReads[n].c_str();

            wfa::WFAlignerGapAffine aligner(4,6,2, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);
            aligner.alignEnd2End(cons, re);

            alignment = aligner.getAlignmentCigar();
        }
        else{
            alignment = convert_cigar(CIGARs[n]);
        }

        totalLengthOfAlignment += alignment.size();
        //a few manipulation with the string alignment to store the result in a vector

}
smarco commented 1 year ago

Thanks.

Looking at the code, I would strongly recommend reusing the aligner object within the loop. Otherwise, each iteration invests a non-negligible amount of time in initializing the aligner object and its internal data structures (it was designed to be reused, the initialization is not a fast procedure). Moreover, this code may be stressing the memory allocators/deallocators as each aligner object created (per iteration) has to be deallocated at the end of the iteration.

I know that you are aware of all these. I just wanted to make sure we are on the same page.

RolandFaure commented 1 year ago

Yes, I am aware of this. There is actually a version of the same code where I re-used an aligner object. The problem was that peak RAM usage went up dramatically (like I said in my first comment, on an instance 1G -> 30G). The loop can be pretty long. Would there be a way to "reset" the alignment object every n iteration of the loop ?

smarco commented 1 year ago

Indeed, function wavefront_aligner_reap should do the trick. Let me know,

RolandFaure commented 1 year ago

Hi ! I just stumbled on the error again, but now it is quite clear : the [WFA::Compute] Maximum allocated wavefronts reached arrives when trying to align an empty sequence. (don't know if it's actually exactly the same bug)

smarco commented 1 year ago

Great timing! I'm working on this code. Is it possible to have (1) the file of sequences and (2) .cpp code that causes this issue (i.e., to reproduce the problem)? Is just a single empty sequence?

RolandFaure commented 1 year ago

It is very simple:

wfa::WFAlignerGapAffine aligner(4,6,2, wfa::WFAligner::Alignment, wfa::WFAligner::MemoryHigh);
string a = "";
for (int i = 0 ; i < 10000 ; i++){
    a += "A";
}
string b = "";
aligner.alignEnd2End(a,b);

Only bugs when a is long enough.

smarco commented 1 year ago

Sorry for the delay. I tried to reproduce to the problem to no avail. Can you let me know if my small test does break using the current development branch? Your feedback is greatly appreciated.

#include <iostream>
#include <string>

#include "../bindings/cpp/WFAligner.hpp"

int main()
{
    wfa::WFAlignerGapAffine aligner(4,6,2,wfa::WFAligner::Alignment,wfa::WFAligner::MemoryHigh);
    std::string a = "";
    for (int i = 0 ; i < 10000 ; i++) {
        a += "A";
    }
    std::string b = "";
    aligner.alignEnd2End(a,b);

    std::cout << "CIGAR:" << aligner.getCIGARString(false) << std::endl;
}

Thank you in advance.

RolandFaure commented 1 year ago

Hi, Sorry for the delay, I had an article deadline. I tried it with the the current development branch, and indeed I could not break it. I will use this now. Thank you so much !