smarco / BiWFA-paper

Bidirectional WFA (Paper)

Memory issues with BiWFA for processing millions of paired alignments #11

Closed GSbioinfo closed 2 months ago

GSbioinfo commented 2 months ago

Hi BiWFA team,

Thank you for making this fast and efficient library available to everyone. I am trying to use the library in my project and have been successful with small data sets of 2-3 million reads, but with larger data sets of more than 5 million reads it throws a segmentation fault. I have 128 GB of RAM on the system. After testing different 'attributes.memory_mode' settings, I found that BiWFA runs out of memory after processing a certain number of runs and then throws a segmentation fault.
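One way to check whether memory really grows with the number of alignments (as opposed to a crash unrelated to memory) is to log the process's peak resident set size at regular intervals. The following is a minimal sketch, assuming a POSIX system; `peak_rss` and `report_memory` are hypothetical helper names and not part of BiWFA or WFA2-lib.

```cpp
#include <sys/resource.h>  // getrusage (POSIX)

#include <cstddef>
#include <cstdio>

// Peak resident set size of this process so far, or -1 on failure.
// On Linux the value is in kilobytes; on macOS it is in bytes.
long peak_rss() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Hypothetical helper: call once per alignment. Prints the peak RSS to
// stderr every `interval` alignments and returns true when it printed.
bool report_memory(std::size_t alignments_done, std::size_t interval = 100000) {
    if (interval == 0 || alignments_done % interval != 0) return false;
    std::fprintf(stderr, "%zu alignments, peak RSS %ld\n",
                 alignments_done, peak_rss());
    return true;
}
```

If the reported value climbs steadily with the alignment count, something is accumulating per call; if it stays flat until the crash, the segmentation fault more likely comes from an invalid memory access than from exhausting RAM.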

The project I am working on involves pairwise comparison of n million DNA queries (Illumina reads) against m different reference sequences (amplicons). I am calling the following function:

```cpp
std::string nw_function(std::string refseq, std::string query){
  char *pattern;
  char *text;
  pattern = &refseq[0];
  text = &query[0];
  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = gap_affine;
  attributes.alignment_form.span = alignment_end2end;
  attributes.affine_penalties.match = 0;
  attributes.affine_penalties.mismatch = 4;
  attributes.affine_penalties.gap_opening = 20;
  attributes.affine_penalties.gap_extension = 2;
  attributes.memory_mode = wavefront_memory_ultralow;
  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  // Align
  wavefront_bialign(wf_aligner,pattern,refseq.length(),text,refseq.length());
  std::string mycig = get_cigar_string(wf_aligner->cigar,true);
  // Free
  wavefront_aligner_delete(wf_aligner);
  return mycig;
}
```
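Two details of the function above may be worth checking, offered as a guess and not as a diagnosis. First, the align call passes `refseq.length()` as the length of both sequences, so a query shorter than the reference would be read past its end. Second, a new aligner is allocated and freed for every pair. The sketch below shows the general shape of passing each sequence's own length and reusing a single aligner across all pairs. `MockAligner` and `align_all` are hypothetical stand-ins so the example compiles without the library; in real code the constructor and destructor would wrap `wavefront_aligner_new` and `wavefront_aligner_delete`, and whether an aligner can safely be reused this way is an assumption to confirm against the library's documentation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the real aligner (not part of WFA).
struct MockAligner {
    static int created;  // counts constructions, to demonstrate reuse
    MockAligner() { ++created; }
    MockAligner(const MockAligner&) = delete;
    MockAligner& operator=(const MockAligner&) = delete;

    // Placeholder score: mismatches over the common prefix plus the
    // length difference. Never reads beyond the lengths it is given.
    std::size_t align(const char* pattern, std::size_t pattern_len,
                      const char* text, std::size_t text_len) const {
        assert(pattern != nullptr && text != nullptr);
        const std::size_t common = std::min(pattern_len, text_len);
        std::size_t score = std::max(pattern_len, text_len) - common;
        for (std::size_t i = 0; i < common; ++i) {
            if (pattern[i] != text[i]) ++score;
        }
        return score;
    }
};
int MockAligner::created = 0;

// Aligns every query against every reference with one shared aligner.
// Results are ordered query-major: all references for query 0 first.
std::vector<std::size_t> align_all(const std::vector<std::string>& refs,
                                   const std::vector<std::string>& queries) {
    MockAligner aligner;  // created once, reused for every pair
    std::vector<std::size_t> scores;
    scores.reserve(refs.size() * queries.size());
    for (const std::string& query : queries) {
        for (const std::string& ref : refs) {
            // Each sequence is passed with its own length.
            scores.push_back(aligner.align(ref.data(), ref.size(),
                                           query.data(), query.size()));
        }
    }
    return scores;
}
```

Taking the strings by const reference also avoids copying both sequences on every call, which adds up over millions of pairs.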

I also tried your WFA library and encountered similar issues, with a much lower read processing capacity. For this reason I moved to your BiWFA library, which significantly improved the read capacity but not enough to solve the problem. I was hoping you could help identify a solution to the problem I am facing. Can you give me some idea of which parameters I would need to modify so that BiWFA does not run out of memory? I greatly appreciate your help.

AndreaGuarracino commented 2 months ago

Hi @GSbioinfo, I suggest opening an issue on the WFA2-lib repository, as it is the best and most up-to-date implementation we have of all WFA-related algorithms. The implementation in BiWFA-paper is quite old and differs from the newer one.