smarco / WFA2-lib

WFA-lib: Wavefront alignment algorithm library v2

Segmentation fault (core dumped) #1

Closed haowenz closed 2 years ago

haowenz commented 2 years ago

Hi,

I used the following code to align several pairs of sequences and got a segfault on one pair. The sequences are attached. Please help. Thanks!

  // Configure alignment attributes
  wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
  attributes.distance_metric = edit;
  attributes.alignment_scope = compute_score;
  attributes.alignment_form.span = alignment_endsfree;
  attributes.alignment_form.pattern_begin_free = 0;
  attributes.alignment_form.pattern_end_free = 0;
  attributes.alignment_form.text_begin_free = 0;
  attributes.alignment_form.text_end_free = (text_length < pattern_length ? text_length : pattern_length) / 2;
  // Initialize Wavefront Aligner
  wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
  // Align
  wavefront_align(wf_aligner,pattern,strlen(pattern),text,strlen(text));
  fprintf(stderr,"WFA-Alignment returns score %d\n",wf_aligner->cigar.score);
  // Free
  wavefront_aligner_delete(wf_aligner);

sequences.zip

smarco commented 2 years ago

Hi,

Thanks for reporting, I really appreciate it. I used align_benchmark and I cannot reproduce the problem (see the attached plot of the alignment shape). Can you share the *.c program you are running, so I can help you better?

[Attached image: plot of the alignment shape]

haowenz commented 2 years ago

Hi! Thanks for your fast response. I attached the code I used, which was modified from wfa_basic to take fasta inputs. It is straightforward. You can have a look. code.zip

Btw, is there any code in wfa that allows me to input fasta files and align them? Or do I have to generate the SEQ file you mentioned in the README? The plots you showed are also pretty. Is there anything in the documentation that describes how to generate them?

smarco commented 2 years ago

Hi, again.

Should be fixed now (thanks for the report, this helps me a lot).

Btw, is there any code in wfa that allows me to input fasta files and align them? Or do I have to generate the SEQ file you mentioned in the README?

Not for the moment. It was thought of as a library. But many people have asked for a FASTA/FASTQ aligner so, in the future, I will implement this feature.

The plots you showed are also pretty. Is there anything in the document that describes how to generate them?

I'm working on documenting all these features.

Thanks again. Let me know how it works.

haowenz commented 2 years ago

Thank you! I will test it. It doesn't matter if you have a fasta parser or not. It is easy for developers like me to use my own fasta/fastq parser.

I have one more question on using the library. In the document, I found the following.

// Right extension
wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
attributes.alignment_form.span = alignment_endsfree;
attributes.alignment_form.pattern_begin_free = 0;
attributes.alignment_form.pattern_end_free = pattern_end_free;
attributes.alignment_form.text_begin_free = 0;
attributes.alignment_form.text_end_free = text_end_free;

PATTERN    AATTTAAGTCTG-CTACTTTCACGCA-GCT----------
           ||||| |||||| ||||||||||| | | |          
TEXT       AATTTCAGTCTGGCTACTTTCACGTACGATGACAGACTCT

However, how to choose pattern_end_free and text_end_free is not mentioned. I spent some time on this and decided to set them to the minimum of the text and pattern lengths. Is this right? It would also be useful to have more about this in the documentation. Thanks again!

haowenz commented 2 years ago

I tested it and found that the results differ from those produced by edlib with the parameters -m SHW.

The ed reported by edlib is 6454, while for wfa2, it was -11089. I was planning to do right extension and the expected ed should be around 6450.

smarco commented 2 years ago

Yes, you choose the lengths pattern_end_free and text_end_free according to your needs.

If the pattern and text don't have the same length, I would assume that there is a gap somewhere in the alignment. Thus, I would allow ends-free for |text_length-pattern_length| bases at the beginning and at the end of the longer sequence. But this depends on the requirements of the experiment.
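The suggestion above can be sketched as a small helper. This is only an illustration: `ends_free_t` and `ends_free_for` are hypothetical names, not part of WFA2-lib, and the sketch assumes that "the longer sequence" receives the allowance at both ends while the shorter one receives none.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical container mirroring the four alignment_form fields
 * used in this thread; it is not a WFA2-lib type. */
typedef struct {
  int pattern_begin_free;
  int pattern_end_free;
  int text_begin_free;
  int text_end_free;
} ends_free_t;

/* Allow |text_length - pattern_length| free bases at both ends of the
 * longer sequence, and none on the shorter one. */
static ends_free_t ends_free_for(size_t pattern_length, size_t text_length) {
  ends_free_t ef = {0, 0, 0, 0};
  size_t diff = pattern_length > text_length ? pattern_length - text_length
                                             : text_length - pattern_length;
  assert(diff <= (size_t)INT_MAX); /* the struct fields are plain ints */
  if (text_length > pattern_length) {
    ef.text_begin_free = (int)diff;
    ef.text_end_free = (int)diff;
  } else if (pattern_length > text_length) {
    ef.pattern_begin_free = (int)diff;
    ef.pattern_end_free = (int)diff;
  }
  return ef;
}
```

The four values would then be copied into the corresponding `attributes.alignment_form` fields shown earlier in the thread before creating the aligner. For a pure right extension, as in the documentation snippet quoted above, the two `*_begin_free` values would stay at 0.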

The ed reported by edlib is 6454, while for wfa2, it was -11089. I was planning to do right extension and the expected ed should be around 6450.

Aligning your sequences, using your code, I got: "WFA-Alignment returns score -6458". (It is negative because match-score=0 and the rest are penalties.) How do you get the -11089?

haowenz commented 2 years ago

I used text as pattern and pattern as text, and then I got -6458. But this is not what I expected when using the code. Now I am confused about pattern_end_free and text_end_free. Could you explain more about these two?
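For small inputs, a plain dynamic-programming reference can help check which role each sequence plays. The sketch below is not WFA2-lib or edlib code; `prefix_edit_distance` is a hypothetical name. It assumes the intended "right extension" means the whole pattern is aligned against some prefix of the text, with the unused text suffix free of charge (a stronger allowance than the half-length `text_end_free` in the original code). Under that assumption the result is not symmetric, so swapping pattern and text can change the answer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Edit distance between the full pattern and the best-matching prefix
 * of text. Uses one DP row, O(|pattern| * |text|) time.
 * Returns -1 if memory allocation fails. */
int prefix_edit_distance(const char* pattern, const char* text) {
  assert(pattern != NULL && text != NULL);
  size_t m = strlen(pattern), n = strlen(text);
  int* row = malloc((n + 1) * sizeof *row);
  if (row == NULL) return -1;
  /* Row 0: empty pattern against text[0..j) costs j insertions. */
  for (size_t j = 0; j <= n; ++j) row[j] = (int)j;
  for (size_t i = 1; i <= m; ++i) {
    int diag = row[0]; /* D[i-1][0] */
    row[0] = (int)i;   /* pattern[0..i) against empty text */
    for (size_t j = 1; j <= n; ++j) {
      int up = row[j]; /* D[i-1][j] */
      int best = diag + (pattern[i - 1] != text[j - 1]); /* match/mismatch */
      if (up + 1 < best) best = up + 1;                  /* skip a pattern base */
      if (row[j - 1] + 1 < best) best = row[j - 1] + 1;  /* skip a text base */
      diag = up;
      row[j] = best;
    }
  }
  /* Free text end: take the minimum over every text prefix length. */
  int result = row[0];
  for (size_t j = 1; j <= n; ++j)
    if (row[j] < result) result = row[j];
  free(row);
  return result;
}
```

With the score convention described above (match-score=0, everything else a penalty), the value returned here would correspond to the negated WFA score. The quadratic cost makes this suitable only as a cross-check on short sequences.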

smarco commented 2 years ago

Hi, again.

I tried >Pattern\n<Text and >Text\n<Pattern. It was a bug in the "wfadaptive" heuristic when used with ends-free alignment. It should be fixed now. Feel free to check and report anything else that you might find.

Cheers,