Open zeeev opened 7 years ago
Hi, Zev
In short, there is no way to control the acceptable insertion/deletion size. The X-drop threshold could be set to a very small value for this purpose, but that does not seem to be a good choice because it would also split alignments in low-identity regions.
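To illustrate the X-drop trade-off, here is a minimal sketch of an ungapped X-drop extension. It is not minialign's code: the function name `xdrop_extend` and the match/mismatch scores are assumptions chosen for the example. Extension stops as soon as the running score falls more than `x` below the best score seen so far, so a small `x` stops at any short run of mismatches, not only at indels.

```python
def xdrop_extend(query, ref, x, match=1, mismatch=-2):
    """Ungapped extension from position 0 of both sequences.

    Returns (best_score, best_length). Extension terminates when the
    running score drops more than `x` below the best score so far.
    Scores are illustrative, not minialign's defaults.
    """
    score = 0
    best = 0
    best_len = 0
    for i, (q, r) in enumerate(zip(query, ref), start=1):
        score += match if q == r else mismatch
        if score > best:
            best, best_len = score, i
        elif best - score > x:
            break  # X-drop condition met: give up extending
    return best, best_len


# A read whose middle four bases disagree with the reference
# (a short low-identity region, no indel at all).
ref = "ACGTACGTAC" + "TTTT" + "ACGTACGTAC"
query = "ACGTACGTAC" + "GGGG" + "ACGTACGTAC"

small_x = xdrop_extend(query, ref, x=3)   # stops inside the mismatch run
large_x = xdrop_extend(query, ref, x=20)  # extends through it
```

With the small threshold the extension ends at the first ten bases, while the larger threshold carries it through the mismatches to the matching region beyond.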
Actually, the detailed answer depends on the length of the deletion.
If it is shorter than 25 bases, the behavior might come from a bug in the alignment routine. I would appreciate it if you could show me the actual sequence pair that reproduces the case (and the options passed to the program).
Unfortunately, if it is longer than 25 bases, the behavior is a limitation of the alignment routine. The program uses a fixed-width, 32-cell banded alignment with an adaptive steering technique. Experiments confirm that the algorithm drops indels longer than 25 bases while capturing shorter ones perfectly. (The BW = 32 line, the fourth line from the left in Figure 2(d), shows the trend: https://github.com/ocxtal/adaptivebandbench ) Since the adaptive band algorithm was chosen for its good performance and efficiency, I'm sorry, but the limitation will not be alleviated in the future...😢
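The geometric reason a band limits indel length can be sketched with a much simpler static band. This is an assumption made for illustration only: minialign's adaptive band moves along with the alignment, and the 25-base figure above is an empirical result, not something this sketch derives. The hypothetical function `banded_edit_distance` below only fills cells with |i - j| <= `half_width`, so a gap that pushes the path further from the diagonal than that cannot be represented.

```python
def banded_edit_distance(a, b, half_width):
    """Edit distance using only cells with |i - j| <= half_width.

    Returns None when the end cell cannot be reached inside the band.
    A static band, used here as a simplification of an adaptive one.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    # Row 0: only the first half_width + 1 columns are inside the band.
    prev = [j if j <= half_width else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        lo = max(0, i - half_width)
        hi = min(m, i + half_width)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # leading deletions only
                continue
            sub = prev[j - 1] + (a[i - 1] != b[j - 1])  # match / mismatch
            dele = prev[j] + 1                          # gap in b
            ins = cur[j - 1] + 1                        # gap in a
            cur[j] = min(sub, dele, ins)
        prev = cur
    return None if prev[m] == INF else prev[m]


seq = "ACGT" * 10
with_deletion = seq[:20] + seq[25:]  # 5-base deletion in the middle

wide = banded_edit_distance(seq, with_deletion, half_width=8)
narrow = banded_edit_distance(seq, with_deletion, half_width=3)
```

Here the wide band recovers the 5-base deletion, while the narrow band cannot reach the end of the alignment at all.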
Thanks,
Hajime
Hi, Zev
Minialign is now updated to version 0.3.2. This release fixes some bugs in the chaining routine that made the chained path collapse when it reached the head of the query sequence. The chaining parameters, the side lengths of the parallelogram window, can now be changed with the '-L' and '-H' flags (the defaults have also been enlarged to 5000 so that chains are not split around low-identity regions). I would be glad if you could test this new version.
Thank you.
Hajime
@ocxtal Sorry I didn't reply sooner. Thank you for the updates. I will re-run the alignments after the Thanksgiving holiday. What parameters would you suggest for -L and -H to maximize INDEL/SV detection?
Hi, Zev
It is difficult for me to recommend -L and -H settings (since I'm not familiar with indel/SV calling...), hmm...
Currently I believe that large indel detection should be handled as a postprocess of the local alignment, and could be a preprocess of the SV detection program. However, if you say that large indels must be captured in the local alignment stage, I'll consider adding an indel detection algorithm (alignment linking and gap filling) as a postprocess of the calculation of the alignment set.
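As a rough idea of what such an alignment-linking postprocess could look like, here is a minimal sketch. Everything in it is hypothetical (the `Aln` record, the `link_indels` function, and the `max_gap` cutoff are names and values invented for the example, not part of minialign): it takes the local alignments of one read on one strand, orders them along the query, and infers an indel from the difference between the reference gap and the query gap of each adjacent pair.

```python
from typing import List, NamedTuple, Tuple


class Aln(NamedTuple):
    """One local alignment, half-open coordinates, same strand assumed."""
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int


def link_indels(alns: List[Aln], max_gap: int = 10000) -> List[Tuple[str, int, int]]:
    """Return (type, ref_position, size) for each linkable adjacent pair."""
    calls = []
    ordered = sorted(alns, key=lambda a: a.query_start)
    for left, right in zip(ordered, ordered[1:]):
        ref_gap = right.ref_start - left.ref_end
        query_gap = right.query_start - left.query_end
        # Skip overlapping or far-apart pairs; they need other handling.
        if ref_gap < 0 or query_gap < 0 or max(ref_gap, query_gap) > max_gap:
            continue
        diff = ref_gap - query_gap
        if diff > 0:
            calls.append(("DEL", left.ref_end, diff))   # reference bases missing from the read
        elif diff < 0:
            calls.append(("INS", left.ref_end, -diff))  # extra read bases
    return calls


# Two alignments of one read, split around a 100-base deletion.
deletion_calls = link_indels([Aln(1000, 1500, 0, 500), Aln(1600, 2100, 500, 1000)])
# Two alignments split around a 50-base insertion.
insertion_calls = link_indels([Aln(100, 200, 0, 100), Aln(200, 300, 150, 250)])
```

A real implementation would also have to handle strand, overlapping alignment ends, and gap filling between the linked pieces, which this sketch leaves out.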
Regards,
Hajime Suzuki
Hi, Zev
Just now I have figured out what the problem is: the extension alignment terminated just before the indels, and the following matching regions were not reported...! (I am sorry it took me so long to understand...😢) I have confirmed the phenomenon on my simulated data, and I'll add a downstream-rescuing algorithm in the next release.
Thanks,
Hajime
Hi, Zev,
I'm sorry for my delayed reply. I've just pushed a minor update, 0.4.2, with a downstream alignment rescuing algorithm. Although the algorithm still fails to collect alignments after short indels, it performs much better than the previous release, 0.4.1. Please try it out.
Here are pileups of my test data.
minialign-0.4.1 (default params)
minialign-0.4.2 (default params)
bwa-mem (default params), as a reference
Thanks,
Hajime Suzuki
Greetings,
I've noticed that alignments break over small deletions. Is there a way to control the size of the deletions an alignment can contain?
Thank you,
Zev