ncbi / magicblast


Mistakes on small introns #54

Open sunriseTM opened 1 year ago

sunriseTM commented 1 year ago

Hi, Magic-BLAST treats the small introns in my genome (intron length peaks at 23 bp) as deletions, and I cannot find any way to fix this. Do you have any suggestions?

boratyng commented 1 year ago

Hi @sunriseTM, I am sorry you ran into trouble. Can you post an example? It is difficult to say anything without looking at the data.

In general, Magic-BLAST is conservative about intron detection. Any gap shorter than 10 bases is always reported as a deletion rather than an intron. Also, common splice signals must be present for an intron to be reported.
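The two rules above can be illustrated with a minimal sketch. This is not Magic-BLAST's actual code: the function name `classify_gap`, the 0-based half-open coordinates, and the choice of GT-AG (plus its reverse complement CT-AC) as the "common splice signals" are assumptions made for illustration only.

```python
# Illustrative sketch only; not taken from the Magic-BLAST source.

# From the comment above: gaps shorter than 10 bases are always deletions.
MIN_INTRON_LEN = 10

# Assumption: "common splice signals" means the canonical GT-AG pair,
# plus CT-AC, which is how a GT-AG intron on the minus strand reads
# on the plus strand.
SPLICE_SIGNALS = {("GT", "AG"), ("CT", "AC")}


def classify_gap(genome: str, start: int, end: int) -> str:
    """Classify the reference gap genome[start:end] as 'intron' or 'deletion'.

    Coordinates are 0-based and half-open (hypothetical convention).
    """
    if end - start < MIN_INTRON_LEN:
        return "deletion"
    donor = genome[start:start + 2].upper()
    acceptor = genome[end - 2:end].upper()
    if (donor, acceptor) in SPLICE_SIGNALS:
        return "intron"
    return "deletion"


# A 23 bp gap flanked by GT...AG qualifies; an 8 bp gap never does.
genome = "AAAA" + "GT" + "C" * 19 + "AG" + "TTTT"
print(classify_gap(genome, 4, 27))
print(classify_gap("AAAAGTCCCCAGTTTT", 4, 12))
```

Under these assumptions, a 23 bp intron passes the length rule, so a missed call would point to the splice-signal check failing at the reported gap boundaries.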

sunriseTM commented 1 year ago

Hi, I took a screenshot from IGV, shown below:

[image]

The upper track shows PacBio HiFi mRNA reads (Iso-Seq) aligned with Magic-BLAST, and the lower track shows Illumina mRNA reads aligned with HISAT2. As you can see, Magic-BLAST missed the intron defined by the Illumina reads, a problem I have also encountered with minimap2. Besides calling a deletion instead of an intron at the correct location, it sometimes places the deletion at a position shifted relative to the intron, so the location is wrong as well, as shown below:

[image]

I think aligning long reads to a genome and recognizing the correct intron structure is a common difficulty for current software.

boratyng commented 1 year ago

Thank you for the example. Yes, intron detection is generally more difficult with long reads because of their higher error rate. It looks like Magic-BLAST overextended the alignment on one side here and then could not find the splice signals because of this. Unfortunately, I cannot offer you any parameter values to fix it right now. I can only offer to fix this problem in a future release. Would you be able to share the reads that aligned in this region and the genome?
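To illustrate the overextension effect described above, here is a minimal sketch of a post-hoc check that slides a reported gap a few bases left or right and looks for splice signals at the shifted boundaries. It is not part of Magic-BLAST: the function name `find_shifted_intron`, the `max_shift` parameter, the 0-based half-open coordinates, and the GT-AG / CT-AC signal set are all assumptions for illustration.

```python
# Illustrative sketch only; not a Magic-BLAST feature.

def find_shifted_intron(genome, start, end, max_shift=5,
                        signals=(("GT", "AG"), ("CT", "AC"))):
    """Return the smallest shift (in bases) at which genome[start:end],
    moved by that shift, is flanked by a splice-signal pair; None if no
    shift within +/- max_shift works.

    Coordinates are 0-based and half-open (hypothetical convention).
    """
    # Try shifts in order 0, -1, +1, -2, +2, ... so the nearest hit wins.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        s, e = start + shift, end + shift
        if s < 0 or e > len(genome):
            continue
        if (genome[s:s + 2].upper(), genome[e - 2:e].upper()) in signals:
            return shift
    return None


# True intron occupies positions 4..27; the reported gap is 6..29,
# i.e. the alignment was extended 2 bases too far on the left side.
genome = "AAAA" + "GT" + "C" * 19 + "AG" + "TTTT"
print(find_shifted_intron(genome, 6, 29))
```

A hit from such a check is only a hint that a deletion may be a misplaced intron. It does not verify that the shifted alignment is still consistent with the read sequence, which a real aligner would need to confirm.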