Closed HegedusB closed 3 years ago
Could you post an example of a FASTA header with "_" that does not appear in the SAM output? It would help us track down the issue.
Non-canonical splice sites are very rare, so when implementing Magic-BLAST we decided to err on the side of caution. Magic-BLAST requires much higher-quality alignments to call a non-canonical splice site; otherwise it detects a lot of false-positive splice sites. In future versions we plan to use genome annotation so that the alignment quality restriction can be lifted for known non-canonical splice sites. We also hope to improve our aligner to give better alignments so that we can call non-canonical splice sites with more confidence. Please let me know if these solutions would not work for you. Would a command-line option that lifts the alignment quality restriction work for you?
About the encoding of strand information: do you mean reporting it as "+" or "-" in the 5th column (PAF format) instead of as a bit in the SAM flag? We will look into reporting PAF format or something similar in future releases.
Could you let me know what program requires minimap2 strand encoding? Thanks.
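For reference, the two encodings discussed above can be bridged in a few lines. This is only a minimal sketch, not part of Magic-BLAST or minimap2: the function names are hypothetical, and it assumes the downstream tool only needs the alignment strand, which the SAM specification stores in flag bit 0x10 (read aligned as reverse complement). If the tool expects something else, such as a transcript-strand tag, this conversion would not be enough.

```python
# Hypothetical helpers: derive a PAF-style "+"/"-" strand from a SAM FLAG.
SAM_FLAG_REVERSE = 0x10  # SAM spec: SEQ is reverse complemented


def strand_from_sam_flag(flag: int) -> str:
    """Return "-" if the reverse-complement bit is set, otherwise "+"."""
    return "-" if flag & SAM_FLAG_REVERSE else "+"


def strand_from_sam_line(line: str) -> str:
    """Read the FLAG (2nd tab-separated column) of one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return strand_from_sam_flag(int(fields[1]))
```

For example, `strand_from_sam_flag(16)` gives `"-"` and `strand_from_sam_flag(0)` gives `"+"`.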
I am sorry for the late answer! I am using a fungal genome from the JGI. The FASTA headers of the assembly look like this: >scaffold_9, >scaffold_90, etc.
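One way to pin down which headers are affected is to compare the sequence names in the reference FASTA with the names that appear in the SAM output. The sketch below is only an illustration with hypothetical function names. It assumes the sequence ID is the first whitespace-delimited token after ">" and collects SAM reference names from both `@SQ` header lines (the `SN:` field) and the RNAME column of alignment records.

```python
# Hypothetical check: which FASTA sequence IDs never show up in a SAM file?

def fasta_ids(lines):
    """Return sequence IDs (first token after '>') in file order."""
    return [
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    ]


def sam_reference_names(lines):
    """Collect names from @SQ SN: fields and from the RNAME column."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                names.update(f[3:] for f in fields[1:] if f.startswith("SN:"))
        elif len(fields) > 2 and fields[2] != "*":
            names.add(fields[2])
    return names


def missing_from_sam(fasta_lines, sam_lines):
    """List FASTA IDs that are absent from the SAM header and alignments."""
    seen = sam_reference_names(sam_lines)
    return [name for name in fasta_ids(fasta_lines) if name not in seen]
```

Both functions accept any iterable of lines, so open file handles for the assembly and the SAM output can be passed directly to `missing_from_sam`.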
I am using Illumina-corrected nanopore reads, so the read quality is good. An option that allows calling non-canonical splice sites would be great.
I am using the ONT pinfish pipeline. It works perfectly with minimap2 but cannot recognize the strand information when I use Magic-BLAST.
Thank you very much for dealing with my problem!
While using Magic-BLAST I found some additional issues I would like to ask about. First, I observed that the aligner has some problems with FASTA headers: if a reference sequence header contains an “_” character, it does not appear in the output SAM file. Second, there is a problem with the detection of non-canonical splice sites: GC-AG; CT-AC splice sites are often missed. Finally, I have a feature request regarding strand information encoding. Unfortunately, the current Magic-BLAST encoding is not recognized by downstream programs, which follow the minimap2 conventions. Would it be possible to add a formatting option that makes Magic-BLAST “mimic” minimap2 output?