ncbi / magicblast

problems with really short exons (10-6 bp) #15

Open HegedusB opened 4 years ago

HegedusB commented 4 years ago

Dear all, I am surprised how well the program is working. However, I have a question: is it possible to fine-tune the aligner to detect smaller exons (10-6 bp)? I believe the word size is the limiting factor here. My fungal strain contains a lot of short exons. So far this is the best aligner I have tried, and it handles most of them correctly, but the really short exons are not aligned well here either. Thanks for any help!

boratyng commented 4 years ago

Thank you for giving Magic-BLAST a try. Unfortunately, very short exons are still a weak point of Magic-BLAST. We continue to improve it and expect that Magic-BLAST will get better at this in future versions.

In the meantime, you are correct that the word size needs to be reduced to 10 or 6 bases. However, using a word size below 16 requires turning off repeat filtering, so you also need to add the -limit_lookup F option. This will significantly increase run time and memory footprint.
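As a rough illustration of the advice above, here is a minimal Python sketch that assembles such a command line and adds -limit_lookup F whenever the word size is below 16. Only -limit_lookup F comes from this thread; the -word_size, -query, -db and -out flag names, the file names and the helper function are assumptions made for illustration, so check them against `magicblast -help` for your version before relying on them.

```python
import shlex


def build_magicblast_command(query, db, word_size, out="alignments.sam"):
    """Return an argv list for a Magic-BLAST run (hypothetical helper).

    Flag names other than -limit_lookup are assumptions; verify them
    with `magicblast -help`.
    """
    cmd = [
        "magicblast",
        "-query", query,
        "-db", db,
        "-word_size", str(word_size),
        "-out", out,
    ]
    # Per the thread: word sizes below 16 require repeat filtering
    # to be turned off with -limit_lookup F.
    if word_size < 16:
        cmd += ["-limit_lookup", "F"]
    return cmd


if __name__ == "__main__":
    # Placeholder file and database names, for illustration only.
    print(shlex.join(build_magicblast_command("reads.fa", "fungal_genome", 10)))
```

The list can be passed to `subprocess.run` once the flags are confirmed. Expect the longer run time and larger memory footprint mentioned above.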

HegedusB commented 4 years ago

Thank you for the great program. I am looking forward to the new releases.

y9c commented 3 years ago

Can the latest release (v 1.6.0) turn off repeat filtering now?

boratyng commented 3 years ago

@yech1990, this behavior did not change in version 1.6.0. You can turn off repeat filtering with the -limit_lookup F option. It will work for aligning to transcripts, a single gene or a few genes, or small genomes (e.g. bacteria). I would not recommend it for aligning to a human genome.