ncbi / magicblast


Distant homology searches #48

Closed by manu-script 1 year ago

manu-script commented 1 year ago

@boratyng Hi Greg, I understand that Magic-BLAST was not built for remote homology searches and that the default parameters are tuned for high-similarity searches. But because my data consists of non-overlapping paired-end reads, I cannot use classic blastn to query the entire RefSeq transcript collection. Is it possible to change the default reward/penalty, gap open/extend, and alignment score thresholds? Could you please recommend values I can try to allow distant matches down to about 60% identity?

Thanks, Manu

boratyng commented 1 year ago

Hi @manu-script,

Magic-BLAST was not made for aligning sequences at 60% identity. As long as you are not aligning reads to genomic sequences, you can try these parameters: -word_size 12 -penalty -2 -limit_lookup F. I have never tried these alignments, so you may have to experiment with the penalty and the score threshold. I would start with the alignment score threshold (-score) at about 30% of your read length.

But there are caveats:

It may be easier to use BLASTN and post-process the results.
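
As a rough illustration of the suggestion above, here is a minimal Python sketch that derives a starting -score value from the read length (about 30%) and assembles a Magic-BLAST argument list with the suggested -word_size, -penalty, and -limit_lookup settings. The helper name `build_magicblast_args`, the file names, the database name, and the use of -query_mate for the second read file are assumptions for illustration only; they do not come from this thread, and none of this has been tested on real data.

```python
import shlex


def build_magicblast_args(query, db, read_length, query_mate=None,
                          score_fraction=0.30):
    """Assemble a Magic-BLAST argument list (hypothetical helper).

    The score threshold starts at roughly `score_fraction` of the read
    length, as suggested in the thread; expect to tune it by experiment.
    """
    score = max(1, round(read_length * score_fraction))
    args = [
        "magicblast",
        "-query", query,
        "-db", db,
        "-word_size", "12",     # shorter seed words, suggested in the thread
        "-penalty", "-2",       # milder mismatch penalty, suggested in the thread
        "-limit_lookup", "F",   # suggested in the thread (not for genomic targets)
        "-score", str(score),   # about 30% of the read length as a starting point
    ]
    if query_mate is not None:
        # Assumption: the mates of a paired-end run are in a separate file.
        args += ["-query_mate", query_mate]
    return args


if __name__ == "__main__":
    # File and database names below are placeholders.
    cmd = build_magicblast_args("reads_1.fa", "refseq_rna", 100, "reads_2.fa")
    print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run` once the paths point at real files; changing `score_fraction` or the penalty is then a one-argument edit per trial.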

manu-script commented 1 year ago

Thanks a lot, Greg! I will keep those caveats in mind and try the suggested parameters.