merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

BLASTP --max-target-seqs option randomly chooses hits based on sequence order in the fasta? #994

Closed. novitch closed this issue 5 years ago

novitch commented 5 years ago

Someone sent me this paper today, https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/bty833/5106166 , which explains that the max-target-seqs = 1 option does not select the best hit, but the first sequence that matched.

I was wondering whether the blastp step inside the anvi-pangenome analysis is affected.
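For anyone who wants to check their own results, here is a minimal sketch of one way to avoid depending on how the hit limit behaves: run the search with a larger limit, then pick the top-scoring hit per query afterwards. It assumes the output is in the standard 12-column BLAST tabular format (`-outfmt 6`, with bitscore in the last column). The function name `best_hit_per_query` and the example rows are made up for illustration; this is not how anvi'o does it.

```python
from typing import Dict, Iterable, Tuple

# Column positions in the default 12-column BLAST tabular format (-outfmt 6):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
QSEQID, SSEQID, BITSCORE = 0, 1, 11


def best_hit_per_query(rows: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Return {query_id: (subject_id, bitscore)}, keeping the highest bitscore."""
    best: Dict[str, Tuple[str, float]] = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        query, subject = fields[QSEQID], fields[SSEQID]
        bitscore = float(fields[BITSCORE])
        # Replace the stored hit only when this one scores strictly higher,
        # so ties keep whichever hit appeared first in the file.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


# Hypothetical rows: the first hit listed for geneA is not the best one.
example = [
    "geneA\tsubj1\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t120.0",
    "geneA\tsubj2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t190.0",
    "geneB\tsubj3\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150.0",
]
print(best_hit_per_query(example))
```

Comparing this post-filtered best hit with what a run limited to one target sequence reports for the same query would show whether a given dataset is affected.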

meren commented 5 years ago

Thank you very much for bringing this to our attention, Alban!

We will run some benchmarks. It very likely influences the results, but I think it is very unlikely to have a major impact, because the effect would be concentrated in the noisiest parts of the pangenome (if that makes sense). We have been using pangenomics in various environments and carefully inspecting our gene clusters to make sure they are reasonable proxies for biological insights. Still, we will investigate this, do our best to address it (perhaps by adding an optional flag that tells anvi'o to be very stringent), and update our tutorials.

Best,

novitch commented 5 years ago

Yes, I agree. I have been using this option for a long time and have seen biologically sensible results. But reading this paper this morning worried me, so I will test my datasets as well.

Cheers,

meren commented 5 years ago

Please keep us posted. We will post our findings in this issue as well.

meren commented 5 years ago

No worries.

It turns out pangenomic analyses do not use this flag :)

COG searches do. I will look into that in a separate issue.

novitch commented 5 years ago

Ok good news!