mhoban / rainbow_bridge

GNU General Public License v3.0
5 stars 2 forks source link

taxonomy qcov 100 may not (or not always) be returning qcov 100 #49

Open cajwalsh opened 9 months ago

cajwalsh commented 9 months ago

Re: an email from Molly Timmers where she found moray hits that only had qcov 98 online when we specified qcov 100 (and they are written in output as qcov100)

All output files I have say 100% qcov, as do all the defaults from the pipeline (in both the initial blast and the taxonomy assignment). I looked at the blast results file and they say that the two top hits (that also show up in the online blast) have "qcovs" and "qcovshsp" of 100%. I do see online though that these records only have 98% coverage. This implies to me that the part of the pipeline that is supposed to specify only 100% qcov in the blastn search (-qcov_hsp_perc ${params.qcov}) is not working properly. ${params.qcov} is set to 100, so it must be the -qcov_hsp_perc argument? I checked this blastn documentation. Every flag used in the pipeline's blastn call is explained there except this qcov one (it's not referred to except in the outfmt argument) that is supposedly being used to get qcov = 100. I remember looking at this when we first started using the pipeline and not finding it so it's possible I am still missing something, or that it hasn't been doing what I thought it was this whole time.

The blast hits are 3 bases short at the end of the sequence from being 100% coverage, with most other (much lower pid) hits either missing 2-3 bases at the end or just 1 at the start. The only 100% qcov is below 90% pid and is a freshwater moray.

The run is Tribuga MiFish, probably on scylla. Online, the top two blast hits (the only two kept by the pipeline in the blast results file) have 100 in both the qcov columns. Online they are not 100%.

CACCGCGGTTATACGAGAGGCCCAAATTGATCCACTACGGCGTAGAGTGTGATTAGGGATAAACTTAAACTAGAGCCAAACGCCCCCTATGCTGTCATACGTCATGGGGAGCATGAAGCCCAACAACGAAAGTGGCTCTATACAGCTTGAACTCACGACAGCTAAGGTA

-outfmt "6 qseqid sseqid staxid ssciname scomname sskingdom pident length qlen slen mismatch gapopen gaps qstart qend sstart send stitle evalue bitscore qcovs qcovhsp"

Zotu325 gi|1983636109|dbj|LC604995.1| 876639 Gymnothorax saxicola Gymnothorax saxicola Eukaryota 97.041 169 169 586 5 0 0 1 169 112 280 Gymnothorax saxicola CBM:ZF:20850-1 mitochondrial gene for 12S rRNA, partial sequence 1.47e-71 283 100 100 Zotu325 gi|405779046|gb|JX242931.1| 876639 Gymnothorax saxicola Gymnothorax saxicola Eukaryota 96.450 169 169 738 6 0 0 1 169 193 361 Gymnothorax saxicola 12S ribosomal RNA subunit gene, partial sequence; mitochondrial 1.80e-70 279 100 100