Open FTouzain opened 5 days ago
@FTouzain : I'd be happy to look into this, but it would be helpful if you could provide some more information. If you provide your vadr model file and the v-annotate.pl command you used, I should be able to reproduce your results and look into it further. Or if you tell me what commands you ran to build your model, I can build it myself.
@nawrockie Oh yes, I am sorry, I forgot the most important part. I use your classical corona model (25/05/2022) with the command:
v-annotate.pl --mkey corona --mdir /db/vadr_db/vadr-models-corona/ --glsearch --split -f --cpu 4 consensus_on_KY933089.fasta out_dir_vadr/
(In the release notes of the model, it is written: vadr-models-55-1.0.2-dev-5: [Apr 2020]
for the model part.)
I hope I have not forgotten any info; otherwise, tell me. Thank you.
@FTouzain : I was able to reproduce the annotation of nsp10 from 15032 to 16831 by vadr for KY933089.1. The alignment of KY933089 to NC_001451 (the reference model VADR uses), which has nsp10 at positions 15132..16931 (https://www.ncbi.nlm.nih.gov/nuccore/NC_001451), clearly shows that vadr's annotation from 15032 to 16831 is correct. This alignment will be output from v-annotate.pl if you use the '--out_stk' option. The alignment file name will end with '.NC_001451.align.stk'. I'm not sure why the graphical view you linked to shows nsp10 between 11700 and 12000, but in KY933089.1 it is clearly 15032..16831 as vadr reports.
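As a rough illustration of the coordinate comparison above, the following Python sketch checks that the two nsp10 spans quoted in this thread have the same length and differ by a constant shift. It is only arithmetic on the numbers mentioned here, not something vadr itself runs; the `Span` class and variable names are made up for the example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A 1-based, inclusive coordinate range (hypothetical helper)."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    def contains(self, pos: int) -> bool:
        return self.start <= pos <= self.end


# Coordinates quoted in this thread.
ref_nsp10 = Span(15132, 16931)    # nsp10 on the reference NC_001451
query_nsp10 = Span(15032, 16831)  # vadr's nsp10 annotation on the query

# Shift between the reference and query coordinates at the feature start.
offset = ref_nsp10.start - query_nsp10.start

print(ref_nsp10.length, query_nsp10.length)  # lengths of the two spans
print(offset)                                # shift in nucleotides
print(query_nsp10.contains(15707))           # is position 15707 inside?
```

Both spans come out at 1800 nt (a multiple of 3), the query span sits 100 nt earlier than the reference span, and position 15707 of the query falls inside the annotated range.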
I'm attaching a screenshot of the blastx output I get for NC_001451/15132..16931 (fasta below) and KY933089, showing that the vadr coords are correct.
Also, based on the model file you are using, I think that you must not be using the latest version of vadr (1.6.4). When I ran vadr 1.6.4 using the latest coronavirus model library 1.3-3 https://ftp.ncbi.nlm.nih.gov/pub/nawrocki/vadr-models/coronaviridae/CURRENT/vadr-models-corona-1.3-3.tar.gz I found that there were fewer alerts output from v-annotate.pl for KY933089 than you reported. Based on this, I think that upgrading to the latest versions will give you better results.
>NC_001451.1/15132-16931
TCTTGTGGCGTTTGTGTAGTTTGTAATAGTCAAACTATACTACGCTGCGGTAATTGTATT
CGTAAACCGTTTTTGTGTTGTAAGTGTTGCTATGACCACGTCATGCATACGGACCACAAA
AATGTTTTATCTATAAATCCTTATATTTGCTCACAGCTAGGTTGCGGTGAAGCAGATGTT
ACTAAATTGTACCTCGGGGGTATGTCGTACTTCTGTGGTAATCATAAACCGAAATTGTCA
ATACCGTTAGTATCTAATGGTACTGTTTTTGGAATTTACAGGGCTAATTGTGCTGGTAGT
GAAAATGTTGATGATTTTAATCAACTAGCTACTACTAATTGGTCCATTGTCGAACCTTAT
ATTTTAGCAAATCGCTGTAGTGATTCATTGAGACGTTTTGCTGCAGAGACAGTAAAAGCC
ACAGAAGAATTACATAAGCAACAATTTGCTAGTGCAGAAGTGCGAGAAGTATTCTCAGAT
CGTGAATTGATTCTATCATGGGAACCAGGAAAAACCAGGCCGCCATTGAATAGAAATTAT
GTTTTCACAGGTTATCACTTTACAAGAACTAGTAAGGTGCAGCTTGGTGATTTTACATTT
GAAAAAGGTGAAGGTAAGGATGTTGTCTATTATAAAGCAACGTCTACTGCTAAATTGTCT
GTAGGAGACATTTTTGTTTTAACCTCACACAATGTTGTTTCTCTCGTAGCGCCAACATTG
TGTCCACAACAAACCTTTTCTAGGTTTGTAAATTTAAGACCTAATGTAATGGTACCTGAA
TGTTTTGTAAATAACATTCCACTTTACCATTTAGTAGGTAAACAGAAGCGTACTACAGTA
CAAGGTCCTCCTGGCAGTGGTAAATCCCACTTTGCTATAGGCCTTGCAGTATACTTTAGT
AGCGCTCGTGTTGTTTTTACTGCATGTTCTCATGCAGCTGTTGATGCTTTATGTGAAAAA
GCTTTTAAGTTTCTTAAAGTTGATGATTGCACTCGTATAGTACCCCAAAGGACTACTGTC
GATTGCTTCTCAAAATTTAAAGCTAATGACACAGGCAAAAAGTACATTTTTAGTACTATT
AATGCCTTGCCGGAAGTTAGTTGTGATATTCTTTTGGTTGACGAGGTTAGTATGTTGACC
AATTACGAATTGTCCTTTATTAATGGTAAGATAAATTACCAATATGTTGTGTATGTAGGT
GATCCGGCTCAATTACCGGCACCCCGCACTTTACTTAATGGTTCACTTTCTCCAAAGGAT
TATAATGTTGTCACAAACCTTATGGTTTGTGTTAAACCTGATATTTTCCTTGCAAAGTGT
TATCGTTGTCCTAAGGAAATTGTAGACACTGTGTCTACTCTTGTTTATGATGGAAAGTTT
ATTGCAAATAACCCAGAATCACGTGAGTGTTTCAAGGTTATAGTTAATAATGGCAATTCT
GATGTAGGACATGAAAGTGGTTCAGCCTACAACACAACACAATTGGAATTTGTGAAAGAC
TTTGTTTGTCGCAATAAACAATGGCGGGAAGCAATATTTATTTCACCTTACAATGCTATG
AACCAGAGAGCTTACCGTATGCTTGGACTTAATGTTCAAACAGTAGATTCTTCTCAAGGT
TCAGAGTATGATTATGTCATCTTCTGTGTTACTGCAGATTCGCAGCATGCACTGAATATT
AATAGATTTAATGTGGCGCTTACAAGAGCTAAGCGTGGTATACTAGTTGTCATGCGCCAG
CGTGATGAATTGTATTCTGCTCTTAAGTTTACAGAGCTAGATAGTGAAACAAGTCTGCAA
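The record above carries its coordinates in the header (15132-16931, which implies 1800 nt), so one quick sanity check before pasting it into blastx is that the sequence length matches that range. A minimal sketch in Python, assuming the `name/start-end` header convention seen here; the function names and the toy record are made up for illustration and are not part of vadr:

```python
import re


def parse_fasta_record(text):
    """Return (header, sequence) for the first record in FASTA text."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:]
    seq_parts = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record, if any
            break
        seq_parts.append(ln)
    return header, "".join(seq_parts)


def range_length_from_header(header):
    """Length implied by a 'name/start-end' header (inclusive coordinates)."""
    m = re.search(r"/(\d+)-(\d+)$", header)
    if m is None:
        raise ValueError("header has no '/start-end' suffix")
    start, end = int(m.group(1)), int(m.group(2))
    return end - start + 1


# Tiny made-up record, only to show the check; swap in the real FASTA text.
example = """>toy_seq/5-16
ACGTAC
GTACGT
"""
header, seq = parse_fasta_record(example)
print(len(seq) == range_length_from_header(header))
```

If the check prints `False` for a real record, the sequence was probably truncated or the header range was edited by hand.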
@FTouzain : actually the best way to see that the vadr coords for nsp10 are correct is to run blastx with query of KY933089, then click 'Align two or more sequences' and enter NP_740630.1 as the subject (NP_740630.1 is the nsp10 protein db entry from NC_001451).
blastx server: https://blast.ncbi.nlm.nih.gov/Blast.cgi?LINK_LOC=blasthome&PAGE_TYPE=BlastSearch&PROGRAM=blastx
Thank you for your work.
When I run vadr 1.6.3 on this virus: https://www.ncbi.nlm.nih.gov/nuccore/KY933089 (fasta file)
I obtain nothing in the 'pass' tsv file. In the 'fail' tsv file, I get this:
One of the problems is this:
It points to the reference genome I use (KY933089.1), but at these positions it is not nsp10, as vadr reports, but nsp13 (in my case, I focus on position 15707), as shown here: https://www.ncbi.nlm.nih.gov/nuccore/KY933089.1?report=graph (on this page, nsp10 is located near positions 11700 to 12000)
How can vadr report similarities to a genome with coordinates that do not correspond to the reference genome that is mentioned, please? (I need vadr predictions because I sometimes use assemblies, not reference genomes.)
Thank you in advance, Fabrice