Closed voidbag closed 4 months ago
@voidbag: Apologies for my slow response. The build fails because the GenBank record contains a translation exception: the CDS of the UL30A gene uses an alternative start codon. Relevant excerpt from the GenBank annotation:
CDS complement(37894..38133)
/gene="UL30A"
/exception="alternative start codon"
/transl_except=(pos:complement(38131..38133),aa:Met)
The v-build.pl program validates all CDS sequences by checking that a corresponding ORF exists in the genome sequence, but it does not allow for alternative start codons like this.
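To illustrate the kind of check that fails here, below is a rough Python sketch. It is not v-build.pl's actual code. It assumes that a "valid start" means ATG, and the 6-nt toy genome and the helper names `cds_start_codon` and `has_canonical_start` are invented for the example. The toy CDS sits on the complement strand, like UL30A.

```python
# Sketch only: toy sequence and helper names are hypothetical, not from VADR.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def cds_start_codon(genome: str, start: int, end: int, complement: bool) -> str:
    """Return the first codon of a CDS given 1-based inclusive coordinates."""
    region = genome[start - 1:end]
    if complement:
        # For complement(start..end) the CDS is read from `end` back to `start`.
        region = reverse_complement(region)
    return region[:3].upper()


def has_canonical_start(genome: str, start: int, end: int, complement: bool) -> bool:
    """True if the CDS begins with ATG (assumed here to be the only valid start)."""
    return cds_start_codon(genome, start, end, complement) == "ATG"


# Toy genome with a CDS at complement(1..6); its start codon covers positions 4..6.
toy_genome = "TTACGT"
print(cds_start_codon(toy_genome, 1, 6, True))       # non-canonical start
print(has_canonical_start(toy_genome, 1, 6, True))   # check fails

# Changing position 5 from G to A on the forward strand gives a canonical start.
toy_fixed = "TTACAT"
print(has_canonical_start(toy_fixed, 1, 6, True))
```

In the toy case the middle base of the start codon is the one that changes, which mirrors position 38132 within complement(38131..38133).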
The way around this is to modify the genome sequence to include a valid start, and supply that modified sequence as input to the v-build.pl program using the --infa option. This model will still work fine for annotation. I'm attaching a fasta file called NC_006273.38132GtoA.fa.txt, which includes the NC_006273 sequence with a single nucleotide change at position 38132 from a G to an A.
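If you ever need to make this kind of single-nucleotide edit yourself, a minimal Python sketch follows. The function names and the input filename `NC_006273.fa` are hypothetical, and the script assumes a single-record FASTA file. It checks that the base being replaced matches what you expect before changing it, so an off-by-one position is caught rather than silently applied.

```python
# Sketch only: function names and the input filename are assumptions.

def substitute_base(seq: str, pos: int, ref: str, alt: str) -> str:
    """Replace the base at 1-based position `pos`, after checking it equals `ref`."""
    if not 1 <= pos <= len(seq):
        raise ValueError(f"position {pos} is outside the sequence (length {len(seq)})")
    found = seq[pos - 1]
    if found.upper() != ref.upper():
        raise ValueError(f"expected {ref} at position {pos}, found {found}")
    return seq[:pos - 1] + alt + seq[pos:]


def read_single_fasta(path: str) -> tuple[str, str]:
    """Read a FASTA file holding exactly one record; return (header, sequence)."""
    with open(path) as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("file does not start with a FASTA header")
    if any(line.startswith(">") for line in lines[1:]):
        raise ValueError("expected exactly one FASTA record")
    return lines[0], "".join(lines[1:])


def write_fasta(path: str, header: str, seq: str, width: int = 60) -> None:
    """Write one FASTA record, wrapping the sequence at `width` characters."""
    with open(path, "w") as handle:
        handle.write(header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Example usage (input filename is hypothetical; the header line is kept unchanged):
# header, seq = read_single_fasta("NC_006273.fa")
# fixed = substitute_base(seq, 38132, "G", "A")
# write_fasta("NC_006273.38132GtoA.fa.txt", header, fixed)
```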
You can use this with v-build.pl using the following command:
$VDIR/v-build.pl -f --skipbuild --infa NC_006273.38132GtoA.fa.txt --forcelong NC_006273 NC_006273
I also recommend using the --skipbuild option, which prevents a CM file from being built. The NC_006273 genome, at more than 200 kb, is too long to use a CM for anyway. When you annotate sequences using this model with v-annotate.pl, I recommend you follow the instructions I've written for monkeypox virus (which is about 200 kb) here:
I am building models for herpes virus (NC_006273).
I suspect the problem is in the contents of the .tbl file (NC_006273.vadr.tbl).
Could anyone help with this issue?
To make this easier to look into, I have uploaded the .fa and .tbl files.
NC_006273.vadr.fa.txt NC_006273.vadr.tbl.txt