Open taltman opened 2 years ago
What was the issue exactly? The protein_id values? If so, there's a --noprotid option that will get rid of them. If it's not that, let me know what the problem is; there may be a way around it.
Ah, I see from the title of the issue that the problem is that they are nested. Can you send me the .minfo file used with v-annotate.pl?
Hi @nawrockie, I'm using the pan-Coronavirus model, version 1.3:
Please let me know if I misunderstood what you were asking for. Thanks!
It looks like the best-matching model for your sequence must be the NC_006577 model, because that is the only model with an N2 gene. The NC_006577 RefSeq has N2 nested within N, as shown in the .minfo file, so that's why vadr is annotating it in your sequence:
FEATURE NC_006577 type:"gene" coords:"28320..29645:+" parent_idx_str:"GBNULL" gene:"N"
FEATURE NC_006577 type:"CDS" coords:"28320..29645:+" parent_idx_str:"GBNULL" gene:"N" product:"nucleocapsid phosphoprotein"
FEATURE NC_006577 type:"gene" coords:"28342..28959:+" parent_idx_str:"GBNULL" gene:"N2"
FEATURE NC_006577 type:"CDS" coords:"28342..28959:+" parent_idx_str:"GBNULL" gene:"N2" product:"nucleocapsid phosphoprotein 2"
If nested CDS and gene features are not allowed by ENA for submission purposes, you have two options, whichever is easier: remove the N2 annotations manually from your .tbl file, or make a new .minfo file for vadr that has N2 removed and use that to redo the annotation.
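For the second option, here is a minimal sketch of stripping the N2 features out of a .minfo file. It assumes every feature sits on a single line in the `FEATURE <model> key:"value" ...` layout shown above. The function name `drop_gene_features` is made up for illustration and is not part of vadr. It only filters text: it does not check whether other parts of the model library also refer to N2, and it does not renumber any `parent_idx_str` values that point at features by index (the lines above all use `GBNULL`, so nothing needs renumbering here).

```python
import re


def drop_gene_features(minfo_text: str, model: str, gene: str) -> str:
    """Return minfo_text without the FEATURE lines of `model` tagged gene:"<gene>".

    Hypothetical helper, not part of vadr. Assumes one feature per line.
    """
    # Match the exact quoted value, so gene:"N2" does not also hit gene:"N".
    gene_pat = re.compile(r'\bgene:"' + re.escape(gene) + r'"')
    kept = []
    for line in minfo_text.splitlines():
        fields = line.split()
        is_target = (
            len(fields) >= 2
            and fields[0] == "FEATURE"
            and fields[1] == model
            and gene_pat.search(line) is not None
        )
        if not is_target:
            kept.append(line)
    return "\n".join(kept) + "\n"


# The four FEATURE lines quoted above, used as sample input.
example = '''FEATURE NC_006577 type:"gene" coords:"28320..29645:+" parent_idx_str:"GBNULL" gene:"N"
FEATURE NC_006577 type:"CDS" coords:"28320..29645:+" parent_idx_str:"GBNULL" gene:"N" product:"nucleocapsid phosphoprotein"
FEATURE NC_006577 type:"gene" coords:"28342..28959:+" parent_idx_str:"GBNULL" gene:"N2"
FEATURE NC_006577 type:"CDS" coords:"28342..28959:+" parent_idx_str:"GBNULL" gene:"N2" product:"nucleocapsid phosphoprotein 2"
'''

filtered = drop_gene_features(example, "NC_006577", "N2")
print(filtered, end="")
```

To use it on a real file, read the .minfo contents into a string, pass them through the function, and write the result to a new file rather than overwriting the original, so the unmodified model stays available.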
Let me know if that addresses your question or not.
This seemed to anger the validation guards at ENA:
Is this desired behavior by VADR?