mtisza1 / Cenote-Taker3

Discover and annotate the virome
MIT License

Submission to genbank #10

Open DarrenObbard opened 3 months ago

DarrenObbard commented 3 months ago

Hi!

This is a set of dumb questions, not an issue - but that isn't a category!

I am new to version 3, and I would like to use it to annotate RNA viruses whose sequences I have already curated. I have FASTA files, and I can include all the metadata in their headers. It looks like Cenote-Taker 3 will annotate CDSs on them and generate a .sqn file.

  1. Will GenBank actually accept these sqn files?
  2. How do I deal with 'known' viruses?
  3. How should I deal with multiple strains of the same virus (from different metagenomic pools) - the homologous genes need to have the same names and predicted boundaries

Thanks! Darren

mtisza1 commented 3 months ago

Hey Darren,

I hope you are doing well. Always happy to hear about questions from users!

  1. In general, yes, GenBank will take .sqn files from Cenote-Taker 2 or Cenote-Taker 3, assuming that the required metadata is provided. Occasionally, GenBank objects to some feature in a Cenote-Taker output. It's possible, even likely, that their policies will change over the years, so I may have to make changes to keep up. There are optional flags, e.g. --assembler and --template_file, that you may need to use, and of course you will need to create a template file. Check their submission page, and look at the errors reported in the .val files.
  2. Known viruses, for which there are experimental/historical annotations, are not the focus of this tool. You really want a tool that can "lift over" annotations from extant GenBank records. NCBI has a tool called VADR for this. Another tool, VAPiD, purportedly does something similar.
  3. I think you could give Cenote-Taker 3 a spin for this example (remember to use the flag -am T for annotation mode) and see what happens. The aforementioned tools may be better in these cases.

Best,

Mike
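
As a rough aid for the advice in point 1 about checking the .val files, here is a minimal Python sketch that tallies validator messages by severity. It assumes each message line starts with a severity keyword followed by a colon (for example `ERROR:` or `WARNING:`). That line layout, the function name `tally_val_severities`, and the sample text are all illustrative assumptions, not something taken from Cenote-Taker 3 or from a real run.

```python
from collections import Counter

# Severity keywords assumed to open each validator message line.
SEVERITIES = ("INFO", "WARNING", "ERROR", "REJECT", "FATAL")


def tally_val_severities(val_text):
    """Count message lines per severity in the text of a .val file.

    Assumes (hypothetically) that each message line looks like
    'SEVERITY: rest of message'. Lines that do not match are ignored.
    """
    counts = Counter()
    for line in val_text.splitlines():
        head, sep, _rest = line.partition(":")
        keyword = head.strip()
        if sep and keyword in SEVERITIES:
            counts[keyword] += 1
    return counts


# Made-up sample text, only to show the call; not real validator output.
sample = """\
WARNING: valid [EXAMPLE.Category] illustrative message one
ERROR: valid [EXAMPLE.Category] illustrative message two
ERROR: valid [EXAMPLE.Category] illustrative message three
a line that is not a validator message
"""

counts = tally_val_severities(sample)
for severity in SEVERITIES:
    if counts[severity]:
        print(severity, counts[severity])
```

To use it on a real file, read the file contents with `open(path).read()` and pass the text to the function. If the actual .val layout differs from the assumption above, adjust the parsing accordingly.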