rcs333 / VAPiD

VAPiD: Viral Annotation and Identification Pipeline
MIT License

Issues with final genbank-format file #17

Open charlesfoster opened 1 year ago

charlesfoster commented 1 year ago

Hi,

Thanks for the interesting tool. I've tried it out on a de novo assembly of an RSV isolate, but the resulting genbank-format file is incomplete and contains some minor errors.

Firstly, the file is missing the 'gene' features necessary for a genbank file, as well as some of the qualifiers for the CDS regions ('/gene=...', '/product=...'):

[screenshot: features in the VAPiD output file, not preserved]

Compare to the selected reference:

[screenshot: features in the reference record, not preserved]
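For comparison, here is a minimal Python sketch of how a 'gene' feature and a CDS carrying /gene and /product qualifiers are laid out in a GenBank flat-file feature table (feature key in column 6, location and qualifiers starting in column 22). The helper name format_feature, the NS1 names and the coordinates are hypothetical placeholders, not VAPiD code and not real RSV coordinates; wrapping of long qualifier values at 79 columns is skipped.

```python
# Hypothetical helper, not part of VAPiD: renders one feature in the
# GenBank flat-file layout (key in column 6, location and qualifiers in
# column 22). Line wrapping of long values is deliberately omitted.
def format_feature(key: str, location: str, qualifiers: dict) -> str:
    lines = [" " * 5 + key.ljust(16) + location]
    for name, value in qualifiers.items():
        lines.append(" " * 21 + f'/{name}="{value}"')
    return "\n".join(lines)


# Placeholder coordinates and names, for illustration only.
print(format_feature("gene", "100..520", {"gene": "NS1"}))
print(format_feature("CDS", "100..520",
                     {"gene": "NS1", "product": "nonstructural protein 1"}))
```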

Secondly, my input sequence was reverse complemented with respect to the reference. In this case, the genbank file should state that each feature occurs on the complementary strand, i.e. with a complement(start..end) location:

[screenshot not preserved]
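The coordinate bookkeeping behind a complement location can be sketched as follows. This is a minimal illustration under stated assumptions, not VAPiD's implementation: it assumes the input was reverse complemented before being aligned to the reference, and that start..end (1-based, inclusive) is a feature position found on that reverse complement. The function names reverse_complement and map_location are hypothetical.

```python
# Hypothetical sketch, not VAPiD's implementation. Assumes the input was
# reverse complemented before alignment to the reference, and that
# start..end (1-based, inclusive) is a feature position on that reverse
# complement. Maps it back onto the input as submitted.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement; IUPAC ambiguity codes other than N are not complemented here."""
    return seq.translate(_COMPLEMENT)[::-1]


def map_location(start: int, end: int, seq_len: int, input_reversed: bool) -> str:
    """Return a GenBank location string for start..end on the original input."""
    if not 1 <= start <= end <= seq_len:
        raise ValueError("expected 1 <= start <= end <= seq_len")
    if not input_reversed:
        return f"{start}..{end}"
    # Position p on the reverse complement is position seq_len - p + 1 on
    # the input, so the two endpoints swap roles.
    new_start = seq_len - end + 1
    new_end = seq_len - start + 1
    return f"complement({new_start}..{new_end})"


seq = "ATGCCGTAAG"                         # toy 10 nt input
print(map_location(1, 3, len(seq), True))  # complement(8..10)
```

If the output is instead generated from NCBI's 5-column feature table, the same minus-strand information is expressed by writing the start coordinate greater than the stop coordinate rather than with complement(...).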

If you are still working on this project and have time, could these issues please be fixed?

Cheers, Charles

charlesfoster commented 1 year ago

My bad re: the gene annotations; I just saw the --all option (somehow missed it previously). I note that the help text still says:

parser.add_argument('--all', action='store_true', help='Use this flag to transfer ALL annotations from reference, this is largely untested')

And the code says:

Experimental code for transferring 'gene' annotations from NCBI reference sequence

Has this been tested further since, i.e. is it now considered stable?

The 'complement' issue still stands though :)

rcs333 commented 1 year ago

Hi there! I do still work on this as time permits.

I tested the --all argument only with very good input files, so it should work for input sequences that have the expected genes in the expected order. I basically tested it by running sequences that had already been annotated in genbank and making sure that it copied its own reference correctly. Not very rigorous, but it should work.
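The self-annotation check described above could be automated along these lines. This is a minimal sketch, assuming features have already been parsed into simple (type, location, /gene) tuples; the Feature type and round_trip_diff name are hypothetical, not part of VAPiD. Passing it only shows that a sequence reproduces its own reference; it says nothing about divergent, reordered, or reverse-complemented inputs.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    kind: str      # feature key, e.g. "gene" or "CDS"
    location: str  # GenBank location string, e.g. "100..520"
    gene: str      # value of the /gene qualifier


def round_trip_diff(reference, transferred):
    """Compare features copied onto a sequence against the features of
    that same sequence's own reference record.

    Returns (missing, unexpected); both are empty sets on a perfect copy.
    Duplicate features collapse into one because sets are used.
    """
    ref, out = set(reference), set(transferred)
    return ref - out, out - ref


# Placeholder features, for illustration only.
ref = [Feature("gene", "100..520", "NS1"), Feature("CDS", "100..520", "NS1")]
out = [Feature("gene", "100..520", "NS1")]
missing, unexpected = round_trip_diff(ref, out)
print(missing)     # {Feature(kind='CDS', location='100..520', gene='NS1')}
print(unexpected)  # set()
```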

I don't fully understand the complementing problem; I vaguely recall fixing that issue manually before, but I never implemented a robust fix.

Could you send me the input file and your reference so I can get a better understanding of what you need? I need to dust the old cobwebs off lol

charlesfoster commented 1 year ago

Thanks for the reply. I'll send it by email if that's okay. Should I use your gmail account from your Github profile?

rcs333 commented 1 year ago

Yep that email is great!

I also remembered why genes and other features are not annotated by default. The program is designed to output the bare minimum for genbank submission and also to pull unchecked references directly from genbank. Many of these references don't contain gene annotations, so instead of checking whether they exist and then transferring them, I just chose to ignore them.
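The existence check mentioned above is cheap to add. A minimal sketch, assuming reference features are dicts with a "type" key (a hypothetical representation, not VAPiD's): CDS features are always transferred, gene features only when requested and actually present, and a warning is emitted instead of silently dropping them. The name select_features is hypothetical.

```python
import warnings


def select_features(reference_features, want_genes):
    """Return (features to transfer, whether gene features were included).

    CDS features are always kept. 'gene' features are kept only when the
    caller asks for them and the reference actually has some; otherwise
    a warning is issued and the output falls back to the CDS-only minimum.
    """
    kinds = {"CDS"}
    genes_included = False
    if want_genes:
        if any(f["type"] == "gene" for f in reference_features):
            kinds.add("gene")
            genes_included = True
        else:
            warnings.warn("reference has no 'gene' features; transferring CDS only")
    selected = [f for f in reference_features if f["type"] in kinds]
    return selected, genes_included
```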