robotoD / GenoVi

GenoVi, an automated customizable circular genome visualizer for bacteria and archaea

Problem with NCBI GenBank files #11

Open AlcaArctica opened 1 year ago

AlcaArctica commented 1 year ago

I encountered the following error when running genovi with a GenBank file downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nuccore/AL513382.1). I downloaded the file manually, once via send to > complete record > file > GenBank and once via send to > complete record > file > GenBank (full). With both files I get:

$ genovi -i CT18_genebank_full.gb -o test -te -cs strong -s complete --size
/home/Uelze/miniconda3/envs/genovi/lib/python3.7/site-packages/Bio/GenBank/Scanner.py:1219: BiopythonParserWarning: Premature end of file in sequence data
  "Premature end of file in sequence data", BiopythonParserWarning
'locus_tag'
Error when transforming gbk to faa.
test-temp/contig_1-test_bands.kar created succesfully.
Traceback (most recent call last):
  File "/home/Uelze/miniconda3/envs/genovi/bin/genovi", line 8, in <module>
    sys.exit(main())
  File "/home/Uelze/miniconda3/envs/genovi/lib/python3.7/site-packages/scripts/GenoVi.py", line 685, in main
    visualiseGenome(*get_args())
  File "/home/Uelze/miniconda3/envs/genovi/lib/python3.7/site-packages/scripts/GenoVi.py", line 341, in visualiseGenome
    sizes, cogs_p, cogs_n, lengths, chrms, hist, wanted_cogs = base(file, temp_folder + "/" + output_file_part, output_file + "/" + output_file, True, True, cogs_classified, cogs_classified, False, True, deepnog_confidence_threshold, verbose, wanted_cogs=wanted_cogs)
  File "/home/Uelze/miniconda3/envs/genovi/lib/python3.7/site-packages/scripts/create_raw.py", line 499, in base
    _ , _, chrms_tp, chrms_tn, _ = create_feature(gbk_file, tmp, output, sizes, "tRNA", verbose = verbose, complete=complete)
  File "/home/Uelze/miniconda3/envs/genovi/lib/python3.7/site-packages/scripts/create_raw.py", line 321, in create_feature
    locus_tag = feature.qualifiers.get("locus_tag")[0]
TypeError: 'NoneType' object is not subscriptable

I tried genovi with the test data from the GitHub repository and it works just fine. I also found a workaround: downloading the FASTA file from NCBI, annotating it with Bakta, and then running genovi on the resulting *.gbff file. This also works. So it seems to be a problem with the format of the NCBI GenBank file?

vsaona commented 1 year ago

Indeed, that file has no "locus_tag" qualifier on any of its features, and that crashes the program. Since the publication is from 2001 and PGAP (NCBI's annotation pipeline) was developed in 2001, I guess the file probably predates the current standard and was annotated with some unknown tool. (This is just a guess; it should be investigated whether this happens with other files, and which ones.)

In this case, the complete genome works fine, but it is from 2003, so it probably got annotated in a different way.
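
For reference, the crash in the traceback comes from indexing the result of qualifiers.get("locus_tag"), which is None when the qualifier is absent. The following is only a minimal sketch of a defensive lookup, not the fix used in GenoVi: the helper name get_locus_tag, the fallback labels, and the sample values are all hypothetical, and plain dicts stand in for Biopython's feature.qualifiers (a mapping from qualifier name to a list of strings).

```python
def get_locus_tag(qualifiers, fallback):
    """Return the first locus_tag value, or `fallback` if the qualifier is absent.

    `qualifiers` is any dict-like mapping of qualifier name -> list of strings.
    Hypothetical helper for illustration; not part of GenoVi.
    """
    values = qualifiers.get("locus_tag")
    if not values:
        # Missing key (None) or empty list: avoid the
        # "'NoneType' object is not subscriptable" TypeError.
        return fallback
    return values[0]


# Made-up sample data standing in for two features' qualifiers.
with_tag = {"locus_tag": ["TAG_0001"], "product": ["tRNA-Ile"]}
without_tag = {"product": ["tRNA-Ala"]}

print(get_locus_tag(with_tag, "feature_1"))
print(get_locus_tag(without_tag, "feature_2"))
```

Whether a generated fallback label is acceptable downstream (for example in the files handed to Circos) is an open question; skipping such features or reporting a clear error would be alternative choices.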

Thanks for sharing this problem!