oschwengers / bakta

Rapid & standardized annotation of bacterial genomes, MAGs & plasmids
GNU General Public License v3.0

Circos plot error #174

Closed Rridley7 closed 1 year ago

Rridley7 commented 2 years ago

Hi, thanks for the great work on this tool; I have already found it very useful! I have run into an error with the new bakta plot feature. It occurs both after calling bakta on a genome using default settings and when calling bakta_plot on a previously made JSON file.

Bakta was installed via mamba (conda) with mamba install bakta

For the case of calling bakta_plot, the input command:

bakta_plot S12_1a9859_mtb_spa_t.1.json

Returns:

draw circular genome plot (type=features) containing all sequences...
Traceback (most recent call last):
  File "/opt/anaconda3/envs/FA_test/bin/bakta_plot", line 10, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/FA_test/lib/python3.10/site-packages/bakta/plot.py", line 178, in main
    write_plot(features, contigs, output_path, colors, plot_type=plot_type)
  File "/opt/anaconda3/envs/FA_test/lib/python3.10/site-packages/bakta/plot.py", line 255, in write_plot
    raise Exception(f'circos error! error code: {proc.returncode}')
Exception: circos error! error code: 255

When run with debug flag (not sure if this is the same error):

bakta_plot --debug S12_1a9859_mtb_spa_t.1.json

Bakta v1.6.0
Options and arguments:
    input: /Users/rodney/Data/Coding_Programs/scratch_folder/S12_1a9859_mtb_spa_t.1.json
Traceback (most recent call last):
  File "/opt/anaconda3/envs/FA_test/bin/bakta_plot", line 10, in <module>
    sys.exit(main())
  File "/opt/anaconda3/envs/FA_test/lib/python3.10/site-packages/bakta/plot.py", line 150, in main
    print(f'\tconfig: {config_path}')
UnboundLocalError: local variable 'config_path' referenced before assignment

When run on a full genome, the command is:

bakta --db /storage/home/hcoda1/6/rridley3/shared3/DB/bakta/db --debug S10_4d3374_mtb_idb_t.15.fa

The output of this is attached; the error message is the same as the first one.

bakta_debug.txt

oschwengers commented 2 years ago

Hi @Rridley7, thanks for the report. The bug in bakta_plot was a simple unbound variable. It's fixed in https://github.com/oschwengers/bakta/commit/0ad59de1dbd4622179e51c0694d780a3324434b0 and will be available in the upcoming 1.6.1 patch release. Until then, the error does not occur when bakta_plot is run without --verbose or --debug.
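
For readers unfamiliar with this kind of error, here is a minimal sketch of how an UnboundLocalError like the one in the traceback can arise and be avoided. This is not the actual bakta code: the function names and the `config` parameter are hypothetical and only illustrate the general pattern of a variable that is assigned on one branch but read on another.

```python
def describe(verbose, config=None):
    # Hypothetical example: 'config_path' is only assigned on this branch...
    if config is not None:
        config_path = config
    if verbose:
        # ...but it is read here regardless, so this line raises
        # UnboundLocalError when config is None and verbose is True.
        return f'config: {config_path}'
    return 'ok'


def describe_fixed(verbose, config=None):
    # Binding the name before any branch guarantees it always exists.
    config_path = config
    if verbose:
        return f'config: {config_path}'
    return 'ok'
```

This also matches the observation that the error only shows up with --verbose or --debug: the failing line is only reached when the extra output is requested.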

Regarding the first bug: it seems to be related to Circos. To debug this further, could you provide either the Circos logs that are stored in /tmp/tmp2gi_5r6_ or the genome itself?

Rridley7 commented 2 years ago

Genome file is attached, thanks! S10_4d3374_mtb_idb_t.15.fa.zip

Proelmocan23 commented 1 year ago

Hi,

I am having a difficult time understanding the output of the circular genome plot. Is there a legend or manual I can read to understand what is being plotted?

Also, although it is not shown here, what does the third circle mean when you run the COG command? As I understand it, the extra features are features not present on the forward or reverse strand. How is this possible?

GCF_000020025 1_ASM2002v1_genomic

oschwengers commented 1 year ago

@Rridley7 The cause is the Circos max_ideograms setting, whose default value (200) is meant to prevent overcrowded figures. As a result, Circos fails on genomes with more than 200 contigs.

Unfortunately, I wasn't aware of this, since I only tested it on complete and "better" draft genomes. For now, please use the --skip-plot option to skip this step. I'll come up with a solution and patch version soon.
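
Until a patch is available, a quick way to tell whether an assembly is likely to hit this limit is to count its contigs before running Bakta. The following is a minimal sketch under stated assumptions: it treats every line starting with `>` as one contig, takes the limit of 200 from the explanation above, and uses hypothetical helper names that are not part of Bakta or Circos.

```python
import gzip

# Default limit as described in this thread (assumption: unchanged locally).
CIRCOS_DEFAULT_MAX_IDEOGRAMS = 200


def count_fasta_records(path):
    """Count sequences in a (optionally gzipped) FASTA file by header lines."""
    opener = gzip.open if str(path).endswith('.gz') else open
    with opener(path, 'rt') as fh:
        return sum(1 for line in fh if line.startswith('>'))


def exceeds_ideogram_limit(path, limit=CIRCOS_DEFAULT_MAX_IDEOGRAMS):
    """Return True if the assembly has more contigs than the given limit."""
    return count_fasta_records(path) > limit
```

If `exceeds_ideogram_limit('S10_4d3374_mtb_idb_t.15.fa')` returns True, running bakta with --skip-plot should avoid the Circos error described above.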

oschwengers commented 1 year ago

@Proelmocan23 Fair point! I'll add a more detailed description to the readme soon. Currently, there are two types of genome plots, called feature and cog:

Feature: All features are plotted on the two outer rings, which represent the forward and reverse strands: coding genes in grey, non-coding features in color. The green/red circle represents the GC content per sliding window over the entire sequence(s), with green and red representing GC above and below average, respectively. The yellow/blue innermost circle represents the GC skew - a common plot providing some hints on the replicon replication bubble and hence on the completeness and correctness of the assembly. On a complete bacterial genome, you normally see two inflection points: one at the origin of replication and one at the opposite point on the chromosome (see Wikipedia).

COG: All protein-coding genes (CDS) are colored according to their COG functional categories. To better distinguish the colored non-coding genes, they are plotted on an additional, distinct inner ring. GC content and GC skew follow as described above.
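
To make the two inner rings more concrete, here is a minimal sketch of how GC content and GC skew per window can be computed. It uses the common definitions, GC content = (G + C) / (A + C + G + T) and GC skew = (G - C) / (G + C). The function name, the default window size and the non-overlapping step are assumptions for illustration and are not taken from the Bakta source.

```python
def gc_windows(seq, window=1000, step=None):
    """Return (start, gc_content, gc_skew) tuples for windows along seq."""
    step = step or window  # assumption: non-overlapping windows by default
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        g = chunk.count('G')
        c = chunk.count('C')
        # Only unambiguous bases count toward the denominator.
        acgt = g + c + chunk.count('A') + chunk.count('T')
        gc_content = (g + c) / acgt if acgt else 0.0
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results
```

For example, `gc_windows('GGGGAAAA', window=4)` gives a first window with GC content 1.0 and skew 1.0 and a second window with both values 0.0. Comparing each window's GC content with the mean over all windows would give the above/below-average coloring described for the green/red ring.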

oschwengers commented 1 year ago

I've added a plot description to the readme. Since all requests/issues are handled, I'll close this issue.