metagenlab / mummer2circos

Circular bacterial genome plots based on BLAST or NUCMER/PROMER alignments
MIT License
95 stars, 24 forks

There is an error when I use -gb option. #15

Closed fsysy closed 9 months ago

fsysy commented 9 months ago

Thank you for this fantastic tool. However, I get an error whenever I use the '-gb' option. I've tried installing it with both conda and singularity, but I keep hitting the same error, even when I use the example dataset and command from the tutorial. The version is 1.4.2. The command and error message are below. Can you provide a solution?

mummer2circos -l -r genomes/NZ_CP008827.fna -q genomes/*.fna -gb GCF_000281535_merged.gbk -b VF.faa

Traceback (most recent call last):
  File "/opt/conda/bin/mummer2circos", line 10, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/__init__.py", line 68, in main
    force_data_dir=args.force)
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/mummer2circos.py", line 118, in __init__
    minus_file, plus_file = self.gbk2circos_data(gbk2orf, minus_file=f"{self.circos_data_dir}/circos_orf_minus.txt", plus_file=f"{self.circos_data_dir}/circos_orf_plus.txt")
  File "/opt/conda/lib/python3.7/site-packages/mummer2circos/mummer2circos.py", line 1184, in gbk2circos_data
    start = str(feature.location.start + self.contigs_add[record.id][0])
KeyError: 'NZ_CP008828.1'

tpillone commented 9 months ago

Hi @fsysy, the error occurs because the record ids in the GenBank file carry version numbers (e.g. 'NZ_CP008828.1') while the record ids in the fasta file don't (e.g. 'NZ_CP008828', without the '.1'). To map records between the fasta and gbk files, the ids must be identical in both.
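To check for this kind of mismatch before running the tool, a small script can compare the two sets of ids. The sketch below is not part of mummer2circos. It assumes the fasta id is the first whitespace-delimited token of each '>' header, and that the GenBank id is the one on the VERSION line (which appears to be what `record.id` holds in the traceback above). The function names are hypothetical.

```python
import re


def fasta_ids(fasta_text):
    """Return the first whitespace-delimited token of every '>' header line."""
    return [
        line[1:].split()[0]
        for line in fasta_text.splitlines()
        if line.startswith(">") and line[1:].strip()
    ]


def genbank_version_ids(gbk_text):
    """Return the identifier on each VERSION line, e.g. 'NZ_CP008828.1'."""
    return re.findall(r"^VERSION\s+(\S+)", gbk_text, flags=re.MULTILINE)


def missing_in_fasta(fasta_text, gbk_text):
    """List GenBank ids that have no exact match among the fasta ids."""
    known = set(fasta_ids(fasta_text))
    return [rid for rid in genbank_version_ids(gbk_text) if rid not in known]


# Tiny made-up inputs that mimic the situation in this issue.
example_fasta = ">NZ_CP008828\nACGT\n"
example_gbk = (
    "LOCUS       NZ_CP008828   4 bp   DNA   circular\n"
    "VERSION     NZ_CP008828.1\n"
)
print(missing_in_fasta(example_fasta, example_gbk))
```

Any id printed here would likely trigger the same KeyError, since it exists in the GenBank file but not in the fasta file.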

You can fix this by adding version numbers to the fasta file, either manually or with sed:

sed -ri 's/^>(.*)/>\1.1/' genomes/NZ_CP008827.fna
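If the fasta headers carry a description after the id, appending '.1' to the whole header line would put the version at the end of the description rather than after the id. A rough Python alternative that only touches the first token of each header could look like the sketch below. The function name `add_version` is hypothetical, and the rule "skip ids that already contain a dot" is a simplifying assumption that will not suit every naming scheme.

```python
def add_version(fasta_text, version="1"):
    """Append '.<version>' to the id (first token) of each fasta header.

    Ids that already contain a '.' are left alone (simplifying assumption),
    so running the function twice does not stack version suffixes.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            parts = line[1:].split(None, 1)  # [id] or [id, description]
            if parts and "." not in parts[0]:
                parts[0] = f"{parts[0]}.{version}"
            line = ">" + " ".join(parts)
        out.append(line)
    trailing_newline = "\n" if fasta_text.endswith("\n") else ""
    return "\n".join(out) + trailing_newline


# Made-up header with a description, to show that only the id changes.
print(add_version(">NZ_CP008827 example description\nACGT\n"), end="")
```

To apply it to a real file, read the file's text, pass it through `add_version`, and write the result to a new file, so the original stays untouched until the output has been checked.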

Thank you for reporting this problem, I will fix the example dataset.

fsysy commented 9 months ago

Great!

I was confused about whether I should match the VERSION or the LOCUS in the GenBank file.

Anyway, that issue has been resolved with your solution.

Thank you @tpillone