Closed NonAggressiveHail closed 1 month ago
Hi @NonAggressiveHail,
thanks for reporting. However, this is not a bug, just a discrepancy between the tagged version you're using (v1.9.4) and the active main branch, where I added the commit that causes it just 2h ago ;-) https://github.com/oschwengers/bakta/commit/650eedc17e4814c15dad604487e8c88aab72fad4
An increment of 5 was the default up to v1.9.4, but this will change with the upcoming v1.10.0.
If you need the docs for your version, then please have a look at the related release: https://github.com/oschwengers/bakta/tree/v1.9.4
Since this is not a bug, I'll close this for now. If you have any further questions, please do not hesitate to reach out and maybe re-open this.
When running with default options, locus tags increment in steps of 5 when they should increment in steps of 1.
Commands run:
fasta_name=Pa_DK1_substr_NH57388A_6643
bakta --output ./${fasta_name} --prefix ${fasta_name} --proteins ../../raw_data/genomes/Pa_PAO1_107_annotations_with_sip_aes.gbk --force --complete --gram - --keep-contig-headers --locus-tag ${fasta_name##*_} --threads 12 ../../data/oriented_genomes/Pa_DK1_substr_NH57388A_6643/Pa_DK1_substr_NH57388A_6643_reoriented.fasta --debug
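For reference, the --locus-tag value in the command above comes from a shell parameter expansion: ${fasta_name##*_} strips the longest prefix that ends in an underscore, so only the last underscore-separated field remains. A minimal sketch, reusing the variable from the command (locus_tag_prefix is a name made up here for illustration):

```shell
fasta_name=Pa_DK1_substr_NH57388A_6643

# '##*_' removes the longest leading match of the pattern '*_', i.e.
# everything up to and including the last underscore.
locus_tag_prefix="${fasta_name##*_}"

echo "$locus_tag_prefix"   # prints 6643
```

This matches the "locus tag prefix: 6643" line that Bakta reports in its debug output.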
Debug output:
Bakta v1.9.4
Options and arguments:
    input: /shared/home/jgh8/20230208_UKent/20240520_siderophore_prediction/data/oriented_genomes/Pa_DK1_substr_NH57388A_6643/Pa_DK1_substr_NH57388A_6643_reoriented.fasta
    db: /shared/home/jgh8/20230208_UKent/20240520_siderophore_prediction/programs/bakta/db, version 5.1, full
    user proteins: /shared/home/jgh8/20230208_UKent/20230307_ps_phylogenetics/raw_data/genomes/Pa_PAO1_107_annotations_with_sip_aes.gbk
    output: /shared/home/jgh8/20230208_UKent/20240520_siderophore_prediction/data/bakta_troubleshooting/Pa_DK1_substr_NH57388A_6643
    force: True
    tmp directory: /tmp/tmpgccs79c2
    prefix: Pa_DK1_substr_NH57388A_6643
    threads: 12
    debug: True
    translation table: 11
    gram: -
    locus tag prefix: 6643
    complete replicons: True
    keep contig headers: True
Bakta runs in DEBUG mode! Temporary data will not be destroyed at: /tmp/tmpgccs79c2
parse genome sequences...
    imported: 1
    filtered & revised: 1
    chromosomes: 1
start annotation...
predict tRNAs...
    found: 64
predict tmRNAs...
    found: 1
predict rRNAs...
    found: 12
predict ncRNAs...
    found: 49
predict ncRNA regions...
    found: 30
predict CRISPR arrays...
    found: 4
predict & annotate CDSs...
    predicted: 5645
    discarded spurious: 3
    revised translational exceptions: 1
    detected IPSs: 5500
    found PSCs: 123
    found PSCCs: 10
    lookup annotations...
    conduct expert systems...
    amrfinder: 8
    protein sequences: 605
    user protein sequences: 5240
    signal peptides: 673
    combine annotations and mark hypotheticals...
    detect pseudogenes...
    pseudogene candidates: 21
    found pseudogenes: 4
    analyze hypothetical proteins: 67
    detected Pfam hits: 1
    calculated proteins statistics
    revise special cases...
extract sORF...
    potential: 35196
    discarded due to overlaps: 28753
    discarded spurious: 0
    detected IPSs: 1
    found PSCs: 0
    lookup annotations...
    filter and combine annotations...
    filtered sORFs: 1
    signal peptides: 0
detect gaps...
    found: 0
detect oriCs/oriVs...
    found: 1
detect oriTs...
    found: 0
apply feature overlap filters...
select features and create locus tags...
    selected: 5800
improve annotations...
    revised gene symbols: 105
genome statistics:
    Genome size: 6,212,531 bp
    Contigs/replicons: 1
    GC: 66.6 %
    N50: 6,212,531
    N ratio: 0.0 %
    coding density: 90.4 %
annotation summary:
    tRNAs: 63
    tmRNAs: 1
    rRNAs: 12
    ncRNAs: 49
    ncRNA regions: 30
    CRISPR arrays: 4
    CDSs: 5639
    hypotheticals: 66
    pseudogenes: 4
    signal peptides: 673
    sORFs: 1
    gaps: 0
    oriCs/oriVs: 1
    oriTs: 0
export annotation results to: /shared/home/jgh8/20230208_UKent/20240520_siderophore_prediction/data/bakta_troubleshooting/Pa_DK1_substr_NH57388A_6643
    human readable TSV...
    GFF3...
    INSDC GenBank & EMBL...
/shared/home/jgh8/miniconda3/envs/bakta/lib/python3.8/site-packages/Bio/SeqIO/InsdcIO.py:727: BiopythonWarning: Increasing length of locus line to allow long name. This will result in fields that are not in usual positions.
  warnings.warn(
    genome sequences...
    feature nucleotide sequences...
    translated CDS sequences...
    circular genome plot...
    hypothetical TSV...
    translated hypothetical CDS sequences...
    machine readable JSON...
    genome and annotation summary...
If you use these results please cite Bakta: https://doi.org/10.1099/mgen.0.000685
Annotation successfully finished in 9:30 [mm:ss].
In the output TSV file the first two loci are:
#Sequence Id              Type  Start  Stop  Strand  Locus Tag
NZ_LN870292.1|chromosome  cds   1      1545  +       6643_00005
NZ_LN870292.1|chromosome  cds   1574   2677  +       6643_00010
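The step between consecutive locus tags can also be checked over the whole file rather than by eye. The following is only a sketch: locus_tag_steps is a hypothetical helper, and it assumes the real TSV is tab-separated with the locus tag in the sixth column, as in the header above.

```shell
# Hypothetical helper: reads a Bakta-style TSV on stdin and prints the
# numeric difference between each pair of consecutive locus tags.
# Assumes tab-separated columns with the locus tag in column 6.
locus_tag_steps() {
  awk -F'\t' '
    /^#/      { next }            # skip header and comment lines
    $6 == ""  { next }            # skip features without a locus tag
    {
      n = $6
      sub(/^.*_/, "", n)          # keep only the digits after the last "_"
      n += 0                      # force numeric value (drops leading zeros)
      if (seen) print n - prev
      prev = n
      seen = 1
    }'
}
```

Piping the output through `sort -n | uniq -c` summarises which steps occur and how often, for example `locus_tag_steps < annotation.tsv | sort -n | uniq -c`, where annotation.tsv stands in for the actual output file.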
Clearly the increment is 5, which conflicts with the manual page, which says that the default increment should be 1. The manual also says that this can be changed:
--locus-tag LOCUS_TAG
Locus tag prefix (default = autogenerated)
--locus-tag-increment {1,5,10}
Locus tag increment: 1/5/10 (default = 1)
--keep-contig-headers
Keep original contig headers
However, this option is not present when bakta --help is run:
--locus-tag LOCUS_TAG
Locus tag prefix (default = autogenerated)
--keep-contig-headers
Keep original contig headers
Trying to use the option anyway returns an error:
bakta: error: unrecognized arguments: --locus-tag-increment
Bakta was installed from conda.
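Because the option exists only in some versions, a script that has to work across installations could check the help text before passing it. This is a rough sketch, not something Bakta provides: help_mentions is a hypothetical helper, and the value 1 assumes the option behaves as described in the main-branch documentation quoted above.

```shell
# Hypothetical helper (not part of Bakta): reads help text on stdin and
# succeeds if it mentions the given option string.
help_mentions() {
  grep -qF -- "$1"
}

# Only add the option when the installed bakta advertises it in --help.
# Assumption: an increment of 1 is the desired behaviour.
increment_args=""
if command -v bakta >/dev/null 2>&1 \
   && bakta --help 2>&1 | help_mentions '--locus-tag-increment'; then
  increment_args="--locus-tag-increment 1"
fi

echo "extra arguments: ${increment_args:-none}"
```

The variable can then be appended unquoted to the bakta command line, so that on v1.9.4 nothing is added and the "unrecognized arguments" error is avoided.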