Open NonAggressiveHail opened 1 day ago
I am not sure whether there is also a bug at play here. For example, I tried the following FASTA file:
>WP_003116930 aes~~~aes~~~CDD:400284
MALNPDIAAYLELVGNGRSSGKSLPMHQLTVQQAREQFDQSSALMDPGLDEPLARVETLFVPARDGTPLP
ARLYSPQGLSASPPLPGVLYLHGGGYVVGSLDSHDALCASLAERAGCVVLSLAYRLAPEWRFPTAAEDAE
DAWCWLAAEAARLGIDPQRLAVAGDSVGGSLCAVLSHRLALRGEASQPRLQVLIYPVTDASRTHQSIERY
AVGHLLEKDSLEWFYQHYQRSPEDRQDPRFSPLLGVVPADLAPTLLLVAECDPLHDEGIAYAEHLRQGGA
RVELCVYPGMTHDFLRMGAIVDEADDAKDMIADALVAALAT
When running with this file I do not get the same error, but I do get the following output:
predict & annotate CDSs...
predicted: 5682
discarded spurious: 0
revised translational exceptions: 0
detected IPSs: 5547
found PSCs: 128
found PSCCs: 4
lookup annotations...
conduct expert systems...
amrfinder: 8
protein sequences: 656
user protein sequences: 0
I am surprised to see "user protein sequences: 0", when I would expect it to be 1.
When providing an incorrectly formatted GenBank or FASTA file with the --proteins option, no information is given on why the file is invalid.
I attempted to annotate a genome with these reference proteins, downloaded in GenBank file format. When bakta annotates CDSs, it gives the error "ERROR: User proteins file GenBank format not valid!". Rerunning with the --debug option, I expected more information on how this file is invalid so that I could repair it, but no further information is given. Ideally, the error should say which aspects of the file are invalid so they can be repaired.
I have also tried converting the GenBank file to a FASTA file with prokka's prokka-genbank_to_fasta_db, but using the converted file gave the error "ERROR: User proteins file Fasta format not valid!". Again, it would be helpful for this error to explain why the file is invalid, or for bakta to include a utility with similar functionality that works correctly.
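
To illustrate the kind of diagnostics I have in mind, here is a minimal sketch of a checker that reports the line number and reason for each problem. It is not bakta's actual validation. The function name check_protein_fasta, the three-field "~~~" header layout (inferred from the example header above), and the accepted residue alphabet are all my own assumptions.

```python
# Hypothetical helper, not part of bakta: reports *why* a protein FASTA looks invalid.
# Assumption: one-letter amino acid codes plus common ambiguity codes and '*'.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBXZJUO*")


def check_protein_fasta(text, expected_fields=3):
    """Return a list of (line_number, message) tuples, empty if no problem is found."""
    problems = []
    header_line = None   # line number of the record currently being read
    residues_seen = 0    # residues collected for the current record

    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header_line is not None and residues_seen == 0:
                problems.append((header_line, "record has no sequence"))
            header_line, residues_seen = line_no, 0

            seq_id, _, description = line[1:].partition(" ")
            if not seq_id:
                problems.append((line_no, "header has no sequence ID"))
            # Assumed layout: >id field1~~~field2~~~field3
            fields = description.split("~~~") if description else []
            if len(fields) != expected_fields:
                problems.append((
                    line_no,
                    f"expected {expected_fields} '~~~'-separated fields, found {len(fields)}",
                ))
        else:
            if header_line is None:
                problems.append((line_no, "sequence data before first '>' header"))
            bad = sorted(set(line.upper()) - VALID_RESIDUES)
            if bad:
                problems.append((line_no, f"invalid residue characters: {''.join(bad)}"))
            residues_seen += len(line)

    if header_line is None:
        problems.append((0, "no FASTA records found"))
    elif residues_seen == 0:
        problems.append((header_line, "record has no sequence"))
    return problems


if __name__ == "__main__":
    example = ">id1 only_one_field\nMAL1NP\n"
    for line_no, message in check_protein_fasta(example):
        print(f"line {line_no}: {message}")
```

Even output at this level of detail, shown with --debug, would be enough to find and repair the offending records.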