Closed oschwengers closed 2 years ago
A first version of the direct batch annotation of protein sequences is implemented. It might take a couple of weeks until the next release. If anyone would like to try it in advance:
Installation:
git clone https://github.com/oschwengers/bakta.git
cd bakta
git checkout batch
python -m pip install --no-deps --ignore-installed .
Example:
$ bakta_batch --db <db-path> input.fasta
$ bakta_batch --db <db-path> --prefix test --output test --proteins special.faa --threads 8 input.fasta
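If you want to run the call above over several FASTA files, a minimal Python sketch could look like this. It only uses the flags shown in the examples (--db, --prefix, --output, --threads). The helper names build_command and run_all, and the idea of taking the prefix from the file name, are assumptions for illustration and not part of bakta.

```python
import subprocess
from pathlib import Path


def build_command(fasta, db_path, output_dir, threads=8):
    """Assemble one bakta_batch call using only the flags shown above."""
    fasta = Path(fasta)
    return [
        "bakta_batch",
        "--db", str(db_path),
        "--prefix", fasta.stem,  # assumption: reuse the file name as prefix
        "--output", str(output_dir),
        "--threads", str(threads),
        str(fasta),
    ]


def run_all(fasta_files, db_path, output_dir, threads=8):
    """Run bakta_batch once per input file and stop at the first failure."""
    for fasta in fasta_files:
        cmd = build_command(fasta, db_path, output_dir, threads)
        subprocess.run(cmd, check=True)
```

Building the command as a list (rather than a single string) avoids shell quoting problems with paths that contain spaces.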
Output:
<prefix>.tsv: full annotation results
<prefix>.hypotheticals.tsv: additional info on hypotheticals (molecular weight, isoelectric point, Pfam hits)
<prefix>.faa: annotated protein sequences

Hi, I am running the command outlined above and I am getting the following error:
/Users/cmeehan/Tools/bakta/bin/bakta_batch: line 3: realpath: command not found
usage: dirname string [...]
/Users/cmeehan/Tools/bakta/bin/bakta_batch: line 4: realpath: command not found
annotate protein sequences...
detected IPSs: 0
PSC failed! diamond-error-code=1
Traceback (most recent call last):
File "/Users/cmeehan/opt/miniconda3/envs/bakta/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/cmeehan/opt/miniconda3/envs/bakta/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/cmeehan/opt/miniconda3/envs/bakta/lib/python3.10/site-packages/bakta/batch.py", line 180, in
Any ideas?
Cheers, Conor
Hi Conor,
thanks for reporting this. I think there are several things going wrong here. I added more checks and logging from the bakta main app to the batch command (https://github.com/oschwengers/bakta/commit/0ba32feeec58cc36948f75be513f66fab04f74cd). Could you please pull the latest commit, re-install bakta as suggested before, and provide the error message from Diamond? The stdout and stderr of Diamond should now be logged in an additional <prefix>.log file.
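As a quick first check before re-running, something like the following sketch could show whether the external tools named in the error output are visible on the PATH. It is only an illustration: the function name missing_tools is made up here, and a tool being found does not mean it runs correctly (Diamond was apparently found in the report above but still exited with an error code).

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Tools that appear in the error output above: realpath and diamond.
    missing = missing_tools(["realpath", "diamond"])
    if missing:
        print("not found on PATH:", ", ".join(missing))
    else:
        print("all tools found")
```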
Let's add a bulk annotation feature for protein sequences. Just like bakta_db, we could add an entry point to provide a dedicated interface.
Entry point: bakta_batch
Parameters:
<input> as a metavar
--db <db-path>
--output <output>
--prefix <prefix>
--proteins <user-proteins>
--tmp-dir <tmp-dir>
--threads <threads>
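The proposed interface could be sketched with argparse roughly as follows. This is not the actual bakta implementation: the parameter names come from the list above, while the help texts, the defaults and making --db required are assumptions for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the parameters proposed above."""
    parser = argparse.ArgumentParser(
        prog="bakta_batch",
        description="Bulk annotation of protein sequences (sketch).",
    )
    # Positional input file, shown as <input> in the usage message.
    parser.add_argument("input", metavar="<input>", help="protein sequences (FASTA)")
    # Assumption: the database path is mandatory.
    parser.add_argument("--db", metavar="<db-path>", required=True, help="database path")
    parser.add_argument("--output", metavar="<output>", default=".", help="output directory")
    parser.add_argument("--prefix", metavar="<prefix>", default=None, help="output file prefix")
    parser.add_argument("--proteins", metavar="<user-proteins>", default=None,
                        help="user-provided protein sequences")
    parser.add_argument("--tmp-dir", metavar="<tmp-dir>", default=None, help="temporary directory")
    # Assumption: default to a single thread.
    parser.add_argument("--threads", metavar="<threads>", type=int, default=1,
                        help="number of threads")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Note that argparse stores --tmp-dir as args.tmp_dir, since dashes in option names become underscores.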
Output: a simple TSV could probably work with the following columns:
Suggested by @conmeehan on https://microbial-bioinfo.slack.com