weisberglab / beav

BEAV: Bacterial Element Annotation reVamped
GNU General Public License v3.0

Error in DefenseFinder #1

Open mmolari opened 5 months ago

mmolari commented 5 months ago

Hi!

First of all, thank you for putting together such a nice pipeline! It's very convenient to have all of these tools in one place.

I installed BEAV using conda, following the instructions in the README. I downloaded the light version of the database and then ran BEAV with the command:

beav \
    --input ~/ownCloud/neherlab/code/pangenome-evo/data/fa/NZ_CP124487.1.fa \
    --output test \
    --threads 4 \
    --skip_tiger \
    --skip_gapmind \
    --skip_dbscan-swa \
    --skip_antismash \
    --bakta_arguments '--db ~/miniconda3/envs/beav/db/db-light'

The first issue I encountered is with DefenseFinder. From the BEAV log file:

Identifying defense systems (DefenseFinder)

Error: error occurred while running DefenseFinder. Please see defensefinder.log
Elapsed: 0hrs 0min 1sec
cut: ./NZ_CP124487.1.fa_defense_finder_genes.tsv: No such file or directory
Here is the DefenseFinder.log:

```
2024-01-23 11:00:41 | INFO  | Received file ./bakta/NZ_CP124487.1.fa.faa
2024-01-23 11:00:41 | WARNING  | Out directory /home/marco/ownCloud/neherlab/code/pangenome-evo/exploration/2401c_beav/test/NZ_CP124487.1.fa already exists. Existing DefenseFinder output will be overwritten
2024-01-23 11:00:41 | INFO  | Running DefenseFinder
Traceback (most recent call last):
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/profile.py", line 70, in get_profile
    path = model_location.get_profile(gene.name)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/registries.py", line 344, in get_profile
    return self._profiles[name]
KeyError: 'Rst_Hydrolase-Tm__Hydrolase-Tm'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/marco/miniconda3/envs/beav/bin/defense-finder", line 10, in <module>
    sys.exit(cli())
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/defense_finder_cli/main.py", line 143, in run
    defense_finder.run(protein_file_name, dbtype, workers, coverage, tmp_dir, models_dir, no_cut_ga, loglevel)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/defense_finder/__init__.py", line 29, in run
    macsyfinder.main(args=msf_cmd)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/scripts/macsyfinder.py", line 1193, in main
    all_systems, rejected_candidates = search_systems(config, model_registry, models_def_to_detect, logger)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/scripts/macsyfinder.py", line 529, in search_systems
    parser.parse(models_def_to_detect)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/definition_parser.py", line 85, in parse
    self._fill_gene_bank(model_node, model_location, def_loc)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/definition_parser.py", line 287, in _fill_gene_bank
    self.gene_bank.add_new_gene(model_location, gene_name, self.profile_factory)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/gene.py", line 102, in add_new_gene
    gene = CoreGene(model_location, name, profile_factory)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/gene.py", line 114, in __init__
    self._profile = profile_factory.get_profile(self, model_location)
  File "/home/marco/miniconda3/envs/beav/lib/python3.9/site-packages/macsypy/profile.py", line 72, in get_profile
    raise MacsypyError(f"'{model_location.name}/{gene.name}': No such profile")
macsypy.error.MacsypyError: 'defense-finder-models/Rst_Hydrolase-Tm__Hydrolase-Tm': No such profile
```

I believe this is a known issue and was already raised here. I just wanted to bring it up here as well so that once it is fixed you can update the version as well, or have a temporary fix in the meantime.

For completeness, here is the full output of the BEAV command:

```
BEAV version 1.0.0
--input /home/marco/ownCloud/neherlab/code/pangenome-evo/data/fa/NZ_CP124487.1.fa --output test --threads 4 --skip_tiger --skip_gapmind --skip_dbscan-swa --skip_antismash --bakta_arguments --db /home/marco/miniconda3/envs/beav/db/db-light
Checking prerequisites:
----------------------------------------------------------
Bakta: OK
antiSMASH: skipped
MacSyFinder: OK
IntegronFinder: OK
DefenseFinder: OK
TIGER2: skipped
GapMind: skipped
DBSCAN-SWA: skipped
----------------------------------------------------------
Running Bakta
Elapsed: 0hrs 8min 53sec
Done
----------------------------------------------------------
Annotation of other sequence elements
cut: ./borders/NZ_CP124487.1.fa.virbox: No such file or directory
cut: ./borders/NZ_CP124487.1.fa.trabox: No such file or directory
Elapsed: 0hrs 0min 0sec
Done
----------------------------------------------------------
Indentifying oriT
Elapsed: 0hrs 0min 5sec
Done
----------------------------------------------------------
Identifying secretion systems (MacSyFinder)
Elapsed: 0hrs 0min 5sec
Done
----------------------------------------------------------
Identifying integrons (IntegronFinder)
Elapsed: 0hrs 0min 14sec
Done
----------------------------------------------------------
Identifying defense systems (DefenseFinder)
Error: error occurred while running DefenseFinder. Please see defensefinder.log
Elapsed: 0hrs 0min 1sec
cut: ./NZ_CP124487.1.fa_defense_finder_genes.tsv: No such file or directory
Done
----------------------------------------------------------
Identifying biosynthetic gene clusters (antiSMASH)
Skipped
----------------------------------------------------------
Identifying phage (DBSCAN-SWA)
Skipped
----------------------------------------------------------
Characterizing amino acid biosynthesis and small carbon metabolite catabolism (GapMind)
Skipped
----------------------------------------------------------
Identifying integrative conjugative elements [ICEs] (TIGER2)
Skipped
----------------------------------------------------------
Combining annotations and preparing final output files
tee: NZ_CP124487.1.fa/logs/Beav.log: No such file or directory
Elapsed: 0hrs 0min 46sec
Final annotation output: NZ_CP124487.1.fa_final.gbk
----------------------------------------------------------
Creating Circos Map
ls: cannot access 'test/NZ_CP124487.1.fa/*_final.gbk': No such file or directory
cat: 'test/NZ_CP124487.1.fa/*oncogenic_plasmid_final.out.contiglist': No such file or directory
python3 beav_circos.py --input
usage: beav_circos.py [-h] --input INPUT [--contigs [CONTIGS ...]] [--plasmid PLASMID]
beav_circos.py: error: argument --input/-i: expected one argument
Elapsed: 0hrs 0min 1sec
Done
mv: cannot stat 'NZ_CP124487.1.fa.circos.png': No such file or directory
mv: cannot stat 'NZ_CP124487.1.fa.circos.pdf': No such file or directory
mv: cannot stat 'NZ_CP124487.1.fa.oncogenes.png': No such file or directory
mv: cannot stat 'NZ_CP124487.1.fa.oncogenes.pdf': No such file or directory
----------------------------------------------------------
Summary of annotations
Secretion_Systems Defense_Systems Phages Biosynthetic_gene_clusters ICEs Integrons
/home/marco/miniconda3/envs/beav/bin/beav: line 1063: N/A: No such file or directory
6 N/A N/A N/A N/A 0
Small carbon catabolism pathways:
Done
----------------------------------------------------------
The BEAV pipeline automates the use of a number of published software tools. If you use these results in a publication, please include the following in your methods section and cite:
Jung J, Rahman A, Schiffer A, and Weisberg A. 2023. BEAV: a bacterial genome and mobile element annotation pipeline. https://github.com/weisberglab/beav
grep: test/NZ_CP124487.1.fa/logs/bakta.log: No such file or directory
Bakta version
Schwengers O, Jelonek L, Dieckmann MA, et al. 2021. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom 7: 000685.
EMBOSS:fuzznuc
EMBOSS: The European Molecular Biology Open Software Suite (2000) Rice,P. Longden,I. and Bleasby,A. Trends in Genetics 16, (6) pp276--277
head: cannot open 'test/NZ_CP124487.1.fa/MacSyFinder_TXSS/macsyfinder.log' for reading: No such file or directory
MacSyFinder version
Néron, Bertrand; Denise, Rémi; Coluzzi, Charles; Touchon, Marie; Rocha, Eduardo P.C.; Abby, SophieS 2023. MacSyFinder v2: Improved modelling and search engine to identify molecular systems igenomes. Peer Community Journal, Volume 3, article no. e28. DOI: 10.24072/pcjournal.250.
DefenseFinder
Tesson F., Hervé A. , Touchon M., d’Humières C., Cury J., Bernheim A. Systematic and quantitative view of the antiviral arsenal of prokaryotes bioRx
grep: test/NZ_CP124487.1.fa/Integron_Finder/Results_Integron_Finder_NZ_CP124487.1.fa/integron_finder.out: No such file or directory
IntegronFinder version
Néron B, Littner E, Haudiquet M, et al. 2022. IntegronFinder 2.0: Identification and Analysis of Integrons across Bacteria, with a Focus on Antibiotic Resistance in Klebsiella. Microorganisms 10: 700.
```

Thanks again!

Marco

alexweisberg commented 5 months ago

Dear Marco, Thank you for bringing this to our attention. Yes, unfortunately it is a bug in MacSyFinder that was uncovered by new DefenseFinder models. Until they update MacSyFinder with the fix on conda, the DefenseFinder component of the pipeline won't work.

Until it is fixed, you could run Beav with --skip_defensefinder to skip running DefenseFinder and it should run the rest of the pipeline.

Alternatively, you could download the updated file from the MacSyFinder commit (https://github.com/gem-pasteur/macsyfinder/commit/27ee21ceb8e7100d9183b084356f791487aca4ad) and copy it over the installed copy in the macsypy folder of your conda environment. Only the registries.py file needs to be replaced for it to work.

To do so, with your conda environment activated:

get your python version: python --version

Mine is Python 3.9; substitute your own version in the cp command below:

wget https://raw.githubusercontent.com/gem-pasteur/macsyfinder/27ee21ceb8e7100d9183b084356f791487aca4ad/macsypy/registries.py

cp registries.py $CONDA_PREFIX/lib/python3.9/site-packages/macsypy/
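
Two details of this workaround are easy to get wrong: a GitHub `/blob/` link serves an HTML page rather than the file itself, and the `python3.9` part of the destination path differs between environments. The Python sketch below is only an illustration of one way to check both before copying anything. The helper names `blob_to_raw` and `package_dir` are hypothetical and are not part of BEAV or MacSyFinder. Run it with the conda environment activated so that `macsypy` is importable.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def blob_to_raw(url: str) -> str:
    """Convert a github.com '/blob/' page URL into its raw-file URL.

    Hypothetical helper for illustration; it only rewrites the string.
    """
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner_repo, rest = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{rest}"


def package_dir(name: str) -> Optional[Path]:
    """Return the directory of an installed package, or None if it is absent.

    This avoids hard-coding the 'python3.9' part of the site-packages path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    blob = (
        "https://github.com/gem-pasteur/macsyfinder/blob/"
        "27ee21ceb8e7100d9183b084356f791487aca4ad/macsypy/registries.py"
    )
    print("download from:", blob_to_raw(blob))
    target = package_dir("macsypy")
    print("copy into:", target if target else "macsypy not found in this environment")
```

The script only prints the download URL and the destination folder; the download and copy are still done by hand with the commands above.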

mmolari commented 5 months ago

Thank you for the quick answer!

alexweisberg commented 5 months ago

No problem! We have a new version coming soon that will fix the other bugs that appeared in your run log. Hopefully that will be up later this week.
