merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] anvi-pan-genome - diamond BUG #2331

Open: pcampiteli opened this issue 3 months ago

pcampiteli commented 3 months ago

Short description of the problem

Hello everyone, I'm trying to run anvi-pan-genome on a genomes storage built from eukaryotic genomes, and a DIAMOND-related error prevents the analysis from finishing.

anvi'o version

The same issue happens with both anvio-dev and anvi'o v8. Output of anvi-self-test --version:

Python .......................................: 3.10.6

Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 18
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

anvi'o installed via conda on a Linux system.

Detailed description of the issue

As mentioned, I'm trying to run anvi-pan-genome on a eukaryotic genomes storage. Earlier I had issues with the gene calls file, which you resolved. Now, when I run the pangenome analysis, it stops abruptly at the DIAMOND step. Here is the anvi'o output, step by step:

Functions found ..............................: COG20_CATEGORY, COG20_FUNCTION, Pfam, KEGG_BRITE, KOfam, COG20_PATHWAY, CAZyme, KEGG_Class, KEGG_Module
Genomes storage ..............................: Initialized (storage hash: hashc45516b2)
Num genomes in storage .......................: 36
Num genomes will be used .....................: 36
Pan database .................................: A new database, /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/Trichoderma_PANGENOME_FINAL-PAN.db, has been created.
Exclude partial gene calls ...................: False

AA sequences FASTA ...........................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa

Num AA sequences reported ....................: 370,554
Num excluded gene calls ......................: 0
Unique AA sequences FASTA ....................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique

DIAMOND MAKEDB

Diamond search DB ............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique.dmnd

DIAMOND BLASTP

Additional params for blastp .................: --masking 0
Search results ...............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt

DIAMOND VIEW

Config Error: Pfft. Something probably went wrong with Diamond's 'view' since one of the expected output files are missing. Please check the log file here: '/storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/log.txt'. IT IS VERY LIKELY to get these kinds of errors if the version of DIAMOND installed on your system differs from the one you had used to first setup your databases. Some errors may disappear if you were to setup your search databases
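The error only says that one of the expected output files is missing. A minimal Python sketch like the one below can list which intermediate files are actually present in the output directory. The file names are taken from the paths printed in this report; treating them as the complete set anvi'o expects is an assumption, and missing_files is a hypothetical helper, not part of anvi'o.

```python
import tempfile
from pathlib import Path

# File names taken from the anvi'o output and DIAMOND log in this report.
# Assumption: these are the intermediate files the pipeline should leave behind.
EXPECTED_FILES = (
    "combined-aas.fa",
    "combined-aas.fa.unique",
    "combined-aas.fa.unique.dmnd",
    "diamond-search-results.daa",
    "diamond-search-results.txt",
)


def missing_files(output_dir, names=EXPECTED_FILES):
    """Hypothetical helper: return expected names that are absent or empty in output_dir."""
    out = Path(output_dir)
    missing = []
    for name in names:
        path = out / name
        # Zero-byte files are flagged too (a judgment call for this sketch).
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Demo on a throwaway directory that mimics the failure: the search DB exists,
# but no .daa archive was written, so 'diamond view' had nothing to convert.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_FILES[:3]:
        (Path(tmp) / name).write_text("placeholder\n")  # only existence and size matter here
    demo_missing = missing_files(tmp)

print(demo_missing)  # ['diamond-search-results.daa', 'diamond-search-results.txt']
```

Pointing missing_files at the Trichoderma_PANGENOME_FINAL.db directory would show whether the blastp step ever wrote its .daa archive.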

Log file contents:

DATE: 15 Aug 24 08:48:26

CMD LINE: diamond view -a /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa -o /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt -p 10 --outfmt 6

diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 10

Loading subject IDs...
Error opening file /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa: No such file or directory

It seems the DIAMOND search results file (diamond-search-results.daa) is never created, so the analysis stops.
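For reference, the --outfmt 6 in the diamond view call above requests BLAST tabular output, whose default 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below reads that format and drops hits under a percent-identity cutoff, in the spirit of the --min-percent-identity 20 used here. It is only an illustration, not how anvi'o parses the file internally; parse_outfmt6 and Hit are hypothetical names and the two sample rows are invented.

```python
import csv
import io
from dataclasses import dataclass

# Default column order of BLAST/DIAMOND tabular output (--outfmt 6).
OUTFMT6_FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


@dataclass
class Hit:
    qseqid: str
    sseqid: str
    pident: float
    bitscore: float


def parse_outfmt6(handle, min_percent_identity=0.0):
    """Yield hits from tab-separated outfmt 6 text, skipping low-identity rows."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        pident = float(rec["pident"])
        if pident < min_percent_identity:
            continue
        yield Hit(rec["qseqid"], rec["sseqid"], pident, float(rec["bitscore"]))


# Two invented rows: one high-identity hit and one below a 20% cutoff.
sample = io.StringIO(
    "gene_1\tgene_2\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-150\t550.0\n"
    "gene_1\tgene_3\t15.0\t120\t102\t3\t10\t130\t5\t125\t0.5\t30.2\n"
)
hits = list(parse_outfmt6(sample, min_percent_identity=20))
print(hits)
```

Reading the file as a stream keeps memory flat, which matters for an all-vs-all search over 370,554 sequences.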

Files / commands to reproduce the issue

Command: anvi-pan-genome -g "/storage4/h.paulocampiteli/pangenome/anvio/genomes_storage/trichoderma_PANGENOME_GENOMES.db" --additional-params-for-seq-search "--masking 0 --sensitive" --minbit 0.2 --min-percent-identity 20 --min-occurrence 2 -n Trichoderma_PANGENOME_FINAL -o Trichoderma_PANGENOME_FINAL.db -T 10 --enforce-hierarchical-clustering
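Since --additional-params-for-seq-search takes its DIAMOND options as a single string, the quotes in the command above matter. As a quick check, Python's standard shlex module, which follows POSIX shell quoting rules, shows how a shell would split this command; the sketch only tokenizes the string and does not run anvi'o.

```python
import shlex

# The command from this report, exactly as typed at the shell.
cmd = (
    'anvi-pan-genome -g "/storage4/h.paulocampiteli/pangenome/anvio/genomes_storage/'
    'trichoderma_PANGENOME_GENOMES.db" '
    '--additional-params-for-seq-search "--masking 0 --sensitive" '
    '--minbit 0.2 --min-percent-identity 20 --min-occurrence 2 '
    '-n Trichoderma_PANGENOME_FINAL -o Trichoderma_PANGENOME_FINAL.db -T 10 '
    '--enforce-hierarchical-clustering'
)

argv = shlex.split(cmd)  # POSIX-style tokenization, quotes removed
i = argv.index("--additional-params-for-seq-search")
print(repr(argv[i + 1]))  # '--masking 0 --sensitive'
```

The quoted DIAMOND options arrive as one argument, so anvi'o receives "--masking 0 --sensitive" intact.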

Files to reproduce: https://drive.google.com/drive/folders/1LfDF1qVWTFo4IQjR-icIywnSiwbMRqTJ?usp=drive_link

The folder contains the external genomes file and the genomes storage file needed to run the pangenome analysis.

If anything else is needed to resolve this matter, please feel free to ask. I'm very eager to get this resolved so I can finish this analysis, which is crucial to my current thesis. Thanks in advance!!