Short description of the problem
Hello everyone, I'm trying to run anvi-pan-genome on a eukaryotic genomes storage, and a DIAMOND-related error occurs that prevents the analysis from finishing correctly.
anvi'o version
The same issue happens with both anvio-dev and anvi'o v8.
anvi-self-test --version
Python .......................................: 3.10.6
Profile database .............................: 40
Contigs database .............................: 23
Pan database .................................: 18
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
System info
anvi'o installed with conda on a Linux system
Detailed description of the issue
As mentioned, I'm trying to run anvi-pan-genome on a eukaryotic genomes storage. I previously had issues with the gene calls file, but you resolved those. Now I'm trying to create the pangenome itself, and the analysis stops abruptly at the DIAMOND step. Here is the anvi'o screen output, step by step:
Functions found ..............................: COG20_CATEGORY, COG20_FUNCTION, Pfam, KEGG_BRITE, KOfam, COG20_PATHWAY, CAZyme, KEGG_Class, KEGG_Module
Genomes storage ..............................: Initialized (storage hash: hashc45516b2)
Num genomes in storage .......................: 36
Num genomes will be used .....................: 36
Pan database .................................: A new database, /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/Trichoderma_PANGENOME_FINAL-PAN.db, has been created.
Exclude partial gene calls ...................: False
AA sequences FASTA ...........................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa
Num AA sequences reported ....................: 370,554
Num excluded gene calls ......................: 0
Unique AA sequences FASTA ....................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique
DIAMOND MAKEDB
Diamond search DB ............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/combined-aas.fa.unique.dmnd
DIAMOND BLASTP
Additional params for blastp .................: --masking 0
Search results ...............................: /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt
DIAMOND VIEW
Config Error: Pfft. Something probably went wrong with Diamond's 'view' since one of the expected output files are missing. Please check the log file here: '/storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/log.txt'. IT IS VERY LIKELY to get these kinds of errors if the version of DIAMOND installed on your system differs from the one you had used to first setup your databases. Some errors may disappear if you were to setup your search databases
Contents of the log file:
DATE: 15 Aug 24 08:48:26
CMD LINE: diamond view -a /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa -o /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.txt -p 10 --outfmt 6
diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)
CPU threads: 10
Loading subject IDs...
Error opening file /storage4/h.paulocampiteli/pangenome/anvio/pan_final/Trichoderma_PANGENOME_FINAL.db/diamond-search-results.daa: No such file or directory
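The excerpt above covers only the `diamond view` entry. To see whether an earlier command in the same log.txt also reported a problem, the log can be grouped by command, as in this rough sketch. It assumes every entry carries a `CMD LINE:` line like the one above (optionally prefixed with `#`), and that problems show up as lines containing the word "error". The function names are made up for illustration and are not part of anvi'o.

```python
import re

# Assumption: each log entry has a "CMD LINE: ..." line, possibly prefixed with "#".
CMD_RE = re.compile(r"^\s*#?\s*CMD LINE:\s*(.+)$")


def split_log_by_command(log_text):
    """Group log lines under the most recent 'CMD LINE:' entry."""
    blocks = []
    for line in log_text.splitlines():
        match = CMD_RE.match(line)
        if match:
            # A new command starts a new block.
            blocks.append({"cmd": match.group(1).strip(), "output": []})
        elif blocks:
            # Everything else belongs to the command logged before it.
            blocks[-1]["output"].append(line)
    return blocks


def failed_commands(log_text):
    """Return the commands whose captured output mentions an error."""
    return [
        block["cmd"]
        for block in split_log_by_command(log_text)
        if any("error" in line.lower() for line in block["output"])
    ]
```

Calling `failed_commands(open("log.txt").read())` from the output directory would list every logged command whose output mentions an error, not only the last one.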
It seems the DIAMOND search results file (diamond-search-results.daa) is never created, so the analysis stops.
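A quick way to confirm which intermediate files exist in the output directory is a small check like the one below. It is only a sketch: the file names are taken from the paths printed in the output above, and `report_intermediate_files` is a hypothetical helper name, not an anvi'o function.

```python
from pathlib import Path

# File names copied from the anvi'o output and log shown in this report.
EXPECTED = [
    "combined-aas.fa",
    "combined-aas.fa.unique",
    "combined-aas.fa.unique.dmnd",
    "diamond-search-results.daa",
    "diamond-search-results.txt",
    "log.txt",
]


def report_intermediate_files(out_dir, names=EXPECTED):
    """Map each expected file name to its size in bytes, or None if it is missing."""
    out_dir = Path(out_dir)
    report = {}
    for name in names:
        path = out_dir / name
        report[name] = path.stat().st_size if path.is_file() else None
    return report


if __name__ == "__main__":
    # Run from inside the output directory (Trichoderma_PANGENOME_FINAL.db in this report).
    for name, size in report_intermediate_files(".").items():
        print(f"{name}: {'MISSING' if size is None else f'{size} bytes'}")
```

A missing or zero-byte entry points at the step that failed: for example, a `.dmnd` file that exists next to a missing `.daa` file would suggest the search step never wrote its output.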
Files / commands to reproduce the issue
Command: anvi-pan-genome -g "/storage4/h.paulocampiteli/pangenome/anvio/genomes_storage/trichoderma_PANGENOME_GENOMES.db" --additional-params-for-seq-search "--masking 0 --sensitive" --minbit 0.2 --min-percent-identity 20 --min-occurrence 2 -n Trichoderma_PANGENOME_FINAL -o Trichoderma_PANGENOME_FINAL.db -T 10 --enforce-hierarchical-clustering
Files to reproduce: https://drive.google.com/drive/folders/1LfDF1qVWTFo4IQjR-icIywnSiwbMRqTJ?usp=drive_link
The folder contains the external genomes file and the genomes storage file needed to run the pangenome analysis.
If anything else is needed to resolve this matter, please feel free to ask. I'm very eager to get this resolved so I can finish this analysis, which is crucial to my current thesis. Thanks in advance!!