Closed uguyet closed 1 month ago
Dear @uguyet,
Sorry about this inconvenience, and thank you for your patience. I only had a chance to look at it today, and tried to reproduce it using these genomes.
Things worked well for me with the current anvio-dev as you can see here:
$ ~/Downloads/genomes >>> anvi-dereplicate-genomes --fasta-text fasta.txt --program pyANI --min-alignment-fraction 0.25 --similarity-threshold 0.98 -o output_pyANI/ -T 3
Run mode .....................................: pyANI
CITATION
===============================================
Anvi'o will use 'PyANI' by Pritchard et al. (DOI: 10.1039/C5AY02550H) to compute
ANI. If you publish your findings, please do not forget to properly credit their
work.
[PyANI] Num threads to use ...................: 3
[PyANI] Alignment method .....................: ANIb
[PyANI] Log file path ........................: /var/folders/_1/yvhyjg5j1wl09t0cx345j4hd87vf3t/T/tmpomsn7y8o
WARNING
===============================================
THIS IS VERY IMPORTANT! You asked anvi'o to remove any hits between two genomes
if they had a full percent identity less than '0.20'. Anvi'o found 4 such
instances between the pairwise comparisons of your 3 genomes, and is about to
set all ANI scores between these instances to 0. For instance, one of your
genomes, 'genome_01', had a full percentage identity of 0.029 relative to
'genome_03', another one of your genomes, which is below your threshold, and so
the ANI scores will be ignored (set to 0) for all downstream reports you will
find in anvi'o tables and visualizations. Anvi'o kindly invites you to carefully
think about potential implications of discarding hits based on an arbitrary
alignment fraction, but does not judge you because it is not perfect either.
WARNING
===============================================
THIS IS VERY IMPORTANT! You asked anvi'o to remove any hits between two genomes
if the hit was produced by a weak alignment (which you defined as alignment
fraction less than '0.25'). Anvi'o found 4 such instances between the pairwise
comparisons of your 3 genomes, and is about to set all ANI scores between these
instances to 0. For instance, one of your genomes, 'genome_01', was 0.708
identical to 'genome_03', another one of your genomes, but the aligned fraction
of genome_01 to genome_03 was only 0.041 and was below your threshold, and so
the ANI scores will be ignored (set to 0) for all downstream reports you will
find in anvi'o tables and visualizations. Anvi'o kindly invites you to carefully
think about potential implications of discarding hits based on an arbitrary
alignment fraction, but does not judge you because it is not perfect either.
pyANI similarity metric ......................: calculated
Number of genomes considered .................: 3
Number of redundant genomes ..................: 1
Final number of dereplicated genomes .........: 2
ANI RESULTS
===============================================
* Matrix and clustering of 'alignment coverage' written to output directory
* Matrix and clustering of 'alignment lengths' written to output directory
* Matrix and clustering of 'hadamard' written to output directory
* Matrix and clustering of 'percentage identity' written to output directory
* Matrix and clustering of 'similarity errors' written to output directory
* Matrix and clustering of 'full percentage identity' written to output directory
* Cleaning up the temp directory (you can use `--debug` if you would like to keep
it for testing purposes)
$ ~/Downloads/genomes >>> cat output_pyANI/CLUSTER_REPORT.txt
cluster           size    representative    genomes
cluster_000001    1       genome_01         genome_01
cluster_000002    2       genome_03         genome_02,genome_03
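For anyone comparing their own run against this one, a small shell helper can list which genomes were treated as redundant and which representative replaced them. This is only a sketch: `list_redundant_genomes` is a hypothetical name, not part of anvi'o, and it assumes the report is tab-delimited with the four columns shown in the header (cluster, size, representative, genomes) and comma-separated members in the last column.

```shell
# Hypothetical helper (not an anvi'o program): print each cluster member that
# is not the cluster's representative, followed by that representative.
# Assumes a tab-delimited report with the columns:
#   cluster, size, representative, genomes (comma-separated)
list_redundant_genomes() {
    awk -F'\t' 'NR > 1 {                      # skip the header line
        n = split($4, members, ",")           # members of this cluster
        for (i = 1; i <= n; i++)
            if (members[i] != $3)             # not the representative
                print members[i] "\t" $3
    }' "$1"
}
```

Calling `list_redundant_genomes output_pyANI/CLUSTER_REPORT.txt` on a report like the one above would print one line per redundant genome, with the genome name first and its representative second.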
I am wondering whether this is caused by something Linux-specific.
@ahenoch, @metehaansever, since I know you're using Linux -- can either of you please download these genomes and run the following commands to see if you get the same error @uguyet got?
tar -zxvf genomes.tar.gz
cd genomes/
anvi-dereplicate-genomes --fasta-text fasta.txt \
--program pyANI \
--min-alignment-fraction 0.25 \
--similarity-threshold 0.98 \
-o output_pyANI/ \
-T 3
Thank you!
Hi @meren, I just ran it, and it works for me on Ubuntu.
Thanks, @metehaansever.
My additional attempts to reproduce this also failed :( Closing it now in the hope that @uguyet will come back to us if the problem continues or is still relevant.
Short description of the problem
anvi-dereplicate-genomes with pyANI has a bug when trying to create output files.
anvi'o version
v8-dev and v8
System info
OS: Ubuntu 22.04.4 LTS. anvi'o was installed using conda.
Detailed description of the issue
I launched the following command:
anvi-dereplicate-genomes --fasta-text-file fasta_path_pyANI.tab --program pyANI --min-alignment-fraction 0.25 --similarity-threshold 0.98 -o output_pyANI/ -T 3
and got the following output: