Closed CarolineSchouNielsen closed 2 weeks ago
Since you are using a collection of 13 genomes for extracting the HMM sequences to make the tree, in theory your tree file should have 13 items in it, unless some of the genomes dropped out while filtering. You can request genomes to drop out explicitly using the --max-num-genes-missing-from-bin
parameter, but here it seems to have happened without that request (and without telling you about it).
I tested this part of the tutorial on my own computer with anvi'o v8, and I was able to reproduce the error. The output of anvi-get-sequences-for-hmm-hits
contains only sequences for 11 genomes in the collection:
$ grep -c '>' seqs-for-phylogenomics.fa
11
So two of the bins are dropping out. Cross-checking the list of bin names in the default collection with those in the sequence headers of seqs-for-phylogenomics.fa
reveals that P_acnes
and S_lugdunensis
are missing from the latter.
Then I tried the same thing using anvio-dev
. And this time, I got a nice warning output from the anvi-get-sequences-for-hmm-hits
program:
Num bins that lacked the 6 genes in `--gene-names` : 2
Bins that are no more in the analysis ........: P_acnes, S_lugdunensis
Filtered hits ................................: 64 hits remain after filtering for 6 gene(s)
WARNING
===============================================
You requested only the best hits to be reported, which means, if, say, there are
more than one RecA hits in a bin for a given HMM source, only the one with the
lowest e-value will be kept, and others will be removed from your final results.
Filtered hits ................................: 56 hits remain after removing weak hits for multiple genes
CONDOLENCES FROM DAAN TO YOU FOR YOUR BINS THAT WENT 💀
===============================================
Please note that a total of 2 bins from your final results were removed while
anvi'o was applying all the filters to your HMM hits. Please carefully review
the logs above to make sure you have a good grasp on what happened, and you're
happy with the reults.
So the same thing is happening there, but at least in the development version we are told about what is going on :) Those two bins don't have any of the 6 genes that we asked for, so rather than representing them in the output file with sequences exclusively made up of gap characters, anvi'o removed them completely. Thus, the tree doesn't contain the 2 genomes, and trying to visualize your collection of 13 genomes with a tree of only 11 genomes fails.
I think the behavior of this program is reasonable, so it is not a bug. Rather, the Infant Gut Tutorial should be updated to reflect what is happening here. In fact, if you look carefully at the screenshots of the tree in the IGT, it has only 11 genomes in it.
To get around the error and visualize that tree as shown in the IGT, you need to remove from the collection the two bins that are missing from the tree:
anvi-export-collection -p PROFILE.db -C default
grep -v P_acnes collection-default.txt | grep -v S_lugdunensis > collection-default-reduced.txt
anvi-import-collection -c CONTIGS.db -p PROFILE.db -C default_reduced collection-default-reduced.txt
And then you can visualize the tree with the reduced set of genomes:
anvi-interactive -p PROFILE.db \
-c CONTIGS.db \
-C default_reduced \
--tree phylogenomic-tree.txt
Voila:
I will update the IGT to include these steps. Thanks for bringing this to our attention, @CarolineSchouNielsen.
Tutorial has been updated :)
Thank you @ivagljiva! It works for me too :)
Phylogenomic-tree.txt and default collection has different number of items
Anvi'o version
`Anvi'o .......................................: marie (v8) Python .......................................: 3.10.14
Profile database .............................: 38 Contigs database .............................: 21 Pan database .................................: 16 Genome data storage ..........................: 7 Auxiliary data storage .......................: 2 Structure database ...........................: 2 Metabolic modules database ...................: 4 tRNA-seq database ............................: 2`
System info
I installed Anvi'o with the following commands (probably a lot more information than needed, but I am unsure of what to write as I did not make these installation instructions myself):
Detailed description of the issue
Hi!
I was following your A primer on Anvi'o tutorial (with the infant gut data), but at Chapter 3: phylogenomics, I am having trouble visualising the phylogenomic tree with the additional information set in the default collection.
I made the default collection with the following command:
anvi-get-sequences-for-hmm-hits -c CONTIGS.db \ -p PROFILE.db \ -o seqs-for-phylogenomics.fa \ --hmm-source Bacteria_71 \ -C default \ --gene-names Ribosomal_L1,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L5,Ribosomal_L6 \ --concatenate-genes \ --return-best-hit \ --get-aa-sequences
Which gave this output:
Then, I try to visualise the tree with this command:
anvi-interactive -p PROFILE.db \ -c CONTIGS.db \ -C default \ --tree phylogenomic-tree.txt
But instead of getting the tree, I get this error message, saying that the number of items in my collection and tree have to match:
How and which items do I remove to make the tree?
Thank you! :)