Closed pedres closed 8 months ago
Could it be that some of the MAGs in your collection didn't have any of these genes and were excluded from the MAGs-tree.txt?
That was the first error I hit when I imported the collection and ran all the commands: a large number of MAGs had none of the requested genes, so anvi-get-sequences-for-hmm-hits left them out of its output. I fixed it by importing only 67 MAGs, then extracting the amino acid FASTA and building the tree. I then checked that the collection and the tree contain the same MAGs and ran anvi-interactive.
In addition, I am using anvi'o to combine MAGs obtained independently from several types of samples (three replicates per sample type). Each sample was processed with shotgun + Hi-C + PhaseGenomics black-box deconvolution. For each sample (biological replicate) I have a contig file, MAGs, and shotgun reads. My plan is to concatenate the three contig files (joined_contig.fa) and make a contig.db file, from which I will get single-copy core genes. Next I will map the shotgun reads of each biological replicate to an index built from joined_contig.fa and create three profiles, which will then be merged. Once they are merged I will import the MAGs as a collection. Finally I would follow the section of the Tara tutorial on “Combining MAGs from...”.
Since MAGs have the same names across biological replicates (bin_1... bin_N), I renamed them by adding the sample name (sample1_bin_1). I did the same with the contig names (k141_1 to sample1_k141_1) to avoid conflicts when importing the MAG collection.
Is this a good approach? Or would it be better and easier to treat every sample independently (creating its own contig.db, profile.db, and collection of MAGs) and then join them after refining the MAGs, following “Combining MAGs from...”?
Thanks a lot for your help and advice. Manuel
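The renaming described above (k141_1 to sample1_k141_1, bin_1 to sample1_bin_1) can be sketched with two small shell functions. This is only an illustration: the function names are hypothetical, and it assumes that the collection file is TAB-delimited with the contig name in column 1 and the bin name in column 2, and that the sample name contains no characters special to sed (such as `/` or `&`).

```shell
# Prefix every FASTA defline with a sample name,
# e.g. ">k141_1" becomes ">sample1_k141_1". Sequence lines are untouched.
# Usage: prefix_fasta sample1 contigs.fa > sample1_contigs.fa
prefix_fasta() {
    sed "s/^>/>${1}_/" "$2"
}

# Prefix both columns of a two-column, TAB-delimited collection file
# (assumed layout: contig name, bin name),
# e.g. "k141_1<TAB>bin_1" becomes "sample1_k141_1<TAB>sample1_bin_1".
# Usage: prefix_collection sample1 MAGS.txt > sample1_MAGS.txt
prefix_collection() {
    awk -v p="${1}_" 'BEGIN { FS = OFS = "\t" } { print p $1, p $2 }' "$2"
}
```

Applying the same prefix to both the FASTA and the collection file keeps the contig names in the two files consistent with each other.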
Hey @pedres, can you please update your anvi'o to v8 and try again? I just realized you're still on v7.1. We're unable to support earlier versions of anvi'o as we don't have the human resources for that.
If you run into the same error, I will then carefully go through your files and try to help you -- thank you very much for your patience! :)
Ok,
Thanks a lot. I will try it on Monday on another computer with anvio-8. In fact, I only need to update my home computer, since I already have anvi'o 8 installed in the lab and at the computing facility.
Regards,
Manuel
From: A. Murat Eren (Meren) Sent: Saturday, November 4, 2023 12:04 To: merenlab/anvio Cc: Manuel Aira Vieira; Mention Subject: Re: [merenlab/anvio] [BUG] anvi-interactive crashes when using a collection and a external tree (Issue #2167)
If you run into the same error, I will then carefully go through your files and try to help you -- thank you very much for your patience! :)
Hi, I have just tested this with anvio-8 (installed with mamba in its own environment, following the instructions on the website), and it still fails. Below are the commands I used and the error message. I also tried running anvi-interactive in manual mode, which gives an error as well:
anvi-interactive -p test.db -f mags_amino.fa -t MAGS-tree.txt --manual-mode
Config Error: Some of the names in your view data does not have corresponding entries in the FASTA file you provided. Here is an example to one of those 64 names that occur in your data file, but not in the FASTA file: "bin_61"
However, grep -o "bin_64" mags_amino.fa | wc -l returns "1", and so does grep -o "bin_64" MAGS-tree.txt | wc -l. The curious thing is that if I run the anvi-interactive command again, it gives the same error but names a different bin, for example bin_23. Again, that bin is present in both MAGS-tree.txt and mags_amino.fa. I have attached these two files because the problem seems to be in them. Thanks again for your help.
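A substring search such as grep -o "bin_64" succeeds whenever the name occurs anywhere on a line, so it cannot tell a defline that is exactly ">bin_64" from one that carries extra text after the name. A stricter check, sketched below with a hypothetical helper name, counts only deflines that consist of the name and nothing else.

```shell
# Count deflines that are exactly ">NAME", with nothing after the name.
# -x matches whole lines only, -F treats the pattern as a literal string.
# Usage: count_exact_defline bin_64 mags_amino.fa
count_exact_defline() {
    grep -c -x -F ">$1" "$2"
}
```

If this prints 0 while the substring search prints 1, the name is present but the defline contains more than the bare name; grep '^>' on the file shows the full deflines.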
https://drive.google.com/drive/folders/1-jJCDlBsGDSriKWfMNf0LQXTmhXQ73lq?usp=sharing
anvi-migrate --migrate-safely ss1-CONTIGS.db
anvi-import-collection -c ss1-CONTIGS.db \
                       -p ss1/PROFILE.db \
                       -C MAGS --contigs-mode MAGS.txt
anvi-get-sequences-for-hmm-hits -c ss1-CONTIGS.db \
                                -p ss1/PROFILE.db \
                                -o mags_amino.fa \
                                -C MAGS \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L1,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L5,Ribosomal_L6 \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate
grep -o ">" *.fa | wc -l ### to check that there were 64 bins in the aminoacid file
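Counting ">" characters confirms how many sequences the file holds, but not which bins they belong to. As a rough sketch (the function name is hypothetical), the helper below lists bins that appear in the collection file but have no sequence in the FASTA. It assumes the collection file is TAB-delimited with the bin name in column 2, that the bin name is the first word of each defline, and that the FASTA is not empty.

```shell
# Print bin names that are in the collection file but absent from the FASTA.
# Usage: bins_missing_from_fasta MAGS.txt mags_amino.fa
bins_missing_from_fasta() {
    awk -F '\t' '
        NR == FNR {                        # first file read: the FASTA
            if ($0 ~ /^>/) {
                name = substr($0, 2)       # drop the leading ">"
                sub(/[ \t].*$/, "", name)  # keep only the first word
                in_fasta[name] = 1
            }
            next
        }
        # second file read: the collection; report each missing bin once
        !($2 in in_fasta) && !reported[$2]++ { print $2 }
    ' "$2" "$1"
}
```

An empty result means every bin in the collection has a sequence in the FASTA file.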
anvi-gen-phylogenomic-tree -f mags_amino.fa -o MAGS-tree.txt
anvi-interactive -p ss1/PROFILE.db -c ss1-CONTIGS.db -C MAGS -t MAGS-tree.txt
Anvi'o .......................................: marie (v8)
Python .......................................: 3.10.12
Profile database .............................: 38
Contigs database .............................: 21
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2
Traceback (most recent call last):
File "/media/fulgencio/DATOS/conda/envs/anvio-8/bin/anvi-interactive", line 122, in <module>
Hey @pedres,
Sorry for the not-so-helpful error messages here. I think all your downstream issues are due to a very simple problem: the deflines in your FASTA file look like this (because anvi'o reported them as such, it is not your fault):
>bin_34 num_genes:6|genes:Ribosomal_L1,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L5,Ribosomal_L6|separator:XXX
But every other program in anvi'o wants the deflines in your FASTA file to look like this so bin names can be connected to the individual sequences and so on:
>bin_34
When I remove the excess information from the FASTA file with this command,
sed -i '' 's/ .*$//g' mags_amino.fa
then the next command runs without any issue:
anvi-interactive -p test.db -f mags_amino.fa -t MAGS-tree.txt --manual-mode
I think the same will happen with the rest of the commands you've been trying to run if you were to use this new FASTA file.
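One portability note on the sed command above: the `-i ''` form is the BSD/macOS way of requesting an in-place edit, whereas GNU sed (the default on Ubuntu) expects `-i` with no separate empty argument. A sketch that avoids the difference by writing to a new file instead of editing in place follows; the function name and the output file name are hypothetical. Unlike the original command, it touches only defline lines.

```shell
# Keep only the first word of each defline, so that
# ">bin_34 num_genes:6|genes:..." becomes ">bin_34".
# Sequence lines are passed through unchanged.
# Usage: simplify_deflines mags_amino.fa > mags_amino_clean.fa
simplify_deflines() {
    sed '/^>/ s/[[:space:]].*$//' "$1"
}
```

The cleaned file can then be passed to -f in place of the original FASTA.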
Best wishes, Meren
Short description of the problem
anvi-interactive crashes when run with a contig.db, a profile.db, a collection of external bins, and an external tree
anvi'o version
anvi-self-test --version
Anvi'o .......................................: hope (v7.1)
Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2
System info
Ubuntu 22. Anvi'o installed in a conda environment.
Detailed description of the issue
I created a contig.db from a contig.fa file, then mapped the reads to the contigs and made a profile.db. Finally, I imported a collection of external bins and generated the concatenated amino acid FASTA file and the tree file. The problem appears when I try to use anvi-interactive, which gives an error. Running anvi-interactive without the external bins does not work either, but in that case the error seems to come from the absence of any hierarchical clustering. I did not pass the --cluster-contigs flag when running anvi-profile because this is one sample from a group of three, and I will do that after merging the three profiles. Below I paste what anvi'o printed after running anvi-interactive:
Contigs DB ...................................: Initialized: ss1-CONTIGS.db (v. 20)
Interactive mode .............................: collection
WARNING
ProfileSuperClass found a collection focus, which means it will be initialized using only the splits in the profile database that are affiliated with the collection MAGS and all bins it describes.
Auxiliary Data ...............................: Found: ss1/AUXILIARY-DATA.db (v. 2)
Profile Super ................................: Initialized with 20209 of 435975 splits: ss1/PROFILE.db (v. 38)
THE MORE YOU KNOW ?
Someone asked the Contigs Superclass to initialize only a subset of contig sequences. Usually this is a good thing and means that some good code somewhere is looking after you. Just FYI, this class will only know about 17,938 contig sequences instead of all the things in the database.
Additional Tree ..............................: Splits will be organized based on 'MAGS-tree:unknown:unknown'.
Traceback (most recent call last):
  File "/home/fulgencio/miniconda3/envs/anvio-7.1/bin/anvi-interactive", line 122, in <module>
    d = interactive.Interactive(args)
  File "/home/fulgencio/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/interactive.py", line 254, in __init__
    self.load_collection_mode()
  File "/home/fulgencio/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/interactive.py", line 1121, in load_collection_mode
    self.p_meta['default_item_order'] = get_default_item_order_name(default_clustering_class, self.p_meta['available_item_orders'])
  File "/home/fulgencio/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/dbops.py", line 4945, in get_default_item_order_name
    default_item_order = list(item_orders_dict.keys())[0]
AttributeError: 'list' object has no attribute 'keys'
Files / commands to reproduce the issue
bowtie2-build ss1.fa ss1
bowtie2 --threads 24 -x ss1 -1 shot_pathog/ss1_R1.fastq.gz \
        -2 shot_pathog/ss1_R2.fastq.gz \
        --no-unal \
        -S ss1.sam
samtools view -@ 24 -F 4 -bS ss1.sam > ss1-RAW.bam
samtools sort -@ 24 ss1-RAW.bam -o ss1.bam
samtools index -@ 24 ss1.bam
rm ss1.sam ss1-RAW.bam
anvi-profile -c ss1-CONTIGS.db \
             -S ss1 \
             -i ss1.bam \
             --profile-SCVs \
             --num-threads 16
anvi-import-collection -c ss1-CONTIGS.db \
                       -p ss1/PROFILE.db \
                       -C MAGS --contigs-mode MAGS.txt
anvi-get-sequences-for-hmm-hits -c ss1-CONTIGS.db \
                                -p ss1/PROFILE.db \
                                -o mags_amino.fa \
                                -C MAGS \
                                --hmm-source Bacteria_71 \
                                --gene-names Ribosomal_L1,Ribosomal_L2,Ribosomal_L3,Ribosomal_L4,Ribosomal_L5,Ribosomal_L6 \
                                --return-best-hit \
                                --get-aa-sequences \
                                --concatenate
anvi-gen-phylogenomic-tree -f mags_amino.fa \
                           -o MAGS-tree.txt
anvi-interactive -p ss1/PROFILE.db -c ss1-CONTIGS.db -C MAGS -t MAGS-tree.txt
I am uploading the files to Drive. I will edit this post with the link when the upload finishes.