merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-split throws error: not valid collection ID #1704

Closed microDM closed 3 years ago

microDM commented 3 years ago

Short description of the problem

I followed the tutorial at https://merenlab.org/2016/11/08/pangenomics-v2/. Using anvi-pan-genome, I created PAN-GENOME.db from 143 complete genomes. Now I want to split the pangenome into CORE, ACCESSORY and UNIQUE genes.

anvi'o version

Anvi'o .......................................: hope (v7-dev)

Profile database .............................: 35
Contigs database .............................: 20
Pan database .................................: 14
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 1

anvi-self-test --version

System info

I am using Ubuntu 20.10. I installed anvio using miniconda.

Detailed description of the issue

Running anvi-split with the collection name "default" produced the following output:

anvi-split -p Burkh_Pan/Burkh_Pan-PAN.db -C default -g Burk-GENOMES.db -o temp-split

Functions found .............................................: COG20_FUNCTION, COG20_CATEGORY, COG20_PATHWAY
Genomes storage .............................................: Initialized (storage hash: hash1a269245)
Num genomes in storage ......................................: 143
Num genomes will be used ....................................: 143
Pan DB ......................................................: Initialized: Burkh_Pan/Burkh_Pan-PAN.db (v. 14)
Gene cluster homogeneity estimates ..........................: Functional: [YES]; Geometric: [YES]; Combined: [YES]

Config Error: default is not a valid collection ID. See a list of available ones with '--list-collections' flag

anvi-export-table Burkh_Pan-PAN.db -l

self gene_clusters item_additional_data item_orders layer_additional_data layer_orders views collections_info collections_bins_info collections_of_contigs collections_of_splits states gene_cluster_frequencies gene_cluster_presence_absence

All of the collections tables are exported as empty data frames. How can I extract CORE, ACCESSORY and UNIQUE genes from the PAN-GENOME database?


meren commented 3 years ago

Hi @microDM,

This message here:

Config Error: default is not a valid collection ID. See a list of available ones with '--list-collections' flag

is for the program anvi-split, not for the program anvi-export-table. So I am not sure what you intended by running this:

anvi-export-table Burkh_Pan-PAN.db -l

If you truly intend to split your pangenome into CORE, ACCESSORY and UNIQUE gene clusters, you should first create a collection that contains those gene clusters and store it in the pan database (the name you give to your collection is the name you will pass to anvi-split with the -C parameter later). You can do this interactively through anvi-display-pan, or through the command line using anvi-import-collection after identifying which gene clusters are core, accessory, or singleton (you can use anvi-summarize on your pangenome to get that information).

Best,

microDM commented 3 years ago

Got it. I used anvi-export-table to export "gene_cluster_presence_absence", then marked the CORE, ACCESSORY and UNIQUE clusters. After that I imported the collection using anvi-import-collection and split my pangenome using anvi-split.
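For anyone landing here later, the marking step can be scripted. What follows is only a minimal sketch, not the exact script used in this thread. It assumes the exported table is TAB-delimited with a header row, the gene cluster name in the first column, and one 0/1 column per genome. It also assumes that CORE means present in every genome and UNIQUE means present in exactly one. The function names and the demo rows are made up for illustration.

```python
import csv


def classify(presence_values):
    """Label one gene cluster from its per-genome presence/absence values.

    Assumed definitions (not taken from anvi'o itself):
    CORE = present in every genome, UNIQUE = present in exactly one,
    ACCESSORY = anything in between.
    """
    hits = sum(1 for value in presence_values if float(value) > 0)
    if hits == 0:
        raise ValueError("gene cluster is absent from every genome")
    if hits == len(presence_values):
        return "CORE"
    if hits == 1:
        return "UNIQUE"
    return "ACCESSORY"


def build_collection(table_lines):
    """Yield (gene_cluster_name, bin_name) pairs from a TAB-delimited table.

    The first line is treated as a header and skipped; the first column is
    assumed to hold the gene cluster name.
    """
    reader = csv.reader(table_lines, delimiter="\t")
    next(reader)  # skip the header row
    for row in reader:
        if row:
            yield row[0], classify(row[1:])


def write_collection(pairs, out_path):
    """Write the pairs as a two-column TAB-delimited file without a header."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(pairs)


# Made-up demo rows standing in for an exported table with three genomes.
demo_table = [
    "item\tG1\tG2\tG3",
    "GC_00000001\t1\t1\t1",
    "GC_00000002\t1\t0\t0",
    "GC_00000003\t1\t1\t0",
]
demo_pairs = list(build_collection(demo_table))
```

To run this on a real export, open the exported file and pass the file handle to build_collection, then hand the result to write_collection. The resulting two-column file is the kind of input anvi-import-collection is meant to take, but please check its help output for the exact arguments before relying on this.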

meren commented 3 years ago

You are a hacker, @microDM :) Great job.