merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[Bug] external-genomes.txt missing genomes #2239

Closed Alesita0381 closed 3 months ago

Alesita0381 commented 3 months ago

Hi!

I'm working with anvi'o and I'm trying to run this command to generate an input file containing the paths to all of my databases (17 files).

anvi-script-gen-genomes-file --input-dir ./ -o external-genomes.txt

Then, when I check it with cat external-genomes.txt, the file contains only one path. What am I doing wrong?

ivagljiva commented 3 months ago

Hi @Alesita0381 , what do you see when you run ls *.db in the same directory?

The only possibility I can imagine is that your databases are in a different folder. If they are in subdirectories of your working directory, you will need to use the --include-subdirs flag for anvi'o to find them.
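
For anyone who would rather check this from Python than from the shell, here is a minimal sketch that counts the *.db files directly in a directory and those sitting in subdirectories. The function name `find_db_files` is hypothetical and not part of anvi'o; it only inspects file names and does not open the databases.

```python
from pathlib import Path


def find_db_files(directory="."):
    """Return (top_level, nested) lists of *.db paths under `directory`."""
    root = Path(directory)
    top_level = sorted(root.glob("*.db"))
    # rglob also returns the top-level files, so filter those out.
    nested = sorted(p for p in root.rglob("*.db") if p.parent != root)
    return top_level, nested


if __name__ == "__main__":
    top_level, nested = find_db_files(".")
    print(f"{len(top_level)} database(s) in the working directory")
    print(f"{len(nested)} database(s) in subdirectories")
```

If the second count is not zero, the --include-subdirs flag mentioned above is probably what is needed.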

Alesita0381 commented 3 months ago

Hello, when I run the command everything seems to be fine, but when I check the content, only the path of a single file appears. I have 17 files located in the same place, and only file number 17 appears, whereas the instructions say that the path of each file should appear in a list. I don't know if this is because of the version: my anvi'o is version 8 and the instructions are for version 7.

meren commented 3 months ago

Can you run this command on two of the anvi'o databases you have in that directory (any pair would be fine) and send the output back, please:

anvi-db-info 01.db

anvi-db-info 02.db

Alesita0381 commented 3 months ago

This is the result:

DB Info (no touch)
===============================================
Database Path ................................: maxbin_1.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)
===============================================
project_name .................................: Pseudomonas
contigs_db_hash ..............................: hash59407a87
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 246
total_length .................................: 4917381
num_splits ...................................: 321
gene_level_taxonomy_source ...................: None
gene_function_sources ........................: None
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1710360177.51146
scg_taxonomy_was_run .........................: 1
scg_taxonomy_database_version ................: v214.1

* Please remember that it is never a good idea to change these values. But in some
  cases it may be absolutely necessary to update something here, and a
  programmer may ask you to run this program and do it. But even then, you
  should be extremely careful.

AVAILABLE GENE CALLERS
===============================================
* 'prodigal' (4,640 gene calls)

AVAILABLE FUNCTIONAL ANNOTATION SOURCES
===============================================
* No functional annotations found in this contigs database :/

AVAILABLE HMM SOURCES
===============================================
* 'Archaea_76' (76 models with 60 hits)
* 'Bacteria_71' (71 models with 114 hits)
* 'Protista_83' (83 models with 4 hits)
* 'Ribosomal_RNA_12S' (1 model with 0 hits)
* 'Ribosomal_RNA_16S' (3 models with 0 hits)
* 'Ribosomal_RNA_18S' (1 model with 0 hits)
* 'Ribosomal_RNA_23S' (2 models with 0 hits)
* 'Ribosomal_RNA_28S' (1 model with 0 hits)
* 'Ribosomal_RNA_5S' (5 models with 0 hits)

(anvio-8) alejandra-aguilar@:MaxbinDB$ 
(anvio-8) alejandra-aguilar@:MaxbinDB$ anvi-db-info maxbin2.db 

DB Info (no touch)
===============================================
Database Path ................................: maxbin2.db
description ..................................: [Not found, but it's OK]
db_type ......................................: contigs (variant: unknown)
version ......................................: 21

DB Info (no touch also)
===============================================
project_name .................................: Pseudomonas
contigs_db_hash ..............................: hash151f66c3
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 184
total_length .................................: 757460
num_splits ...................................: 187
gene_level_taxonomy_source ...................: None
gene_function_sources ........................: None
genes_are_called .............................: 1
external_gene_calls ..........................: 0
external_gene_amino_acid_seqs ................: 0
skip_predict_frame ...........................: 0
splits_consider_gene_calls ...................: 1
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1710360932.87123
scg_taxonomy_was_run .........................: 1
scg_taxonomy_database_version ................: v214.1

* Please remember that it is never a good idea to change these values. But in some
  cases it may be absolutely necessary to update something here, and a
  programmer may ask you to run this program and do it. But even then, you
  should be extremely careful.

AVAILABLE GENE CALLERS
===============================================
* 'prodigal' (926 gene calls)

AVAILABLE FUNCTIONAL ANNOTATION SOURCES
===============================================
* No functional annotations found in this contigs database :/

AVAILABLE HMM SOURCES
===============================================
* 'Archaea_76' (76 models with 13 hits)
* 'Bacteria_71' (71 models with 29 hits)
* 'Protista_83' (83 models with 0 hits)
* 'Ribosomal_RNA_12S' (1 model with 0 hits)
* 'Ribosomal_RNA_16S' (3 models with 0 hits)
* 'Ribosomal_RNA_18S' (1 model with 0 hits)
* 'Ribosomal_RNA_23S' (2 models with 0 hits)
* 'Ribosomal_RNA_28S' (1 model with 0 hits)
* 'Ribosomal_RNA_5S' (5 models with 0 hits)

meren commented 3 months ago

This is why it is happening, @Alesita0381:

project_name .................................: Pseudomonas
(...)
project_name .................................: Pseudomonas

The project name for every single contigs-db is the same (when they should be unique).
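
To spot this kind of collision across a whole directory without running anvi-db-info on each file, something like the following might help. It is only a sketch: it assumes each contigs-db is an SQLite file with a `self` table holding `key` and `value` columns (which is where the values printed above appear to come from), and the helper names `read_project_name` and `group_by_project_name` are hypothetical. It only reads from the databases.

```python
import sqlite3
from collections import defaultdict
from pathlib import Path


def read_project_name(db_path):
    """Read project_name from the `self` table of one database file.

    Assumption: the file is SQLite and has a `self` table with
    `key` and `value` columns.
    """
    conn = sqlite3.connect(str(db_path))
    try:
        row = conn.execute(
            "SELECT value FROM self WHERE key = 'project_name'"
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


def group_by_project_name(directory="."):
    """Map each project_name to the list of *.db files that carry it."""
    groups = defaultdict(list)
    for db_path in sorted(Path(directory).glob("*.db")):
        groups[read_project_name(db_path)].append(db_path.name)
    return dict(groups)


if __name__ == "__main__":
    for name, files in group_by_project_name(".").items():
        if len(files) > 1:
            print(f"{name!r} is shared by {len(files)} databases: {files}")
```

Any name reported here is shared by more than one database and would need to be made unique.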

meren commented 3 months ago

One solution for this could be to add a check to anvi-script-gen-genomes-file, i.e.:

if len(set(genome_names)) != len(genome_names):
    raise ConfigError("Oi. Not every contigs-db this program considers seems to have "
                      "a unique name. Such a redundancy will make sad.")
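
The same idea as a self-contained sketch that can be run outside anvi'o. `ConfigError` here is a local stand-in defined only so the example runs on its own, `check_unique_names` is a hypothetical helper name, and the message text is illustrative. This is not the code that ended up in the tool; the only addition over the snippet above is that it names the repeated values.

```python
from collections import Counter


class ConfigError(Exception):
    """Stand-in for anvi'o's ConfigError so this sketch runs on its own."""


def check_unique_names(genome_names):
    """Raise ConfigError if any name in genome_names occurs more than once."""
    duplicates = sorted(n for n, c in Counter(genome_names).items() if c > 1)
    if duplicates:
        raise ConfigError(
            "Not every contigs-db has a unique name. "
            f"Repeated names: {', '.join(duplicates)}"
        )


if __name__ == "__main__":
    check_unique_names(["maxbin_1", "maxbin_2"])  # unique names, no error
    try:
        check_unique_names(["Pseudomonas", "Pseudomonas"])
    except ConfigError as e:
        print(e)
```

Reporting which names collide should make it easier for the user to see which databases need renaming.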

Alesita0381 commented 3 months ago

Thanks, I assigned a unique project name to each file and the paths of all the files now appear in external-genomes.txt! Thank you :)

meren commented 3 months ago

I'm glad it is resolved, @Alesita0381, and thanks for catching this. We will do something to ensure there is an informative error report generated by the tool in the future.