Closed vnsriniv closed 7 years ago
Hi Varun,
I had a chance to work on this today, and I realized that this in fact is not a bug, but you couldn't see what was going on, because the reporting wasn't as clear as it should have been. So I fixed it with the commits above. So now when you have this situation, anvi'o will explain what's up more clearly. Please update your codebase to master
if you can, and try again.
But you can re-run your things without updating to master, and everything would work if you simply use the --gene-caller
parameter with anvi-gen-genomes-storage
program. The gene caller name you will use with that paramter should match the gene caller name in your external gene calls file.
Please let me know if it solves your issue.
Best,
Hi Meren,
Thanks! I updated it to master
and it worked great with the --gene-caller
parameter! The reporting in the absence of the --gene-caller
parameter is also abundantly clear.
Varun
Thank you very much for trying! I'm glad it worked.
Dear Anvio developpers, I encountered a similar issue with version 7. Basically I've pretty much imported all annotations from our production pipeline. Our pipeline also uses Prodigal v2.6.3 to call genes and these gene calls were imported into our contigs db.
anvi-db-info ./contigs.db
DB Info (no touch)
===============================================
Database Path ................................: ./contigs.db
Description ..................................: [Not found, but it's OK]
Type .........................................: contigs
Version ......................................: 20
DB Info (no touch also)
===============================================
project_name .................................: megahit_bwa
contigs_db_hash ..............................: hash41683284
split_length .................................: 20000
kmer_size ....................................: 4
num_contigs ..................................: 1345272
total_length .................................: 1899811878
num_splits ...................................: 1347502
genes_are_called .............................: 1
external_gene_calls ..........................: 1
external_gene_amino_acid_seqs ................: 1
skip_predict_frame ...........................: 1
splits_consider_gene_calls ...................: 1
scg_taxonomy_was_run .........................: 0
scg_taxonomy_database_version ................: None
trna_taxonomy_was_run ........................: 0
trna_taxonomy_database_version ...............: None
creation_date ................................: 1621362887.25042
gene_level_taxonomy_source ...................: default_matrix
gene_function_sources ........................: KEGG_KO,KEGG_pathway,COG,COG_category,COG_name,KEGG_module
* Please remember that it is never a good idea to change these values. But in some
cases it may be absolutely necessary to update something here, and a programmer
may ask you to run this program and do it. But even then, you should be
extremely careful.
AVAILABLE GENE CALLERS
===============================================
* 'prodigal' (2,693,420 gene calls)
* 'barrnap' (1,895 gene calls)
* 'Ribosomal_RNA_28S' (93 gene calls)
* 'Ribosomal_RNA_23S' (344 gene calls)
* 'Ribosomal_RNA_18S' (35 gene calls)
* 'Ribosomal_RNA_16S' (184 gene calls)
AVAILABLE HMM SOURCES
===============================================
* 'Archaea_76' (type: singlecopy; num genes: 76)
* 'Bacteria_71' (type: singlecopy; num genes: 71)
* 'Protista_83' (type: singlecopy; num genes: 83)
* 'Ribosomal_RNA_12S' (type: Ribosomal_RNA_12S; num genes: 1)
* 'Ribosomal_RNA_16S' (type: Ribosomal_RNA_16S; num genes: 3)
* 'Ribosomal_RNA_18S' (type: Ribosomal_RNA_18S; num genes: 1)
* 'Ribosomal_RNA_23S' (type: Ribosomal_RNA_23S; num genes: 2)
* 'Ribosomal_RNA_28S' (type: Ribosomal_RNA_28S; num genes: 1)
* 'Ribosomal_RNA_5S' (type: Ribosomal_RNA_5S; num genes: 5)
However when I do :
anvi-gen-genomes-storage --gene-caller prodigal -i ./internal_genomes_test.txt -o DEPTH-GENOMES.db
It still tells me the following error msg:
Config Error: None of your genomes seem to have a gene call, which is a typical error you get
if you are working with contigs databases with external gene calls. You can
solve it by looking at the output of the program `anvi-db-info` for a given
contigs database in your collection, and use one of the gene caller sources
listed in the output using the `--gene-caller` parameter.
Is this because I used prodigal as a gene caller, but "externally" to anvio, that the --gene-caller
argument does not seem to work? Would there be an (easy) way around this?
Thanks very much,
Hi @jtremblay,
I wonder if we addressed this problem in our active repository. Would you mind trying this after switching to the active branch? Otherwise you can privately send me two example FASTA files with external gene calls for them so I can run anvi-gen-contigs-database
and anvi-gen-genomes-storage
to see if I can reproduce the problem :/
Hi @meren , Thanks for the input - okay I've finally tried with the latest master branch and I now get an error when generating my profiles (anvi-profile). I've regenerated the complete workflow (including the contigs.db file) from scratch.
^M ^M^[[48;5;6m^[[38;5;0m^M[05 Oct 21 21:15:35 Loading the contigs DB] ... ^[[0m^M
^M ^M^[[48;5;6m^[[38;5;0m^M[05 Oct 21 21:18:45 Loading contig sequences] Reading **ALL** data from ` (...)^[[0m^M
WARNING
=====================================
Amino acid linkmer frequencies will not be characterized for this profile.
anvio ..............................: 7-dev
profiler_version ...................: 37
sample_id ..........................: s395100
description ........................: None
profile_db .........................: /gpfs/fs1/nrc/eme/jut001/projects/DFO/gulley-haddock/megahit_bwa/anvio/profiles/395100/PROFILE.db
contigs_db .........................: True
contigs_db_hash ....................: hash3f59231d
cmd_line ...........................: /space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/bin/anvi-profile -T 12 -i contig_abundance/395100/395100.bam -c anvio/contigs.db -o anvio/profiles/395100
merged .............................: False
blank ..............................: False
split_length .......................: 20,000
min_contig_length ..................: 1,000
max_contig_length ..................: 9,223,372,036,854,775,807
min_mean_coverage ..................: 0
clustering_performed ...............: False
min_coverage_for_variability .......: 10
skip_SNV_profiling .................: False
skip_INDEL_profiling ...............: False
profile_SCVs .......................: False
min_percent_identity ...............: None
report_variability_full ............: False
WARNING
=====================================
Your minimum contig length is set to 1,000 base pairs. So anvi'o will not take
into consideration anything below that. If you need to kill this an restart your
analysis with another minimum contig length value, feel free to press CTRL+C.
^M ^M^[[48;5;6m^[[38;5;0m^M[05 Oct 21 21:19:02 Init] Reading BAM File ^[[0m^M
output_dir .........................: /gpfs/fs1/nrc/eme/jut001/projects/DFO/gulley-haddock/megahit_bwa/anvio/profiles/395100
num_reads_in_bam ...................: 27,881,769
num_contigs ........................: 1,345,272
num_contigs_after_M ................: 480,000
num_splits .........................: 482,230
total_length .......................: 1,310,007,814
^M ^M^[[48;5;6m^[[38;5;0m^M[05 Oct 21 21:20:13 Profiling w/12 threads] initializing threads ... ETA: ∞:∞:∞^[[0m^M
✖ anvi-profile encountered an error after 0:21:36.790592
Traceback (most recent call last):
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/bin/anvi-profile", line 4, in <module>
__import__('pkg_resources').run_script('anvio==7.dev0', 'anvi-profile')
File "/fs/isi-nas1/grdi/grdi_eco/bioinfo-tools/nrc/software/python/python-3.6.5/venv_anvio_7.0_dev/lib/python3.6/site-packages/pkg_resources/__init__.py", line 658, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/fs/isi-nas1/grdi/grdi_eco/bioinfo-tools/nrc/software/python/python-3.6.5/venv_anvio_7.0_dev/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1438, in run_script
exec(code, namespace, namespace)
File "/fs/isi-nas1/grdi/grdi_eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/EGG-INFO/scripts/anvi-profile", line 102, in <module>
main(args)
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/terminal.py", line 875, in wrapper
program_method(*args, **kwargs)
File "/fs/isi-nas1/grdi/grdi_eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/EGG-INFO/scripts/anvi-profile", line 39, in main
profiler.BAMProfiler(args)._run()
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/profiler.py", line 520, in _run
self.profile_multi_thread()
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/profiler.py", line 1162, in profile_multi_thread
self.store_contigs_buffer()
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/profiler.py", line 1240, in store_contigs_buffer
append_mode=True)
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/tables/views.py", line 94, in create_new_view
self.sanity_check(view_data)
File "/space/project/grdi/eco/bioinfo-tools/nrc/software/anvio/anvio-dev/lib/python3.6/site-packages/anvio-7.dev0-py3.6.egg/anvio/tables/views.py", line 43, in sanity_check
if len(view_data[0]) != 3:
IndexError: list index out of range
Any ideas? Thank you!
Hey @jtremblay,
Looking at the code, I have no idea how can the code end up failing that sanity check. I just generated a contigs database with external gene calls, profiled a BAM file using it with and without multi-threading. Everything worked as expected :/
I am sorry I am not useful, but please let me know if you realize something important and/or if you have specific questions.
Well that's odd, I've lowered the num threads to 1 and it seems to work fine so far. Maybe a compute node threading related issue...idk. I'll try with additional -T values and let you know if I ever notice something useful. Thank you for your time!
Hi @meren, Well its not the number of threads after all. When I use a subsample of the same dataset it works fine though regardless of the number of threads. So I've finally went back with anvio 7.0 on another cluster (fresh install) and this time it worked fine...(!) - both anvi-profile and anvi-gen-genomes-storage. I just don't know what to say... But thanks for your time!
Hi Anvi'o team,
I am trying to perform some comparative genomics (published genomes/MAGs) using anvio. My pipeline is as follows
For steps 3-5 I following the Importing functions from Prokka tutorial.
While using anvi-gen-genomes-storage, it is having trouble recognizing gene calls. Is this normal and to be expected with external gene calls and annotations? Any help in this matter would be very much appreciated! I have attached the output for the command below!
Thanks Varun
anvi-gen-genomes-storage -e external_genomes.txt -o MY-GENOMES.h5
WARNING
Good news! Anvi'o found all these functions that are common to all of your genomes and will use them for downstream analyses and is very proud of you: 'Prokka:Prodigal'.
Internal genomes .............................: 0 have been initialized.
[08 Sep 17 10:04:02 Initializing external genomes] working on CAP_HKU2_Mao2014 /home/vnsriniv/virtual-envs/anvio-dev/local/lib/python3.5/site-packages/numpy/core/fromnumeric.py:2909: RuntimeWarning: Mean of empty slice. out=out, **kwargs) /home/vnsriniv/virtual-envs/anvio-dev/local/lib/python3.5/site-packages/numpy/core/_methods.py:80: RuntimeWarning: invalid value encountered in double_scalars ret = ret.dtype.type(ret / rcount) External genomes .............................: 6 found.
The new genomes storage ......................: MY-GENOMES.h5 (v4, signature: 404c1b9f) Number of genomes ............................: 6 (internal: 0, external: 6) Number of gene calls .........................: 0 Number of partial gene calls .................: 0