merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-profile: Config Error: get_gene_info_for_each_position #1406

Closed domenico-simone closed 4 years ago

domenico-simone commented 4 years ago

Hi,

I am trying to run the metagenomic workflow without gene calling and without external gene calls either. Basically I wanted to use only k-mer frequencies and contig coverage in multiple samples. This is what I get - the error is at the end of this post. If you need input files, I can share them with you!

anvi'o version:

$ anvi-self-test --version

Anvi'o version ...............................: esther (v6.2)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

Installed as conda environment and running on a CentOS 7 computing cluster.

$ anvi-gen-contigs-database -f water05_sensitive_contigs.fa -o water05_sensitive_contigs.db -n 'water05_sensitive_contigs db' --skip-gene-calling

Input FASTA file .............................: /crex/proj/uppstore2018116/thermokarst_fungi/anvio-test/water05_sensitive_contigs.fa
Name .........................................: water05_sensitive_contigs db
Description ..................................: No description is given
Split Length .................................: 20,000                                                                                                                                                                                 
K-mer size ...................................: 4
Skip gene calling? ...........................: True
External gene calls provided? ................: None
Ignoring internal stop codons? ...............: False
Splitting pays attention to gene calls? ......: False
Contigs database .............................: A new database, water05_sensitive_contigs.db, has been created.                                                                                                                        
Number of contigs ............................: 548,539
Number of splits .............................: 551,275
Total number of nucleotides ..................: 1,839,237,746
Gene calling step skipped ....................: True
Splits broke genes (non-mindful mode) ........: True
Desired split length (what the user wanted) ..: 20,000
Average split length (wnat anvi'o gave back) .: 20,230
$ anvi-profile -i P10_water05_sensitive.bam -c water05_sensitive_contigs.db --output-dir P10_water05_sensitive --sample-name P10 -M 1500

Contigs DB .........................: Initialized: water05_sensitive_contigs.db (v. 14)                                                                                                                                                

WARNING
=====================================
Amino acid linkmer frequencies will not be characterized for this profile.

anvio ..............................: 6.2
profiler_version ...................: 31
sample_id ..........................: P10
description ........................: None
profile_db .........................: /crex/proj/uppstore2018116/thermokarst_fungi/anvio-test/P10_water05_sensitive/PROFILE.db
contigs_db .........................: True
contigs_db_hash ....................: hashf7b39ed4
cmd_line ...........................: /crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/bin/anvi-profile -i P10_water05_sensitive.bam -c water05_sensitive_contigs.db --output-dir P10_water05_sensitive --sample-name P10 -M 1500
merged .............................: False
blank ..............................: False
split_length .......................: 20,000
min_contig_length ..................: 1,500
max_contig_length ..................: 9,223,372,036,854,775,807
min_mean_coverage ..................: 0
clustering_performed ...............: False
min_coverage_for_variability .......: 10
skip_SNV_profiling .................: False
profile_SCVs .......................: False
report_variability_full ............: False

WARNING
=====================================
Your minimum contig length is set to 1,500 base pairs. So anvi'o will not take
into consideration anything below that. If you need to kill this an restart your
analysis with another minimum contig length value, feel free to press CTRL+C.

input_bam ..........................: P10_water05_sensitive.bam                                                                                                                                                                        
output_dir .........................: /crex/proj/uppstore2018116/thermokarst_fungi/anvio-test/P10_water05_sensitive
total_reads_mapped .................: 43,902,693
num_contigs ........................: 548,539

WARNING
=====================================
The contigs database 'water05_sensitive_contigs.db' does not contain any gene
calls. Which means the profiling step will not be able to characterize 'gene
coverages'. If you are OK with this, anvi'o will be OK with it as well.

num_contigs_after_M ................: 548,539
num_splits .........................: 551,275
total_length .......................: 1,839,237,746
[W::hts_idx_load2] The index file is older than the data file: P10_water05_sensitive.bam.bai
[12 Apr 20 18:14:52 Profiling w/1 thread] contigs are being processed ...                                                                                                                                                    ETA: ∞:∞:∞
✖ anvi-profile encountered an error after 0:00:13.252783

Config Error: get_gene_info_for_each_position :: I am asked to return stuff, but
              self.nt_position_info is None!                                    

Thank you,

Domenico

meren commented 4 years ago

Hi @domenico-simone,

Thank you very much for the detailed report! It helped us realize that the recent changes in anvi-profile do not handle contigs databases with no gene calls. Fortunately, self.a_meta['genes_are_called'] records whether genes were called or not, so I expect that @ekiefl can fix it in the master branch quite rapidly.

Best,
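
The guard described above could look roughly like the sketch below. This is an illustration, not the patch that went into anvi'o: only the `genes_are_called` key, the `nt_position_info` attribute, and the method name `get_gene_info_for_each_position` come from this thread, while the `ContigsInfo` class, its constructor, the stand-in `ConfigError`, and the dictionary return shape are assumptions made for the example.

```python
class ConfigError(Exception):
    """Simplified stand-in for anvi'o's ConfigError (assumption for this sketch)."""


class ContigsInfo:
    """Hypothetical container; the real profiler classes are organized differently."""

    def __init__(self, a_meta, nt_position_info=None):
        self.a_meta = a_meta                      # contigs database metadata
        self.nt_position_info = nt_position_info  # None when no genes were called

    def get_gene_info_for_each_position(self, contig_name):
        # If gene calling was skipped, there is nothing to report per position,
        # so return an empty result instead of raising.
        if not self.a_meta.get('genes_are_called'):
            return {}

        # Genes were called, so missing position info is a genuine problem.
        if self.nt_position_info is None:
            raise ConfigError("get_gene_info_for_each_position :: I am asked to "
                              "return stuff, but self.nt_position_info is None!")

        return self.nt_position_info.get(contig_name, {})
```

With a check like this, a contigs database built with --skip-gene-calling yields an empty result, and the error is raised only when genes were called but the position information is missing.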

ekiefl commented 4 years ago

Hi @domenico-simone,

I am going to try and fix this in the master branch now. For your version, you can simply provide --skip-SNV-profiling to avoid this error.

domenico-simone commented 4 years ago

Hi @ekiefl, thanks! However, I've tried to run anvi-profile as you suggested and this is what I got:

$ anvi-profile -i P10_water05_sensitive.bam -c water05_sensitive_contigs.db --output-dir P10_water05_sensitive --sample-name P10 -M 1500 --skip-SNV-profiling

Contigs DB .........................: Initialized: water05_sensitive_contigs.db (v. 14)                                                                                 

WARNING
=====================================
Single-nucleotide variation will not be characterized for this profile.

WARNING
=====================================
Amino acid linkmer frequencies will not be characterized for this profile.

anvio ..............................: 6.2
profiler_version ...................: 31
sample_id ..........................: P10
description ........................: None
profile_db .........................: /crex/proj/uppstore2018116/thermokarst_fungi/anvio-test/P10_water05_sensitive/PROFILE.db
contigs_db .........................: True
contigs_db_hash ....................: hashf7b39ed4
cmd_line ...........................: /crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/bin/anvi-profile -i P10_water05_sensitive.bam -c water05_sensitive_contigs.db --output-dir P10_water05_sensitive --sample-name P10 -M 1500 --skip-SNV-profiling
merged .............................: False
blank ..............................: False
split_length .......................: 20,000
min_contig_length ..................: 1,500
max_contig_length ..................: 9,223,372,036,854,775,807
min_mean_coverage ..................: 0
clustering_performed ...............: False
min_coverage_for_variability .......: 10
skip_SNV_profiling .................: True
profile_SCVs .......................: False
report_variability_full ............: False

WARNING
=====================================
Your minimum contig length is set to 1,500 base pairs. So anvi'o will not take
into consideration anything below that. If you need to kill this an restart your
analysis with another minimum contig length value, feel free to press CTRL+C.

input_bam ..........................: P10_water05_sensitive.bam                                                                                                         
output_dir .........................: /crex/proj/uppstore2018116/thermokarst_fungi/anvio-test/P10_water05_sensitive
total_reads_mapped .................: 43,902,693
num_contigs ........................: 548,539

WARNING
=====================================
The contigs database 'water05_sensitive_contigs.db' does not contain any gene
calls. Which means the profiling step will not be able to characterize 'gene
coverages'. If you are OK with this, anvi'o will be OK with it as well.

num_contigs_after_M ................: 548,539
num_splits .........................: 551,275
total_length .......................: 1,839,237,746
[13 Apr 20 19:35:09 Profiling w/1 thread] 500/548539 contigs ⚙  | WRITING TO DB 💾 ...                                                                      ETA: 32m32s
✖ anvi-profile encountered an error after 0:00:24.327729
Traceback (most recent call last):
  File "/crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/bin/anvi-profile", line 93, in <module>
    main(args)
  File "/crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/lib/python3.6/site-packages/anvio/terminal.py", line 748, in wrapper
    program_method(*args, **kwargs)
  File "/crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/bin/anvi-profile", line 33, in main
    profiler.BAMProfiler(args)._run()
  File "/crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/lib/python3.6/site-packages/anvio/profiler.py", line 271, in _run
    self.profile_single_thread()
  File "/crex/proj/uppstore2018116/domenico/conda_envs/anvio-6.2/lib/python3.6/site-packages/anvio/profiler.py", line 734, in profile_single_thread
    del split.auxiliary.split.SNV_profiles
AttributeError: 'NoneType' object has no attribute 'split'

Thank you,

Domenico

ekiefl commented 4 years ago

Shucks. Ok. Please hang tight. In the meantime, you can get the master branch of anvi'o up and running, since that is where the fix will be active.

ekiefl commented 4 years ago

Actually, if you run again with --skip-SNV-profiling on master, the second error should not happen, because that portion of the code has been fixed since 6.2.
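
For illustration only: the second traceback (`del split.auxiliary.split.SNV_profiles` failing with `'NoneType' object has no attribute 'split'`) is consistent with `split.auxiliary` being `None` when SNV profiling is skipped. A defensive cleanup could look like the sketch below. It is not the code that is on master; the helper name `release_snv_profiles` and the use of `SimpleNamespace` objects as stand-ins for the profiler's split objects are assumptions.

```python
from types import SimpleNamespace


def release_snv_profiles(split):
    """Drop SNV profiles from a split if they exist (hypothetical helper).

    Returns True when something was deleted, False otherwise.
    """
    auxiliary = getattr(split, 'auxiliary', None)

    # With --skip-SNV-profiling there may be no auxiliary object at all.
    if auxiliary is None:
        return False

    inner_split = getattr(auxiliary, 'split', None)
    if inner_split is not None and hasattr(inner_split, 'SNV_profiles'):
        del inner_split.SNV_profiles
        return True

    return False


# Stand-in objects (assumptions), mimicking only the attribute layout in the traceback.
skipped = SimpleNamespace(auxiliary=None)
profiled = SimpleNamespace(
    auxiliary=SimpleNamespace(split=SimpleNamespace(SNV_profiles={}))
)
```

Calling `release_snv_profiles(skipped)` simply returns without touching anything, whereas the unguarded `del` in 6.2 raises the AttributeError shown in the traceback.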

domenico-simone commented 4 years ago

Ok, I'll install the master version and keep you posted! Thank you :)

ekiefl commented 4 years ago

I think this is fixed by these two commits:

217b4e87f05a9daf0c56c71b8a40cf6d91883e00
bb2397f4786dd46aac758e37ebdcc4aab1bf369e

Could you please do me a favor and try running without --skip-SNV-profiling, since this will clarify whether the first error is fixed? @semiller10 is also testing, since he ran into the same error.

domenico-simone commented 4 years ago

Great! I've tried with the master version and it works now (I've tested it without --skip-SNV-profiling)! I think I'll stick to the master version from now on, but when will this fix be available through conda?

Thank you,

Domenico

ekiefl commented 4 years ago

Not until v6.3, which probably won't be for a while (months is my guess).

meren commented 4 years ago

> months is my guess

I SEE YOUR MONTHS AND RAISE YOU MONTHS MORE.

..

.

(probably it will be sooner than we hope)