merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

[BUG] Length mismatch error in anvi-run-hmms #2089

Closed Ge0rges closed 1 year ago

Ge0rges commented 1 year ago

Short description of the problem

I think I've found a bug in anvi-run-hmms: the program crashed. My best guess is that a bug in the way anvi'o parses HMMER results leads to the error detailed below.

anvi'o version

Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

System info

Anvi'o is running on a research server that I did not set up, so I am unsure exactly how it was installed. However, it lives in its own Conda environment and I assume it was installed properly. Here is the output of uname -a: Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4 x86_64 x86_64 x86_64 GNU/Linux

Detailed description of the issue

I ran: anvi-run-hmms -H anvio_psych_hmm/ -c psych_genomes.db -T 20 --just-do-it

and obtained:

Number of raw hits in table file .............: 9                                                                                                             

✖ anvi-run-hmms encountered an error after 0:02:31.706049
Traceback (most recent call last):
  File "/usr/local/miniconda3/envs/anvio-7.1/bin/anvi-run-hmms", line 142, in <module>
    main(args)
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/terminal.py", line 875, in wrapper
    program_method(*args, **kwargs)
  File "/usr/local/miniconda3/envs/anvio-7.1/bin/anvi-run-hmms", line 97, in main
    search_tables.populate_search_tables(sources)
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/tables/hmmhits.py", line 250, in populate_search_tables
    parser = parser_modules['search']['hmmer_table_output'](hmm_scan_hits_txt, alphabet=alphabet, context=context, program=self.hmm_program)
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/parsers/hmmer.py", line 582, in __init__
    fixed_hmmer_table_txt =  self.fix_sad_hmmer_table_output(hmmer_table_txt, col_names)
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/anvio/parsers/hmmer.py", line 754, in fix_sad_hmmer_table_output
    hmmer_df.columns = col_names_plus_description_cols
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/pandas/core/generic.py", line 5192, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 67, in pandas._libs.properties.AxisProperty.__set__
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/pandas/core/generic.py", line 690, in _set_axis
    self._data.set_axis(axis, labels)
  File "/usr/local/miniconda3/envs/anvio-7.1/lib/python3.6/site-packages/pandas/core/internals/managers.py", line 183, in set_axis
    "values have {new} elements".format(old=old_len, new=new_len)
ValueError: Length mismatch: Expected axis has 16 elements, new values have 19 elements
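
For anyone reading the traceback: the last line says the parsed table has 16 columns while the list of names being assigned has 19 entries. The sketch below reproduces only that pandas behavior in isolation. The table contents and column names are made-up placeholders, not anvi'o's real data or identifiers.

```python
import pandas as pd

# Placeholder table with 16 columns, standing in for the parsed HMMER output.
hits = pd.DataFrame([list(range(16))])

# Placeholder list of 19 names, standing in for what the parser tried to assign.
expected_names = [f"col_{i}" for i in range(19)]

try:
    # Assigning a name list whose length differs from the column count fails.
    hits.columns = expected_names
    error_message = None
except ValueError as e:
    error_message = str(e)

print(error_message)
```

So the crash itself is a symptom: the column names chosen by the parser do not match the layout of the file it was given.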

Files to reproduce

https://www.dropbox.com/sh/4u2u25om3wh0wbp/AAAA11l9KWekr73xh79aNipHa?dl=0

meren commented 1 year ago

This sounds like an issue with the HMMER version in the anvi'o environment (which should not have happened with the standard installation instructions). Would you be willing to install anvio-dev as explained in the installation page and test it again?

Ge0rges commented 1 year ago

Hi Meren,

I can try to install the development environment, but this is a shared server (and I am not the admin), so it may take a bit of time. For reference, the installed version of HMMER is 3.3.2.

Ge0rges commented 1 year ago

@meren I was able to install the dev environment with no errors. Same error, see below.

Traceback (most recent call last):
  File "/Accounts/gkanaan/github/anvio/bin/anvi-run-hmms", line 143, in <module>
    main(args)
  File "/Accounts/gkanaan/github/anvio/anvio/terminal.py", line 881, in wrapper
    program_method(*args, **kwargs)
  File "/Accounts/gkanaan/github/anvio/bin/anvi-run-hmms", line 97, in main
    search_tables.populate_search_tables(sources)
  File "/Accounts/gkanaan/github/anvio/anvio/tables/hmmhits.py", line 277, in populate_search_tables
    parser = parser_modules['search']['hmmer_table_output'](hmm_scan_hits_txt, alphabet=alphabet, context=context, program=self.hmm_program)
  File "/Accounts/gkanaan/github/anvio/anvio/parsers/hmmer.py", line 582, in __init__
    fixed_hmmer_table_txt =  self.fix_sad_hmmer_table_output(hmmer_table_txt, col_names)
  File "/Accounts/gkanaan/github/anvio/anvio/parsers/hmmer.py", line 754, in fix_sad_hmmer_table_output
    hmmer_df.columns = col_names_plus_description_cols
  File "/Accounts/gkanaan/.conda/envs/anvio-dev/lib/python3.7/site-packages/pandas/core/generic.py", line 5500, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/Accounts/gkanaan/.conda/envs/anvio-dev/lib/python3.7/site-packages/pandas/core/generic.py", line 766, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/Accounts/gkanaan/.conda/envs/anvio-dev/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/Accounts/gkanaan/.conda/envs/anvio-dev/lib/python3.7/site-packages/pandas/core/internals/base.py", line 58, in _validate_set_axis
    f"Length mismatch: Expected axis has {old_len} elements, new "
ValueError: Length mismatch: Expected axis has 16 elements, new values have 19 elements

I ran the same command. The anvi-self-test -v output is:

Anvi'o .......................................: hope (v7.1-dev)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 16
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

meren commented 1 year ago

Thanks for trying anvio-dev! I'll take a look into this in the next few hours.

meren commented 1 year ago

Hey @Ge0rges, thank you very much for your patience with this. It turned out to be a serious issue that we didn't see coming :) Your HMM directory uses GENE:DNA context. This is the first time someone is trying to run a DNA alphabet based model on coding genes. All examples so far run AA models or RNA models on genes, or DNA or RNA models on contigs, and never DNA models on genes. So the code was missing instructions to handle the output for that combination.
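
To illustrate the kind of gap described above, here is a minimal sketch of a parser that looks up the expected column layout per context and refuses to continue when a combination is not registered. This is not the actual anvi'o code or the actual fix. `COLUMNS`, `label_row`, the placeholder column names, and the pairing of each combination with 19 or 16 columns are all assumptions made for illustration (only the two counts come from the traceback).

```python
from typing import Dict, List, Tuple

# Hypothetical registry: (target, alphabet) -> expected column names.
# Names and the mapping of counts to combinations are placeholders.
COLUMNS: Dict[Tuple[str, str], List[str]] = {
    ("GENE", "AA"): [f"aa_col_{i}" for i in range(19)],
    ("CONTIG", "DNA"): [f"nt_col_{i}" for i in range(16)],
}


def label_row(fields: List[str], target: str, alphabet: str) -> Dict[str, str]:
    """Attach column names to one parsed row, failing loudly on unknown contexts."""
    key = (target, alphabet)
    if key not in COLUMNS:
        # An unhandled combination is reported by name instead of falling
        # through to a default layout that may not fit the file.
        raise KeyError(f"No column layout registered for {target}:{alphabet}")

    names = COLUMNS[key]
    if len(fields) != len(names):
        raise ValueError(
            f"{target}:{alphabet} expects {len(names)} columns, "
            f"but the row has {len(fields)}"
        )
    return dict(zip(names, fields))


# A 16-field row is accepted for a registered 16-column layout.
row = label_row([str(i) for i in range(16)], "CONTIG", "DNA")
print(len(row))
```

With a check like this, an unregistered combination such as GENE:DNA would raise a message naming the context, rather than surfacing later as a pandas length mismatch.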

I think I fixed it in https://github.com/merenlab/anvio/commit/050b680300fb5fda2df666191b60ac084e63e8c0, and if you git pull from anvio-dev you should be able to run it on your contigs-db no problem.

Thanks a lot again for sending a test dataset to figure this one out.

meren commented 1 year ago

(If everything works please consider reporting back and closing the issue)

Ge0rges commented 1 year ago

Hi @meren, your fix worked. Thanks for getting to it so promptly. I will close the issue with a minor final note that I think there's a typo in the docs of the parser file you edited here. Where it says GENE it should say CONTIG. Just to avoid future confusion :)

meren commented 1 year ago

Good catch, thank you! Now fixed :)

And thanks for reporting back. I'm glad this is now resolved.