merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

anvi-genomes-storage encoding issue #473

Closed rkevork closed 7 years ago

rkevork commented 7 years ago

Meren,

I ran into a problem with the anvi-gen-genomes-storage command. I was able to fix it after some digging, but I wanted to share it with you in case it is a wider issue and not just limited to me. I was using the anvio-2.2.2 build in a virtual-env running Python 3.6.0. The output looked like:

Traceback (most recent call last):
  File "/Users/rtkevork/virtual-envs/anvio-2.2.2/bin/anvi-gen-genomes-storage", line 56, in <module>
    pan.create_genomes_data_storage()
  File "/Users/rtkevork/virtual-envs/anvio-2.2.2/lib/python3.6/site-packages/anvio/panops.py", line 267, in create_genomes_data_storage
    self.load_genomes_descriptions()
  File "/Users/rtkevork/virtual-envs/anvio-2.2.2/lib/python3.6/site-packages/anvio/panops.py", line 109, in load_genomes_descriptions
    self.genomes[genome_name]['genome_hash'] = self.get_genome_hash_for_internal_genome(self.genomes[genome_name])
  File "/Users/rtkevork/virtual-envs/anvio-2.2.2/lib/python3.6/site-packages/anvio/panops.py", line 424, in get_genome_hash_for_internal_genome
    genome_hash = hashlib.sha224('_'.join([''.join(split_names_of_interest), contigs_db.meta['contigs_db_hash']])).hexdigest()[0:12]
TypeError: Unicode-objects must be encoded before hashing

I was able to fix the issue by adding .encode('utf-8') on line 424 of panops.py, so that it looks like:

genome_hash = hashlib.sha224('_'.join([''.join(split_names_of_interest), contigs_db.meta['contigs_db_hash']]).encode('utf-8')).hexdigest()[0:12]
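For anyone hitting the same error, here is a minimal sketch of why the change works: in Python 3, the hashlib constructors accept bytes rather than str, so the joined string has to be encoded before hashing. The split names and hash value below are made-up placeholders, and short_hash is a hypothetical helper for illustration, not anvi'o code.

```python
import hashlib


def short_hash(parts, length=12):
    """Join string parts with '_' and return the first `length` hex
    characters of the SHA-224 digest (hypothetical helper, not anvi'o code)."""
    joined = '_'.join(parts)
    # hashlib needs bytes in Python 3, so encode the str first.
    return hashlib.sha224(joined.encode('utf-8')).hexdigest()[:length]


# Placeholder values standing in for the real split names and contigs db hash.
split_names_of_interest = ['contig_1_split_00001', 'contig_1_split_00002']
contigs_db_hash = 'abc123'

# Passing a str directly reproduces the TypeError from the traceback.
raised_type_error = False
try:
    hashlib.sha224('_'.join([''.join(split_names_of_interest), contigs_db_hash]))
except TypeError:
    raised_type_error = True

# Encoding first gives a stable 12-character hash.
genome_hash = short_hash([''.join(split_names_of_interest), contigs_db_hash])
print(raised_type_error, genome_hash)
```

UTF-8 is used here only because it is the encoding chosen in the fix above; any encoding would avoid the TypeError, but the same one has to be used everywhere for the hashes to stay comparable.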

Your software is awesome btw, thanks for being amazing. The figures are so pretty. Cheers,

meren commented 7 years ago

Hi rkevork,

Sorry about this :/ There are still some problems related to the Python 3 switch. I have an idea about why @ozcan and I haven't been able to spot this encoding problem. Did you generate the contigs databases you are processing with anvi-gen-genomes-storage with a previous version of anvi'o?

Also, it looks like you are using a Mac, is that correct? If yes, can you tell us whether it is Sierra or El Capitan?

Thank you very much for reporting this along with a solution!

rkevork commented 7 years ago

I had trouble updating my profile.db from v19 to v20, so I remade the contigs.db and profile.db from scratch from a single metagenome with anvio-2.2.2 yesterday morning.

Yup, I'm on a MacBook running Sierra.

-- Richard Kevorkian Department of Microbiology University of Tennessee Knoxville, TN 37920 rtkevork@gmail.com

meren commented 7 years ago

Sorry about the upgrade problem :/ We solved it in #471.

So you are getting this error even when everything is done with v2.2.2. OK. We will look into this. Thank you.

meren commented 7 years ago

Note to self: This problem might be exclusive to internal genomes.