soedinglab / hh-suite

Remote protein homology detection suite.
https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3019-7
GNU General Public License v3.0

hhsuitedb.py UnicodeDecodeERROR #158

Open eli1199 opened 5 years ago

eli1199 commented 5 years ago

Expected Behavior

Builds a database from the user-supplied .a3m file(s).

Current Behavior

Fails with: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 36: invalid start byte

Steps to Reproduce (for bugs)

python hhsuitedb.py --ia3m=og4320.a3m -o my_db --cpu=8 --force

HH-suite Output (for bugs)

Unlinking entries from '/tmp/tmprv97qam3/files.dat'
Unlinking entries from '/tmp/tmprv97qam3/files.dat'
Traceback (most recent call last):
  File "hhsuitedb.py", line 482, in <module>
    main()
  File "hhsuitedb.py", line 478, in main
    check_database(options.output_basename, options.nr_cores, options.force_mode)
  File "hhsuitedb.py", line 376, in check_database
    calculate_hhm(threads, output_basename+"_a3m", output_basename+"_hhm")
  File "hhsuitedb.py", line 100, in calculate_hhm
    large_a3ms = get_large_a3ms(a3m_base_path)
  File "hhsuitedb.py", line 76, in get_large_a3ms
    entries = ffindex.read_index(a3m_base_path+".ffindex")
  File "ffindex.py", line 20, in read_index
    for line in fh:
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.0/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 36: invalid start byte
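
The traceback shows the error being raised while ffindex.read_index iterates over the lines of the .ffindex file, so the index file apparently contains a byte (0xff) that is not valid UTF-8. A minimal sketch for locating such bytes follows. The helper name find_undecodable_lines is hypothetical and not part of HH-suite, and the path my_db_a3m.ffindex in the usage note is an assumption derived from the -o my_db option and the output_basename+"_a3m" call in the traceback.

```python
def find_undecodable_lines(path, encoding="utf-8"):
    """Return (line_number, byte_offset, raw_line) for every line that fails to decode.

    Hypothetical diagnostic helper, not part of HH-suite.
    """
    problems = []
    # Open in binary mode so that reading itself can never raise a decode error.
    with open(path, "rb") as fh:
        for line_number, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as err:
                # err.start is the offset of the first undecodable byte within this line.
                problems.append((line_number, err.start, raw))
    return problems
```

Calling find_undecodable_lines("my_db_a3m.ffindex") and printing the raw lines with repr() would show which index entry carries the unexpected byte. This only helps to diagnose the problem; it does not repair the database.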

Your Environment

Include as many relevant details about the environment you experienced the issue in.

ApollineBruley commented 4 years ago

Hello eli1199,

I am having the same issue, did you find a way to solve this?

Thanks!

If anyone else runs into this issue: for me, the problem came from the .a3m and .hhm file names. I shortened them and removed the '_' and '.' characters (I'm not sure exactly what caused the problem), and it now works perfectly!
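
The renaming described above can be sketched as follows. This is only an illustration of the workaround, not a confirmed fix, since the exact cause was not identified. The function name simplified_name and the 20-character limit are assumptions chosen for the example; the comment does not say how short the names were made.

```python
import re
from pathlib import Path


def simplified_name(filename, max_stem_length=20):
    """Return a shortened file name whose stem contains only ASCII letters and digits.

    Hypothetical helper: the length limit is an arbitrary assumption.
    """
    path = Path(filename)
    # Strip '_', '.', and any other non-alphanumeric character from the stem,
    # then truncate it; the final extension (e.g. ".a3m") is kept unchanged.
    stem = re.sub(r"[^A-Za-z0-9]", "", path.stem)[:max_stem_length]
    if not stem:
        raise ValueError(f"no usable characters left in {filename!r}")
    return stem + path.suffix
```

For example, simplified_name("og_4320.v2.a3m") gives "og4320v2.a3m". The function only computes the new name; the actual renaming, for instance with Path.rename, is left to the caller so that nothing is changed on disk by accident.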