tseemann / prokka

:zap: :aquarius: Rapid prokaryotic genome annotation

gbk file is incompatible with Biopython #483

Open YiweiZhu opened 4 years ago

YiweiZhu commented 4 years ago

Hi! The gbk file produced by prokka is incompatible with Biopython. When I load it with Biopython, I get "UnicodeDecodeError: 'gbk' codec can't decode byte 0x88 in position 74: illegal multibyte sequence". I found that the issue is related to the date at the head of the file: in my case, the month is written in Chinese, probably because of the system language. After I translated the month into English and collapsed the spaces before the date into a single space, the file loaded into Biopython. Is tbl2asn the cause of this issue? My prokka version is 1.14.6, installed via conda.

Yiwei
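For anyone hitting the same error, here is a minimal sketch of one way it could arise. It assumes the date was written as UTF-8 bytes and then read with Python's `gbk` codec; neither is confirmed in this thread. The date string and the `try_decode` helper are made up for illustration and are not taken from the actual file.

```python
# Hypothetical date field with a Chinese month: "6\u6708" is "6月" (June).
# Assumption: the file was written as UTF-8; the real file may differ.
RAW_DATE = "04-6\u6708-2020".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode bytes with the given codec, or describe why decoding failed."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{encoding} failed at byte {exc.start}: {exc.reason}"


if __name__ == "__main__":
    print(try_decode(RAW_DATE, "utf-8"))  # decodes cleanly
    print(try_decode(RAW_DATE, "gbk"))    # reports a decode failure
```

Under these assumptions, the last UTF-8 byte of "月" is 0x88, which the `gbk` codec cannot pair with the hyphen that follows it.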

peterjc commented 4 years ago

This will be an encoding problem somewhere; most likely the system encoding and the encoding of the GenBank file do not match. Are you interested in helping solve this? I can offer some input from the Biopython side.

e.g. Could you share the problem GenBank file, and details of your system's locale (i.e. what default encoding Python would use)?
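One way to collect the locale details asked for above is a short standard-library script. This is only a sketch: the `encoding_report` helper is a hypothetical name, and which environment variables matter depends on the platform.

```python
import locale
import os
import sys


def encoding_report() -> dict:
    """Collect the settings that decide how Python decodes text files."""
    return {
        # Encoding used by open() in text mode when none is given
        # (outside of UTF-8 mode).
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding of str.encode()/bytes.decode() defaults; "utf-8" on Python 3.
        "default_string_encoding": sys.getdefaultencoding(),
        # Locale-related environment variables, if set (often unset on Windows).
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(f"{key}: {value}")
```

If `preferred_encoding` differs from the encoding the file was written in, opening the file with an explicit `encoding=` argument to `open()` and passing that handle to the parser should avoid relying on the locale default.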