lifs-tools / pygoslin

Python implementation of parsers for the Grammars on succinct lipid nomenclature (Goslin).
https://github.com/lifs-tools/goslin

UnicodeDecodeError on Windows #11

Open PelzKo opened 2 years ago

PelzKo commented 2 years ago

On Windows I get the following stack trace when initializing Goslin:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 972, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\src\lipid_librarian\__init__.py", line 38, in <module>
    goslin_converter = goslin_init()
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\src\lipid_librarian\__init__.py", line 27, in goslin_init
    return LipidParser()
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\venv\lib\site-packages\pygoslin\parser\Parser.py", line 114, in __init__
    self.parser_list = [ShorthandParser(), GoslinParser(), FattyAcidParser(), LipidMapsParser(), SwissLipidsParser(), HmdbParser()]
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\venv\lib\site-packages\pygoslin\parser\Parser.py", line 79, in __init__
    super().__init__(self.event_handler, file_name, Parser.DEFAULT_QUOTE)
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\venv\lib\site-packages\pygoslin\parser\ParserCommon.py", line 132, in __init__
    rules = Parser.extract_text_based_rules(grammar_filename, self.quote)
  File "C:\Users\Konstantin\Programmierung\lipid-librarian\venv\lib\site-packages\pygoslin\parser\ParserCommon.py", line 297, in extract_text_based_rules
    grammar = infile.read() + "\n";
  File "C:\Users\Konstantin\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 9820: character maps to <undefined>

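The grammar file is opened without an explicit encoding, so on Windows Python falls back to the locale default (cp1252 here) and fails on byte 0x90. I think this can be fixed by specifying encoding="utf-8" when opening the file on this line (and on every other line that reads a file): https://github.com/lifs-tools/pygoslin/blob/01c580cc1fd891c4be8825d5a24486da6b8aa7bd/pygoslin/parser/ParserCommon.py#L296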
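
For illustration, here is a minimal sketch of the failure and the proposed fix. It does not use pygoslin's actual code: `read_grammar`, the file name `demo.g4`, and the sample rule are hypothetical, and the character U+2010 is only an assumed example of text whose UTF-8 bytes (E2 80 90) include 0x90. Which character in the real grammar file triggers the error is not known from the traceback.

```python
import tempfile
from pathlib import Path


def read_grammar(path, encoding=None):
    """Read a grammar file as text.

    encoding=None mimics the unpatched behaviour: Python uses the platform
    default, which is cp1252 on many Windows installations.
    """
    with open(path, "r", encoding=encoding) as infile:
        return infile.read() + "\n"


# Hypothetical grammar snippet; U+2010 encodes to bytes E2 80 90 in UTF-8.
sample = 'rule: "a\u2010b";'
grammar_path = Path(tempfile.mkdtemp()) / "demo.g4"
grammar_path.write_bytes(sample.encode("utf-8"))

# Decoding as cp1252 fails because byte 0x90 has no mapping in that codec.
try:
    read_grammar(grammar_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Passing the encoding explicitly makes the read platform-independent.
utf8_text = read_grammar(grammar_path, encoding="utf-8")
```

The cp1252 case is forced here so the sketch behaves the same on any operating system; on an affected Windows machine, simply omitting the `encoding` argument would likely take the same path.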

PelzKo commented 2 years ago

@dominik-kopczynski

nilshoffmann commented 2 years ago

A fix is in progress in PR #12.

PelzKo commented 2 years ago

This issue is solved, thank you for your help! Could you release the fix on PyPI?

chrispook commented 1 year ago

> This issue is solved, thank you for your help! Could you release it onto PyPI?

I installed pygoslin 2.0.2 with pip install pygoslin and the bug is still present. I am running Anaconda with Python 3.9.7.
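
When reporting that the bug persists, it can help to attach the installed package version and the default text encoding Python is using. The snippet below is a small diagnostic sketch using only the standard library; `environment_report` is a hypothetical helper name, not part of pygoslin.

```python
import locale
from importlib import metadata


def environment_report(package="pygoslin"):
    """Collect the installed version of a package and the default text encoding."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return {
        "package": package,
        "version": version,
        # Encoding used by open() when no encoding argument is given.
        "default_encoding": locale.getpreferredencoding(False),
    }


report = environment_report()
print(report)
```

If the reported default encoding is cp1252, a possible workaround until a fixed release is available is to start Python in UTF-8 mode (for example by setting the environment variable PYTHONUTF8=1), which makes open() default to UTF-8. Whether that fully resolves the problem for pygoslin 2.0.2 has not been verified here.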