texworld / betterbib

:green_book: Command-line tools for bibliographies.

UnicodeDecodeError on Windows #248

Closed jimcarst closed 2 years ago

jimcarst commented 2 years ago

On Windows, the following error occurs (see here):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 39535: character maps to <undefined>

It can be fixed by specifying the encoding when calling open() (see here).

Full log:

Traceback (most recent call last):
  File "*\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "*\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "*betterbib.exe\__main__.py", line 7, in <module>
  File "*\betterbib\cli\_main.py", line 67, in main
    return args.func(args)
  File "*\betterbib\cli\_update.py", line 26, in run
    d = sync(
  File "*\betterbib\sync.py", line 63, in sync
    journal_abbrev(d, long_journal_names)
  File "*\betterbib\journal_abbrev.py", line 16, in journal_abbrev
    table = json.load(f)
  File "*\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
  File "*\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 39535: character maps to <undefined>

jimcarst commented 2 years ago

To fix this, pass an explicit encoding to every call to open().

For example, in journal_abbrev.py: with open(this_dir / "data/journals.json") as f: becomes with open(this_dir / "data/journals.json", encoding='utf-8') as f:
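A minimal sketch of why the explicit encoding matters. It does not use betterbib's real data/journals.json; the file contents below are made up, and cp1252 is requested explicitly to imitate what open() picks by default on many Windows systems. The character "Á" is encoded in UTF-8 as the bytes 0xC3 0x81, and 0x81 is undefined in cp1252, which is the same failure as in the traceback above.

```python
import json
import tempfile
from pathlib import Path

# Made-up journal table; "Á" becomes the bytes C3 81 in UTF-8.
table = {"Revista de Ciencias Ágiles": "Rev. Cienc. Ágiles"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "journals.json"
    path.write_text(json.dumps(table, ensure_ascii=False), encoding="utf-8")

    # Imitate the Windows default: decoding UTF-8 bytes as cp1252 fails on 0x81.
    try:
        with open(path, encoding="cp1252") as f:
            json.load(f)
        cp1252_failed = False
    except UnicodeDecodeError as exc:
        cp1252_failed = True
        print("cp1252:", exc)

    # With the encoding stated explicitly, the result no longer depends on the locale.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print("utf-8 round trip ok:", loaded == table)
```

The same reasoning applies to any other open() call that reads a UTF-8 data file shipped with the package.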

EugeneGlushkov commented 2 years ago

Thanks, this was really helpful! Fixed the problem!

hdchieh commented 2 years ago

I have modified all uses of open, but still get:

File "D:\psnProgram\anaconda\lib\site-packages\betterbib\tools.py", line 543, in bibtex_parser
    data = bibtex.Parser().parse_file(infile)
  File "D:\psnProgram\anaconda\lib\site-packages\pybtex\database\input\__init__.py", line 56, in parse_file
    raise PybtexError(six.text_type(e), filename=self.filename)
pybtex.exceptions.PybtexError: 'gbk' codec can't decode byte 0xb3 in position 25183: illegal multibyte sequence

hdchieh commented 2 years ago

I have changed data = bibtex.Parser().parse_file(infile) to data = bibtex.Parser('utf-8').parse_file(infile), but it still doesn't work.
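If forcing UTF-8 in the parser does not help, one possibility (an assumption, not something confirmed in this thread) is that the .bib file itself is not valid UTF-8. A small standard-library sketch for checking that is shown here; the helper name first_invalid_utf8_offset and the sample file contents are hypothetical.

```python
import tempfile
from pathlib import Path


def first_invalid_utf8_offset(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample: the same text saved once as UTF-8 and once as GBK.
    text = "title = {中文}"
    utf8_file = Path(tmp) / "refs_utf8.bib"
    gbk_file = Path(tmp) / "refs_gbk.bib"
    utf8_file.write_bytes(text.encode("utf-8"))
    gbk_file.write_bytes(text.encode("gbk"))

    utf8_offset = first_invalid_utf8_offset(utf8_file)
    gbk_offset = first_invalid_utf8_offset(gbk_file)

print("UTF-8 file:", utf8_offset)
print("GBK file:", gbk_offset)
```

A result other than None means the file needs to be re-saved as UTF-8 (or read with its actual encoding) before a UTF-8 parser can handle it.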

nschloe commented 2 years ago

I would need a minimal bibtex file to reproduce the issue.

hdchieh commented 2 years ago

The same problem occurs with a file like this: refs_better.bib.txt

I can't upload .bib files.

hdchieh commented 2 years ago

Oh, it did work when I applied betterbib to other bibliographies.

nschloe commented 2 years ago

The original issue is fixed now. If you have any more problems, please open a new report.