yuchenlin / rebiber

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
https://yuchenlin.xyz/
MIT License

fixing decoding issue #19

Closed. tianyu-z closed this 3 years ago.

tianyu-z commented 3 years ago

I ran into these errors today when running your script:

Traceback (most recent call last):
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\tiany\AppData\Local\Programs\Python\Python37\Scripts\rebiber.exe\__main__.py", line 7, in <module>
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\site-packages\rebiber\normalize.py", line 165, in main
    all_bib_entries = load_bib_file(args.input_bib)
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\site-packages\rebiber\bib2json.py", line 22, in load_bib_file
    lines = f.readlines() + ["\n"]
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9d in position 6526: illegal multibyte sequence

Traceback (most recent call last):
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\tiany\AppData\Local\Programs\Python\Python37\Scripts\rebiber.exe\__main__.py", line 7, in <module>
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\site-packages\rebiber\normalize.py", line 172, in main
    normalize_bib(bib_db, all_bib_entries, output_path, args.deduplicate, removed_value_names, abbr_dict)
  File "c:\users\tiany\appdata\local\programs\python\python37\lib\site-packages\rebiber\normalize.py", line 110, in normalize_bib
    output_file.write(output_string)
UnicodeEncodeError: 'gbk' codec can't encode character '\u0107' in position 4023: illegal multibyte sequence
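
Both failures appear to share one cause. When open() is called in text mode without encoding=, Python 3.7 falls back to locale.getpreferredencoding(False), which the tracebacks show is GBK on this Windows machine. A minimal check, using a hypothetical helper fits_codec (not part of rebiber), shows that GBK cannot represent the 'ć' (U+0107) from the second traceback while UTF-8 can:

```python
import locale


def fits_codec(text, codec):
    """Hypothetical helper: True if `text` encodes with `codec` without errors."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The codec open() uses in text mode when encoding= is omitted.
print("locale default:", locale.getpreferredencoding(False))

char = "\u0107"  # 'ć', the character from the UnicodeEncodeError above
print("gbk:", fits_codec(char, "gbk"))    # False: GBK has no 'ć'
print("utf8:", fits_codec(char, "utf8"))  # True: UTF-8 covers all of Unicode
```

The read-side error is the mirror image: bytes written as UTF-8 are decoded with the GBK codec and hit a byte sequence GBK does not allow.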

To fix them, I added encoding='utf8' to the file I/O in both bib2json.py (where the .bib file is read) and normalize.py (where the output is written). It works well for me now.
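
A sketch of that kind of change, assuming both spots use the built-in open(). The function names load_bib_lines and save_bib are hypothetical stand-ins for the code around load_bib_file (bib2json.py) and normalize_bib (normalize.py); rebiber's actual code may differ. Passing the encoding explicitly makes reading and writing behave the same on every platform, regardless of the system locale:

```python
import os
import tempfile


def load_bib_lines(path):
    # Hypothetical stand-in for the read in load_bib_file: decode as UTF-8
    # explicitly instead of relying on the locale default (e.g. GBK).
    with open(path, encoding="utf8") as f:
        return f.readlines() + ["\n"]


def save_bib(path, output_string):
    # Hypothetical stand-in for output_file.write(...) in normalize_bib.
    with open(path, "w", encoding="utf8") as f:
        f.write(output_string)


if __name__ == "__main__":
    # Hypothetical entry with a non-GBK character in the author field.
    entry = "@article{x,\n  author = {Ljubi\u0107, A.},\n}\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "refs.bib")
        save_bib(path, entry)
        lines = load_bib_lines(path)
        # Round trip is lossless; the extra "\n" mirrors the original read.
        assert "".join(lines) == entry + "\n"
        print(lines)
```

"utf8" and "utf-8" are aliases for the same codec in Python, so either spelling works.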

yuchenlin commented 3 years ago

Thank you!