mondeja / mdpo

Markdown files translation using GNU PO files
https://mondeja.github.io/mdpo/
BSD 3-Clause "New" or "Revised" License

Encoding error using po2md and md2po #83

Closed dingyifei closed 3 years ago

dingyifei commented 3 years ago

System: Windows 10 Professional 19042.928
mdpo: md2po 0.3.12
Python: Python 3.9 64-bit
Shell: Git Bash
System language: Simplified Chinese (the shell has been set to zh-CN UTF-8)

First of all, thank you for developing this awesome program!

I have been using this program recently, and when I tried it on the documentation of Klipper (a 3D printer firmware), it raised several errors. I believe some of them are related to the system language, since the error messages mention GBK encoding. I ran this script in the https://github.com/KevinOConnor/klipper/tree/master/docs folder:

for file in *.md; do
  echo "Converting $file to ${file//.md/.po}"
  md2po $file -e utf8 -w 71 -q -s --po-filepath local/zh-hans/${file//.md/.po}
  # "$Converting" is an unset variable and expands to nothing, so this line prints " X.po to X.md"
  echo "$Converting ${file//.md/.po} to $file"
  po2md $file --pofiles local/zh-hans/${file//.md/.po} -q -s local/zh-hans/README.md
done

The following is the output (unrelated lines removed):

 Contact.po to Contact.md
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\po2md.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 84, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 72, in run
    output = pofile_to_markdown(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 537, in pofile_to_markdown
    return Po2Md(pofiles, ignore=ignore, **kwargs).translate(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 496, in translate
    parser.parse(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 253, in enter_block
    '  ' * (len(self._ul_marks) - 1), self._ul_marks[-1],
IndexError: list index out of range

 Example_Configs.po to Example_Configs.md
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\po2md.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 84, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 72, in run
    output = pofile_to_markdown(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 537, in pofile_to_markdown
    return Po2Md(pofiles, ignore=ignore, **kwargs).translate(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 496, in translate
    parser.parse(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 253, in enter_block
    '  ' * (len(self._ul_marks) - 1), self._ul_marks[-1],
IndexError: list index out of range
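
The IndexError in the two tracebacks above is raised by `self._ul_marks[-1]`, which fails when the list is empty. The sketch below only illustrates that failure mode and one defensive way to handle it; `bullet_prefix` and its `default` parameter are hypothetical names, and this is not mdpo's code or the fix that was eventually released.

```python
def bullet_prefix(ul_marks, default="-"):
    """Build the indentation and bullet for the current unordered-list item.

    `ul_marks` is treated as a stack of bullet characters, one per open
    list level. Hypothetical helper: it falls back to `default` when the
    stack is empty instead of letting `ul_marks[-1]` raise IndexError.
    """
    if not ul_marks:
        return default + " "
    indent = "  " * (len(ul_marks) - 1)
    return indent + ul_marks[-1] + " "


# Unguarded access to the last element of an empty list reproduces the error.
try:
    [][-1]
except IndexError as exc:
    print("IndexError:", exc)

print(repr(bullet_prefix([])))          # empty stack: default bullet
print(repr(bullet_prefix(["-", "*"])))  # second-level item: indented "*"
```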

Converting HallFilamentWidthSensor.md to HallFilamentWidthSensor.po
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\md2po.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 161, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 152, in run
    pofile = markdown_to_pofile(opts.glob_or_content, **kwargs)
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 622, in markdown_to_pofile
    return Md2Po(
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 513, in extract
    content = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9c in position 3554: illegal multibyte sequence

 HallFilamentWidthSensor.po to HallFilamentWidthSensor.md
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\po2md.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 84, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 72, in run
    output = pofile_to_markdown(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 537, in pofile_to_markdown
    return Po2Md(pofiles, ignore=ignore, **kwargs).translate(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 477, in translate
    content = to_file_content_if_is_file(filepath_or_content)
  File "c:\program files\python39\lib\site-packages\mdpo\io.py", line 58, in to_file_content_if_is_file
    value = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9c in position 3554: illegal multibyte sequence

 Releases.po to Releases.md
Converting Resonance_Compensation.md to Resonance_Compensation.po
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\md2po.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 161, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 152, in run
    pofile = markdown_to_pofile(opts.glob_or_content, **kwargs)
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 622, in markdown_to_pofile
    return Md2Po(
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 513, in extract
    content = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x88 in position 4904: illegal multibyte sequence

 Resonance_Compensation.po to Resonance_Compensation.md
Traceback (most recent call last):
  File "c:\program files\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python39\Scripts\po2md.exe\__main__.py", line 7, in <module>
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 84, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 72, in run
    output = pofile_to_markdown(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 537, in pofile_to_markdown
    return Po2Md(pofiles, ignore=ignore, **kwargs).translate(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 477, in translate
    content = to_file_content_if_is_file(filepath_or_content)
  File "c:\program files\python39\lib\site-packages\mdpo\io.py", line 58, in to_file_content_if_is_file
    value = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x88 in position 4904: illegal multibyte sequence
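
The `'gbk' codec` errors above are raised from plain `f.read()` calls, which suggests the files were opened without an explicit encoding, so Python most likely fell back to the system's preferred encoding (GBK on a Simplified Chinese Windows install). The sketch below reproduces that kind of failure on a throwaway file and shows that passing the encoding explicitly avoids it. The sample text and file name are made up, and the GBK default is simulated by requesting `gbk` directly; this is not mdpo's code.

```python
import os
import tempfile

# Text containing curly quotes; U+201C is encoded in UTF-8 as e2 80 9c,
# the same trailing byte (0x9c) that appears in the error message above.
text = "Use the \u201c sensor \u201d option\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.md")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Simulate a GBK system default by asking for it explicitly.
    try:
        with open(path, encoding="gbk") as f:
            f.read()
        gbk_failed = False
    except UnicodeDecodeError as exc:
        gbk_failed = True
        print("gbk:", exc.reason)

    # Naming the file's real encoding removes the dependency on the locale.
    with open(path, encoding="utf-8") as f:
        content = f.read()

print("read back correctly:", content == text)
```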

Thank you for reading this issue!

mondeja commented 3 years ago

Thank you for the detailed report :+1: I'll fix these errors as soon as possible.

mondeja commented 3 years ago

These errors should be fixed in v0.3.15, but I haven't added tests for the encoding problem. Could you check whether it works now using the new --md-encoding and --po-encoding arguments of the po2md CLI?

dingyifei commented 3 years ago

The IndexError issue has been resolved, but I'm still getting the gbk encoding error even after I updated my script.

for file in *.md; do
  echo "Converting $file to ${file//.md/.po}"
  md2po $file -e utf-8 -w 71 -q -s --po-filepath local/zh-hans/${file//.md/.po}
  echo "$Converting ${file//.md/.po} to $file"
  po2md $file --md-encoding utf-8 --po-encoding utf-8 --pofiles local/zh-hans/${file//.md/.po} -q -s local/zh-hans/$file
done
Converting TSL1401CL_Filament_Width_Sensor.md to TSL1401CL_Filament_Width_Sensor.po
Traceback (most recent call last):
  File "C:\Program Files\Python39\Scripts\md2po-script.py", line 33, in <module>
    sys.exit(load_entry_point('mdpo==0.3.16', 'console_scripts', 'md2po')())
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 152, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 143, in run
    pofile = markdown_to_pofile(opts.glob_or_content, **kwargs)
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 622, in markdown_to_pofile
    return Md2Po(
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 513, in extract
    content = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 1734: illegal multibyte sequence

This is the file I'm getting the error with:

TSL1401CL_Filament_Width_Sensor.md
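
Setting the shell to zh-CN UTF-8 probably does not change which encoding Python picks for files on Windows, which would explain why `gbk` still shows up here. A quick way to check what a given interpreter will use when `open()` gets no `encoding=` argument is sketched below. It only inspects the running interpreter, and whether enabling Python's UTF-8 mode (`PYTHONUTF8=1` or `-X utf8`) would have worked around this issue was not tested in this thread.

```python
import locale
import sys

# Encoding that open() falls back to when no `encoding=` argument is passed.
default_encoding = locale.getpreferredencoding(False)

# True when the interpreter runs in UTF-8 mode (PYTHONUTF8=1 or `-X utf8`).
utf8_mode = bool(sys.flags.utf8_mode)

print("default text encoding:", default_encoding)
print("UTF-8 mode enabled:", utf8_mode)
```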

mondeja commented 3 years ago

Please check v0.3.17; I've also added encoding parameters to md2po.

dingyifei commented 3 years ago

The md2po encoding issue seems to be resolved. There is still one error I'm encountering with po2md.

 HallFilamentWidthSensor.po to HallFilamentWidthSensor.md
Traceback (most recent call last):
  File "C:\Program Files\Python39\Scripts\po2md-script.py", line 33, in <module>
    sys.exit(load_entry_point('mdpo==0.3.18', 'console_scripts', 'po2md')())
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 89, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__main__.py", line 75, in run
    output = pofile_to_markdown(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 549, in pofile_to_markdown
    return Po2Md(
  File "c:\program files\python39\lib\site-packages\mdpo\po2md\__init__.py", line 519, in translate
    f.write(self.output)
UnicodeEncodeError: 'gbk' codec can't encode character '\u0402' in position 3613: illegal multibyte sequence

HallFilamentWidthSensor.po.txt
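
This last error is on the write side: `f.write(self.output)` fails because the output file was apparently opened with the system default (GBK), which cannot represent U+0402. A minimal sketch of the same situation, with an explicit `encoding="utf-8"` on the output file as one way around it, is shown below. The sample string and file name are invented for illustration, and this is not the change that went into v0.3.19.

```python
import os
import tempfile

# U+0402 is the character named in the traceback above.
output = "translated text with \u0402\n"

# GBK cannot encode U+0402, so encoding the string this way fails.
try:
    output.encode("gbk")
    gbk_ok = True
except UnicodeEncodeError as exc:
    gbk_ok = False
    print("gbk:", exc.reason)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.md")
    # Writing with an explicit UTF-8 encoding does not depend on the locale.
    with open(path, "w", encoding="utf-8") as f:
        f.write(output)
    with open(path, "rb") as f:
        raw = f.read()

print("bytes written:", len(raw))
```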

mondeja commented 3 years ago

Oops, sorry :man_facepalming: I have fixed it in v0.3.19; please check it.

dingyifei commented 3 years ago

It's working now

mondeja commented 3 years ago

Thank you @dingyifei. If you find other errors, don't hesitate to open more issues :pray: