UnicodeError parsing input with md2po

dingyifei commented 3 years ago

I don't have permission to reopen #125, but I found some additional information that might indicate the underlying issue.

This test Action can run on any repo: update_locales.yml.txt

name: Update Locales

on:
  workflow_dispatch:

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Install MDPO
      run: pip install mdpo
    - name: generate file
      run: touch a.md
    - name: Insert Text
      run: echo "helloworld" >> a.md
    - name: test1
      run: md2po ./a.md
    - name: test2
      run: python -c "import os;print(os.path.exists('a.md'));"
    - name: test3
      run: cat ./a.md | md2po

It produces the following result

It looks like the way md2po reads files could be causing the issue.

I modified my script to do something like the following:

< $mdfilepath md2po --md-encoding utf-8 --po-encoding utf-8 -e utf-8 -w $width -q -s -c --po-filepath $pofilepath

When I ran this script, an encoding issue occurred.

I'm trying to use > filename to bypass the md2po file-reading issue, and it works for most of the files except a few.

Traceback (most recent call last):
  File "C:\Program Files\Python39\Scripts\md2po-script.py", line 33, in <module>
    sys.exit(load_entry_point('mdpo==0.3.19', 'console_scripts', 'md2po')())
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 157, in main
    sys.exit(run(args=sys.argv[1:])[1])  # pragma: no cover
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__main__.py", line 148, in run
    pofile = markdown_to_pofile(opts.glob_or_content, **kwargs)
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 624, in markdown_to_pofile
    return Md2Po(
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 510, in extract
    _parse(self.content)
  File "c:\program files\python39\lib\site-packages\mdpo\md2po\__init__.py", line 500, in _parse
    parser.parse(
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc93' in position 1705: surrogates not allowed
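
For context, here is a minimal sketch of one way a lone surrogate such as '\udc93' can end up in a Python string and then fail to encode. It assumes the input contained the raw byte 0x93 and was decoded with the "surrogateescape" error handler at some point; that is an assumption for illustration, not a confirmed diagnosis of this report.

```python
# Assumption: the offending input contained the raw byte 0x93 and was decoded
# with the "surrogateescape" error handler before reaching the parser.
raw = b"\x93"

# surrogateescape maps an undecodable byte 0xNN to the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print(repr(text))

# A strict UTF-8 encode of that string is rejected, as in the traceback above.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason
    print(reason)

# Encoding with the same handler round-trips back to the original byte.
restored = text.encode("utf-8", errors="surrogateescape")
```

If this is what happens, the byte 0x93 would point at input that is not valid UTF-8 at that position, or at a stream that was decoded with a different encoding than the one it was written in.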

mondeja commented 3 years ago

Thank you for the report @dingyifei, I'm reproducing it here and will try to investigate asap 👍🏼

For the encoding problem, could you upgrade your mdpo version? It seems that you are using mdpo==0.3.19.

dingyifei commented 3 years ago

The newer version changed things a bit, but I'm still getting the same error.

mondeja commented 3 years ago

The md2po ./a.md error has been fixed in the new version v0.3.56 as you can see here.

dingyifei commented 3 years ago

Thank you for improving mdpo!

I think po2md also has a similar issue. Sorry for missing it earlier. It seems like the same issue just in a different place.

The following issue occurred in Action

Action yml:

name: Update Locales

on:
  workflow_dispatch:

jobs:
  build:

    runs-on: ubuntu-latest

    steps:
    - uses: actions/checkout@v2
    - name: Install MDPO
      run: pip install mdpo
    - name: generate file
      run: touch a.md
    - name: Insert Text
      run: echo "helloworld" >> a.md
    - name: test1
      run: md2po ./a.md
    - name: test2
      run: python -c "import os;print(os.path.exists('a.md'));"
    - name: test3
      run: cat ./a.md | md2po
    - name: test4
      run: cat ./a.md | md2po -s --po-filepath a.po && cat ./a.md | po2md -p a.po
    - name: test5
      run: cat ./a.md | md2po -s --po-filepath a.po && po2md a.md -p a.po

mondeja commented 3 years ago

> I think po2md also has a similar issue...

Thank you for catching it. It's fixed in v0.3.57.

mondeja commented 3 years ago

> I'm trying to use > filename to bypass the md2po file-reading issue, and it works for most of the files except a few.
>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udc93' in position 1705: surrogates not allowed

Hi @dingyifei, could you point out which file is raising this error?
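
One way to narrow down which files are affected is to decode each file strictly and record where decoding fails. The sketch below assumes the problem comes from bytes that are not valid UTF-8 in the Markdown files themselves, which may not be the case here; the function names are hypothetical helpers and are not part of mdpo.

```python
from pathlib import Path


def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding, no error handler
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def find_invalid_utf8(paths):
    """Map each path that fails strict UTF-8 decoding to its first bad offset."""
    bad = {}
    for path in paths:
        offset = first_invalid_utf8_offset(Path(path).read_bytes())
        if offset is not None:
            bad[str(path)] = offset
    return bad
```

For example, `find_invalid_utf8(Path("docs").glob("*.md"))` would scan a hypothetical `docs` directory. An empty result would suggest the files are fine and the decoding problem lies in how the data is piped.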

dingyifei commented 3 years ago

More than one file has this error. One of them is

HallFilamentWidthSensor.po.txt
input md: HallFilamentWidthSensor.md
output md: HallFilamentWidthSensor.md

command:

cat $mdfilepath | po2md --md-encoding utf-8 --po-encoding utf-8 \
    -p $pofilepath -q -s $targetmdfile

I'm not able to reproduce it with > at this point (I messed up my script), but using cat filename | po2md does the trick.

Additionally, I noticed a problem where old markdown outputs can't be updated correctly (so I added an rm command to my script). It doesn't affect rendering, though. Should I create a new issue for this behavior?

mondeja commented 3 years ago

I can't reproduce it. The original error was caused by an unencodable surrogate, which I think should be considered invalid input. Could you share the encoding your terminal is using? Also, to be sure, what version of mdpo are you using?

> Additionally, I noticed a problem where old markdown outputs can't be updated correctly (so I added an rm command to my script). It doesn't affect rendering, though. Should I create a new issue for this behavior?

Yes, why not? 👍🏼

dingyifei commented 3 years ago

Version: po2md 0.3.66

Terminal settings: I believe it is running UTF-8 because the Chinese characters are handled properly.

I'll try to reproduce the markdown update inconsistency when I have enough time.
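
Since correctly displayed characters do not prove which encoding Python picks for piped input, a small sketch like the following can report what the interpreter actually uses. `encoding_report` is a hypothetical helper name; the `sys` and `locale` calls are standard library.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses for the standard streams."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdin_errors": getattr(sys.stdin, "errors", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it once directly and once at the end of a pipe (for example `cat a.md | python report.py`, with `report.py` a hypothetical file name) would show whether the stdin encoding or error handler differs when input is piped.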

dingyifei commented 3 years ago

I'm not sure how, but it seems like the old markdown files are being updated correctly now. It's probably fixed in the link wrap fix or some other related fixes. I remember it was inserting extra empty lines (I think I had a screenshot somewhere) and ignoring wrap width on the first line of a paragraph.

dingyifei commented 3 years ago

Could it be that the MinGW64 used by Git Bash is so ridiculously different? This is a strange result.

mondeja commented 3 years ago

I'm not able to reproduce it in MinGW64. Maybe your version has a problem with pipes. As this works everywhere except inside your version of MinGW64, I'm closing this.

Input

```bash
mondeja@DESKTOP-U1FU7SV MINGW64 ~
$ cat HallFilamentWidthSensor.md | python3 "$(python3 -c "import mdpo;print(mdpo.__path__[0]);")\po2md\__main__.py" --md-encoding utf-8 --po-encoding utf-8 -p HallFilamentWidthSensor.po -qs HallFilamentWidthSensor.output.md
```

Result

```bash
mondeja@DESKTOP-U1FU7SV MINGW64 ~
$ cat 'HallFilamentWidthSensor.output.md'
This document describes Filament Width Sensor host module. Hardware used for developing this host module is based on Two Hall liniar sensors (ss49e for example). Sensors in the body are located opposite sides. Principle of operation : two hall sensors work in differential mode, temperature drift same for sensor. Special temperature compensation not needed. You can find designs at [thingiverse.com](https://www.thingiverse.com/thing:4138933) [Hall based filament width sensor assembly video](https://www.youtube.com/watch?v=TDO9tME8vp4)

## How does it work?

Sensor generates two analog output based on calculated filament width. Sum of output voltage always equals to detected filament width . Host module monitors voltage changes and adjusts extrusion multiplier. I use aux2 connector on ramps-like board analog11 and analog12 pins. You can use different pins and differenr boards

## Configuration

[hall_filament_width_sensor] adc1: analog11 adc2: analog12 # adc1 and adc2 channels select own pins Analog input pins on 3d printer board # Sensor power supply can be 3.3v or 5v Cal_dia1: 1.50 # Reference diameter point 1 (mm) Cal_dia2: 2.00 # Reference diameter point 2 (mm) # The measurement principle provides for two-point calibration # In calibration process you must use rods of known diameter # I use drill rods as the base diameter. # nominal filament diameter must be between Cal_dia1 and Cal_dia2 # Your size may differ from the indicated ones, for example 2.05 Raw_dia1:10630 # Raw sensor value for reference point 1 Raw_dia2:8300 # Raw sensor value for reference point 2 # Raw value of sensor in units # can be readed by command QUERY_RAW_FILAMENT_WIDTH default_nominal_filament_diameter: 1.75 # This parameter is in millimeters (mm) max_difference: 0.15 # Maximum allowed filament diameter difference in millimeters (mm) # If difference between nominal filament diameter and sensor output is more # than +- max_difference, extrusion multiplier set back to %100 measurement_delay: 70 # The distance from sensor to the melting chamber/hot-end in millimeters (mm). # The filament between the sensor and the hot-end will be treated as the default_nominal_filament_diameter. # Host module works with FIFO logic. It keeps each sensor value and position in # an array and POP them back in correct position. #enable:False # Sensor enabled or disabled after power on. Disabled by default # measurement_interval:10 # Sensor readings done with 10 mm intervals by default. If necessary you are free to change this setting #logging: False # Out diameter to terminal and klipper.log # can be turn on|of by command #Virtual filament_switch_sensor suppurt. Create sensor named hall_filament_width_sensor. # #min_diameter:1.0 #Minimal diameter for trigger virtual filament_switch_sensor. #use_current_dia_while_delay: False # Use the current diameter instead of the nominal diamenter while the measurement delay has not run through. # #Values from filament_switch_sensor. See the "filament_switch_sensor" section for information on these parameters. # #pause_on_runout: True #runout_gcode: #insert_gcode: #event_delay: 3.0 #pause_delay: 0.5

## Commands

**QUERY_FILAMENT_WIDTH** - Return the current measured filament width as result **RESET_FILAMENT_WIDTH_SENSOR** Ð²Ð‚â€œ Clear all sensor readings. Can be used after filament change. **DISABLE_FILAMENT_WIDTH_SENSOR** Ð²Ð‚â€œ Turn off the filament width sensor and stop using it to do flow control **ENABLE_FILAMENT_WIDTH_SENSOR** - Turn on the filament width sensor and start using it to do flow control **QUERY_RAW_FILAMENT_WIDTH** Return the current ADC channel values and RAW sensor value for calibration points **ENABLE_FILAMENT_WIDTH_LOG** - Turn on diameter logging **DISABLE_FILAMENT_WIDTH_LOG** - Turn off diameter logging

## Menu variables

**hall_filament_width_sensor.Diameter** current measured filament width in mm **hall_filament_width_sensor.Raw** current raw measured filament width in units **hall_filament_width_sensor.is_active** Sensor on or off

## Template for menu variables

[menu __main __filament __width_current] type: command enable: {'hall_filament_width_sensor' in printer} name: Dia: {'%.2F' % printer.hall_filament_width_sensor.Diameter} index: 0 [menu __main __filament __raw_width_current] type: command enable: {'hall_filament_width_sensor' in printer} name: Raw: {'%4.0F' % printer.hall_filament_width_sensor.Raw} index: 1

## Calibration procedure

Insert first calibration rod (1.5 mm size) get first raw sensor value To get raw sensor value you can use menu item or **QUERY_RAW_FILAMENT_WIDTH** command in terminal Insert second calibration rod (2.0 mm size) get second raw sensor value Save raw values in config

## How to enable sensor

After power on by default sensor disabled. Enable sensor in start g-code by command **ENABLE_FILAMENT_WIDTH_SENSOR** or change enable parameter in config

## Logging

After power on by default diameter Logging disabled. Data to log added on every measurement interval (10 mm by default)
```

mondeja / mdpo

UnicodeError parsing input with md2po #150