mikitex70 / plantuml-markdown

PlantUML plugin for Python-Markdown
BSD 2-Clause "Simplified" License

`source` parameter unable to handle files with unicode content #56

Closed: 9ao9ai9ar closed this issue 2 years ago

9ao9ai9ar commented 2 years ago

Environment

Windows 10, Python 3.9.6, mkdocs 1.2.2, plantuml-markdown 3.4.2

Test cases

test1.puml:

@startuml actor
' 各
actor me
rectangle box {
  me --> (item)
}
@enduml

test2.puml:

@startuml actor
actor 我
rectangle 方框 {
  我 --> (項目)
}
@enduml

Output

```plantuml source="test1.puml"
```
  File "C:\Users\9ao9ai9ar\miniconda3\envs\dev\lib\site-packages\plantuml_markdown.py", line 176, in _replace_block
    code += f.read()
  File "C:\Users\9ao9ai9ar\miniconda3\envs\dev\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 20: character maps to <undefined>
```plantuml source="test2.puml"
```
  File "C:\Users\9ao9ai9ar\miniconda3\envs\dev\lib\site-packages\plantuml_markdown.py", line 280, in _render_remote_uml_image
    return PlantUML("%s/%s/" % (self.config['server'], img_format)).processes(plantuml_code)
  File "C:\Users\9ao9ai9ar\miniconda3\envs\dev\lib\site-packages\plantuml.py", line 173, in processes
    raise PlantUMLHTTPError(response, content)
  File "C:\Users\9ao9ai9ar\miniconda3\envs\dev\lib\site-packages\plantuml.py", line 56, in __init__
    if not self.message:
AttributeError: 'PlantUMLHTTPError' object has no attribute 'message'

The test cases render fine if the code is inlined; this problem only manifests itself when using the source parameter.

mikitex70 commented 2 years ago

Hi @9ao9ai9ar, I'm having some difficulty reproducing your issues, as I only have Linux machines. What are the character encodings of the .md and .puml files? From the first test, it seems that the system expects a CP1252-encoded file (the Windows default), but the contents cannot be decoded (are they UTF-8?). Can you attach a .puml file with your characters to this issue? Copying and pasting from the browser produces UTF-8 files...
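The decoding failure in the first traceback can be reproduced in isolation, without MkDocs or the plugin. The sketch below assumes that test1.puml is saved as UTF-8 with Windows (CRLF) line endings; under that assumption, the second byte of the character 各 is 0x90 at offset 20, which matches the reported error. The variable names are illustrative only.

```python
# First two lines of test1.puml, assuming UTF-8 encoding and CRLF line endings.
data = "@startuml actor\r\n' 各\r\n".encode("utf-8")

error_position = None
try:
    # cp1252 has no mapping for byte 0x90, so decoding fails here.
    data.decode("cp1252")
except UnicodeDecodeError as exc:
    error_position = exc.start
    print(exc)

# The same bytes decode cleanly when UTF-8 is requested explicitly.
text = data.decode("utf-8")
print(text)
```

On Windows, `open()` without an `encoding` argument uses the locale's preferred encoding (cp1252 in this environment), which is why the traceback passes through `encodings\cp1252.py`.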

9ao9ai9ar commented 2 years ago

Hi @mikitex70, Visual Studio Code indicates that the .md and .puml files are in UTF-8 encoding. I ran the following commands (using Miniconda on Windows) to create my test environment:

conda create -n test python
conda activate test
pip install mkdocs plantuml-markdown
mkdocs new test
cd test
# Move the test files inside docs folder and edit index.md to include the following lines (without the opening #'s):
# ```plantuml source="test1.puml"
# ```
mkdocs serve

I attach the whole MkDocs project folder here: test.zip.

Currently, I can use Markdown-Include to include .puml files without issues.

mikitex70 commented 2 years ago

I've released version 3.4.3; give it a try. I've forced UTF-8 encoding for files referenced by the source parameter, which should resolve your issue. A new configuration option (encoding) can be used to change the default, which is useful on Windows, where cp1252 is the default encoding.
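As a rough illustration of the idea behind the fix (not the plugin's actual code), reading a diagram file with an explicit encoding avoids any dependence on the platform default. `read_source` is a hypothetical helper introduced only for this sketch, and its `encoding` argument stands in for the configuration option described above.

```python
import os
import tempfile


def read_source(path, encoding="utf-8"):
    """Hypothetical helper: read a diagram file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test2.puml")
    # Write a shortened version of test2.puml as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("@startuml actor\nactor 我\n@enduml\n")
    # Reads correctly regardless of the system's default encoding.
    code = read_source(path)

print(code)
```

Passing a different value, such as `encoding="cp1252"`, would cover files that really are stored in a legacy encoding.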

9ao9ai9ar commented 2 years ago

I can confirm that this issue is fixed in version 3.4.3. Thank you for the prompt response and release of a fix.