Use latin1 for text encoding?

martonmiklos commented 5 years ago

Hi folks!

First of all, thanks for all the effort put into this project!

I have some schematics whose texts contain accented characters, and rendering them raises an exception:

Traceback (most recent call last):
  File "altium.py", line 1615, in <module>
    main()
  File "altium.py", line 420, in main
    render(args.file, renderer.Renderer)
  File "altium.py", line 590, in __init__
    self.handle_children([objects])
  File "altium.py", line 627, in handle_children
    handler(self, owners, obj)
  File "altium.py", line 996, in handle_text_frame
    text=obj["TEXT"].decode("utf-8").replace("~1", "\n"),
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 5: invalid start byte

The problematic text was the following:

b'1x5 t\xfcskesor~190\xb0, 1,27mm' Which corresponds to:

1x5 tüskesor\n90°, 1,27mm

I will do some experiments to map all the accented and special characters, but I am under the impression that Altium uses Latin-1 character encoding rather than plain ASCII.
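As a quick check of that impression, here is a minimal sketch that decodes the quoted bytes both ways. It only uses the byte string shown above. The variable names are illustrative and not taken from altium.py.

```python
# Bytes of the TEXT property exactly as quoted in the report above.
raw = b"1x5 t\xfcskesor~190\xb0, 1,27mm"

# Strict UTF-8 rejects the lone 0xFC byte, which reproduces the traceback.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err

# Latin-1 maps every byte 0x00-0xFF to a code point, so decoding cannot fail.
text = raw.decode("latin-1").replace("~1", "\n")
print(repr(text))
```

For these particular bytes, Windows-1252 would give the same result, since the two encodings agree on 0xFC and 0xB0. This sample alone therefore cannot tell them apart.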

vadmium commented 5 years ago

I expect it uses something like Latin-1 or Windows-1252. I am happy to change line 996 to decode with Latin-1. However I noted under https://github.com/vadmium/python-altium/blob/master/format.md#pin that I saw the byte 0x8E representing a broken bar (U+00A6, ¦). So the full story might not be so simple.

I have come across parallel UTF-8 properties: for instance, as well as one named TEXT, there is one named %UTF8%TEXT. Do you know whether your text frame object has a UTF-8 version of the text?
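If both properties can be present, one possible approach is to prefer the UTF-8 one and fall back to Latin-1. The sketch below assumes the object behaves like a dict of byte strings keyed by property name, as obj["TEXT"] in the traceback suggests. The helper name get_text is hypothetical and is not an existing function in altium.py.

```python
def get_text(obj, name="TEXT"):
    """Return a property as str, preferring the parallel %UTF8% version.

    Hypothetical helper: assumes obj maps property names to bytes.
    """
    utf8_value = obj.get("%UTF8%" + name)
    if utf8_value is not None:
        return utf8_value.decode("utf-8")
    # Fallback is an assumption; the legacy encoding may not be exactly Latin-1.
    return obj[name].decode("latin-1")


# Only the legacy property is present, as in the reported file.
legacy_only = {"TEXT": b"1x5 t\xfcskesor"}
# Both properties are present; the UTF-8 one should win.
both = {"TEXT": b"1x5 t?skesor", "%UTF8%TEXT": "1x5 tüskesor".encode("utf-8")}

print(get_text(legacy_only))
print(get_text(both))
```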

martonmiklos commented 5 years ago

Hi @vadmium

I have not found any occurrence of the "UTF" string in the file.

I think I will create a text containing most of the accented and special characters, save it, and inspect the stored text to draw a more solid conclusion about the encoding.
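To narrow down which bytes are worth testing, a small sketch like the following lists where Python's latin-1 and cp1252 codecs disagree. The function name differing_bytes is just illustrative. Bytes that cp1252 leaves undefined are decoded with errors="replace" so the loop does not raise.

```python
def differing_bytes():
    """Map each byte value to (latin-1 char, cp1252 char) where they differ."""
    diffs = {}
    for value in range(0x100):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")
        # cp1252 has a few unassigned bytes; replace them with U+FFFD.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs[value] = (latin1, cp1252)
    return diffs


for value, (latin1, cp1252) in differing_bytes().items():
    print(f"0x{value:02X}: latin-1 U+{ord(latin1):04X}, cp1252 {cp1252!r}")
```

The two codecs differ only in the range 0x80 to 0x9F, so characters stored in that range are the ones that would distinguish them. Neither codec maps 0x8E to the broken bar (¦) mentioned earlier, which suggests that some third encoding may be involved for at least that property.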

vadmium / python-altium, issue #10