moinwiki / moin-1.9

MoinMoin Wiki (1.9, also: 1.5a ... 1.8), stable, for production wikis
https://moinmo.in/

AttachFile move: unicode normalization troubles #59

Open ThomasWaldmann opened 4 years ago

ThomasWaldmann commented 4 years ago

Some characters can be encoded in more than one way, e.g. the German umlaut ä can be:

# NFC normalization (composed):
>>> print("\xc3\xa4".decode('utf8'))
ä
# NFD normalization (decomposed):
>>> print("a\xcc\x88".decode('utf8'))
ä

So both render as ä, but they are different Unicode code point sequences:

>>> "\xc3\xa4".decode('utf8')
u'\xe4'
>>> "a\xcc\x88".decode('utf8')
u'a\u0308'
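
For readers on Python 3, where `str` is already Unicode, the same comparison can be sketched as follows. The helper name `same_text` is made up for illustration; only `unicodedata.normalize` comes from the standard library.

```python
import unicodedata

composed = "\u00e4"     # NFC: the single code point U+00E4
decomposed = "a\u0308"  # NFD: 'a' followed by combining diaeresis U+0308


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (illustrative helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # False: different code point sequences
print(same_text(composed, decomposed))  # True: equal once both are NFC
```

A plain `==` on the raw strings is what a path lookup effectively does, which is why the two forms have to be brought to one normalization before comparing.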

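Uploading an attachment whose name contains ä in the decomposed form (a\xCC\x88 in UTF-8) makes it impossible to rename later, because the file is then looked up under the composed form (\xc3\xa4):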

File "/srv/moin-1.9/MoinMoin/action/AttachFile.py", line 301, in move_attachment
     filesize = os.path.getsize(attachment_path)
File "/usr/lib/python2.7/genericpath.py", line 57, in getsize
     return os.stat(filename).st_size
OSError: [Errno 2] No such file or directory: '/srv/.../attachments/...\\xc3\\xa4....pdf'
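
One possible workaround, sketched here in Python 3 and not taken from the moin code: instead of building the path from the requested name directly, look through the directory for an entry that matches once both names are normalized. `find_existing_name` is a hypothetical helper, and the sketch assumes a filesystem that stores names byte for byte (as is usual on Linux).

```python
import os
import unicodedata


def find_existing_name(directory, wanted_name):
    """Return the on-disk entry equal to wanted_name modulo Unicode
    normalization, or None if there is no such entry (hypothetical helper)."""
    wanted_nfc = unicodedata.normalize("NFC", wanted_name)
    for entry in os.listdir(directory):
        # Compare in NFC so that an NFD-named file still matches an NFC request.
        if unicodedata.normalize("NFC", entry) == wanted_nfc:
            return entry
    return None
```

The returned entry is the name exactly as stored, so `os.path.join(directory, entry)` can be passed to `os.path.getsize` or `os.rename` without hitting the `OSError` above.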
ThomasWaldmann commented 4 years ago

https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize

ThomasWaldmann commented 4 years ago
>>> unicodedata.normalize('NFC', u'a\u0308')
u'\xe4'
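
Building on `unicodedata.normalize`, already-uploaded files could in principle be repaired by renaming them to their NFC names. The following Python 3 sketch is an assumption about how such a cleanup might look, not something moin ships; `renormalize_directory` and its `dry_run` parameter are invented for illustration.

```python
import os
import unicodedata


def renormalize_directory(directory, form="NFC", dry_run=True):
    """Rename entries whose names are not in the given normalization form.

    Returns a list of (old_name, new_name) pairs. With dry_run=True nothing
    is renamed; the pairs only describe what would change.
    """
    changes = []
    for entry in sorted(os.listdir(directory)):
        target = unicodedata.normalize(form, entry)
        if target == entry:
            continue  # already in the requested form
        old_path = os.path.join(directory, entry)
        new_path = os.path.join(directory, target)
        if os.path.exists(new_path):
            # Never overwrite: skip when the normalized name is already taken.
            continue
        changes.append((entry, target))
        if not dry_run:
            os.rename(old_path, new_path)
    return changes
```

Running it first with `dry_run=True` shows which attachments would be touched. On filesystems that normalize names themselves (macOS is the usual example), the existence check may report the target as already present, in which case the entry is simply skipped.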
ThomasWaldmann commented 4 years ago

https://stackoverflow.com/questions/18137554/how-to-convert-path-to-mac-os-x-path-the-almost-nfd-normal-form

ThomasWaldmann commented 4 years ago

https://pypi.org/project/nfd2nfc/ describes the usual normalization forms on Linux / macOS.

ThomasWaldmann commented 4 years ago

moin 1.9 code:

thus (moin running on linux):