kieranjol / IFIscripts

Detailed documentation is available here: http://ifiscripts.readthedocs.io/en/latest/index.html
MIT License

ififuncs/validate - adds better accent normalisation in manifests and … #372

Closed kieranjol closed 4 years ago

kieranjol commented 4 years ago

…scripts

This uses unicodedata to handle fadas (Irish) and other non-English accents in filenames. It will use either NFC or NFD depending on the context: NFC for manifests, but NFD for checking if a file exists. See https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
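A minimal sketch of the two normalisation forms, assuming Python 3. The helper names `manifest_form` and `exists_decomposed` are hypothetical and are not taken from ififuncs; only `unicodedata.normalize` and `os.path.exists` are standard-library calls.

```python
import os
import unicodedata


def manifest_form(path):
    # Hypothetical helper: composed form (NFC) for text written to a manifest.
    return unicodedata.normalize('NFC', path)


def exists_decomposed(path):
    # Hypothetical helper: look the file up under its decomposed (NFD) spelling.
    return os.path.exists(unicodedata.normalize('NFD', path))


composed = 'bla\u00e1.xml'     # 'á' stored as a single code point
decomposed = 'blaa\u0301.xml'  # 'a' followed by a combining acute accent

# The two spellings look identical on screen but are different strings.
print(composed == decomposed)                 # False
print(manifest_form(decomposed) == composed)  # True
print(len(composed), len(decomposed))         # 8 9
```

Whether a lookup by the NFD spelling succeeds depends on how the operating system and filesystem store the name, so this is an illustration of the idea rather than a portable existence check.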

kieranjol commented 4 years ago

This can be merged when:

kieranjol commented 4 years ago

On Windows 10, current behaviour is: blaááá.xml runs fine with copyit and validate, but the checksum manifest says blacccc.xml and the encoding shows up as ISO 8859-5 in Notepad++. Everything should be normalised to UTF-8 if possible.
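One possible way to make sure a manifest is always written as UTF-8, regardless of the platform default, is to pass the encoding explicitly when opening the file. This is a sketch assuming Python 3; `write_manifest` and its arguments are hypothetical names, not the ififuncs implementation, and the `checksum  path` line layout is assumed.

```python
import unicodedata


def write_manifest(manifest_path, entries):
    # entries: iterable of (checksum, relative_path) pairs (assumed shape).
    # Passing encoding='utf-8' avoids falling back to the platform default,
    # which on Windows is often a legacy code page rather than UTF-8.
    with open(manifest_path, 'w', encoding='utf-8', newline='\n') as manifest:
        for checksum, relative_path in entries:
            # Store the composed (NFC) spelling of the filename.
            name = unicodedata.normalize('NFC', relative_path)
            manifest.write('%s  %s\n' % (checksum, name))
```

Reading the manifest back should use the same explicit `encoding='utf-8'`, otherwise the accented characters can be decoded incorrectly on the way in.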

kieranjol commented 4 years ago

Tests carried out: a folder containing six files, one of which contains a combined diacritic in its filename. Two example lines:

00d6da5aeee9e47f860bfc20a2d6f37f  Replacement STL Files - Copy (2)/-CL5-TEILEATEACS.stl
6c3b26563725526671c13e973111fc6e  Replacement STL Files - Copy (2)/-CL6-TEILEATêACS.stl
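To tell whether a filename actually contains a combined (decomposed) diacritic rather than a precomposed accented letter, the code points can be inspected with `unicodedata.combining`. This is a small sketch assuming Python 3; `has_combining_marks` is a hypothetical helper, and the escaped filenames below are illustrative spellings, not necessarily the exact bytes of the test files.

```python
import unicodedata


def has_combining_marks(name):
    # unicodedata.combining() returns a non-zero class for combining characters.
    return any(unicodedata.combining(char) for char in name)


# 'E' followed by U+0301 (combining acute accent): decomposed spelling.
print(has_combining_marks('-CL5-TEILEATE\u0301ACS.stl'))  # True
# U+00EA is a single precomposed 'ê': no combining mark present.
print(has_combining_marks('-CL6-TEILEAT\u00eaACS.stl'))   # False
```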

To be done: sipcreator fails due to a mediainfo issue.

kieranjol commented 4 years ago

sipcreator now works when the filename with combined diacritics appears in the source folders. This is with the -sc arg.

kieranjol commented 4 years ago

This also works with batchaccession.py when the input mov has combined diacritics. All should be well, I think.