Closed kieranjol closed 4 years ago
This can be merged when:
On windows 10, current behaviour is:
blaááá.xml
runs fine with copyit and validate, but the checksum manifest says blacccc.xml
and the encoding shows up as ISO 8859-5 in Notepad++. Everything should be normalised to UTF-8 if possible.
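A minimal sketch of what writing the manifest as UTF-8 could look like, assuming Python 3. The function name `write_manifest`, the two-space separator, and the placeholder checksum are assumptions for illustration, not the code in this PR:

```python
import unicodedata


def write_manifest(manifest_path, entries):
    """Write (checksum, relative_path) pairs to a manifest as UTF-8.

    Hypothetical helper: the name, arguments and line layout are
    assumptions for illustration only.
    """
    # An explicit encoding avoids falling back to the Windows locale
    # code page; newline="\n" keeps line endings the same on every OS.
    with open(manifest_path, "w", encoding="utf-8", newline="\n") as handle:
        for checksum, relative_path in entries:
            # NFC stores each accented letter as one composed code point.
            name = unicodedata.normalize("NFC", relative_path)
            handle.write("%s  %s\n" % (checksum, name))
```

Opening the file with an explicit `encoding="utf-8"` is the part that matters here: without it, Python on Windows may pick a locale-dependent encoding for the manifest.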
Tests carried out: a folder containing six files, one of which has a combined diacritic in its filename. Two example manifest lines:
00d6da5aeee9e47f860bfc20a2d6f37f Replacement STL Files - Copy (2)/-CL5-TEILEATEACS.stl
6c3b26563725526671c13e973111fc6e Replacement STL Files - Copy (2)/-CL6-TEILEATêACS.stl
To be done: sipcreator fails due to a mediainfo issue.
sipcreator now works when the filename with combined diacritics appears in the source folders. This is with the -sc arg.
This also works with batchaccession.py when the input mov has combined diacritics. All should be well, I think.
…scripts
This uses unicodedata to handle fadas (Irish acute accents) and other non-English accents in filenames. It uses either NFC or NFD depending on the context: NFC for manifests, NFD for checking whether a file exists. See https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize
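The NFC/NFD split can be sketched as follows. The helper names `manifest_name` and `lookup_name` are hypothetical and only illustrate the idea; `unicodedata.normalize` is the standard-library call linked above:

```python
import unicodedata


def manifest_name(path):
    """Return the NFC (composed) form, as used for manifest entries.

    Hypothetical helper name, for illustration only.
    """
    return unicodedata.normalize("NFC", path)


def lookup_name(path):
    """Return the NFD (decomposed) form, as used for existence checks.

    Hypothetical helper name, for illustration only.
    """
    return unicodedata.normalize("NFD", path)


# The same visible filename in two different encodings of "a with fada":
composed = "bl\u00e1.xml"     # one code point: U+00E1
decomposed = "bla\u0301.xml"  # "a" followed by combining acute U+0301

# The raw strings differ, so a naive comparison would report a mismatch.
print(composed == decomposed)                                # False
# After normalising both to the same form they compare equal.
print(manifest_name(composed) == manifest_name(decomposed))  # True
print(lookup_name(composed) == lookup_name(decomposed))      # True
```

Normalising both sides to the same form before comparing is what lets a filename match regardless of which form the operating system or source tool produced.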