openpreserve / jpylyzer

JP2 (JPEG 2000 Part 1) validator and properties extractor. Jpylyzer was created specifically to check that a JP2 file really conforms to the format's specifications. Additionally, jpylyzer can extract technical characteristics.
http://jpylyzer.openpreservation.org/

Surrogate pair in filename leads to UnicodeEncodeError on Windows, Python 3 #103

bitsgalore closed this issue 6 years ago

bitsgalore commented 6 years ago

Command:

python f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py -w e:\jpylyzer-test-files\*.jp* > allScript.xml

Result:

Traceback (most recent call last):
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 413, in stripSurrogatePairs
    ustring.encode('utf-8')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udbd1' in position 3: surrogates not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 718, in <module>
    main()
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 714, in main
    checkFiles(args.inputRecursiveFlag, args.inputWrapperFlag, jp2In)
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 633, in checkFiles
    xmlElement = checkOneFile(path)
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 305, in checkOneFile
    fileNameCleaned = stripSurrogatePairs(fileName)
  File "f:\johan\pythoncode\jpylyzer\jpylyzer\jpylyzer.py", line 416, in stripSurrogatePairs
    tmp = ustring.encode('utf-8', 'surrogateescape')
UnicodeEncodeError: 'utf-8' codec can't encode character '\udbd1' in position 3: surrogates not allowed

This works without problems under Linux. The Windows binaries also work without problems, which is not surprising, as they are based on Python 2.7.
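For reference, the failure in the traceback can be reproduced without any JP2 files. The sketch below is only an illustration and is not the code from the eventual fix; the names `strip_lone_surrogates`, `explain_failure` and `file_name` are made up for this example. In Python 3 the `surrogateescape` handler only encodes code points in the range U+DC80..U+DCFF, so a lone high surrogate such as `'\udbd1'` is rejected by both encode calls seen above. One possible workaround, assuming that dropping the offending characters from the reported file name is acceptable, is to encode with the `ignore` handler:

```python
def strip_lone_surrogates(text):
    """Return text with surrogate code points (U+D800..U+DFFF) dropped."""
    # The UTF-8 codec refuses to encode surrogate code points; with the
    # 'ignore' handler they are silently dropped instead of raising.
    return text.encode('utf-8', 'ignore').decode('utf-8')


def explain_failure(text):
    """Try both encode calls from the traceback; return the handlers that fail."""
    failed = []
    for handler in ('strict', 'surrogateescape'):
        try:
            text.encode('utf-8', handler)
        except UnicodeEncodeError:
            failed.append(handler)
    return failed


# Hypothetical file name with a lone high surrogate at index 3,
# matching "position 3" in the reported error message.
file_name = 'abc\udbd1.jp2'

failed_handlers = explain_failure(file_name)   # both handlers fail here
cleaned_name = strip_lone_surrogates(file_name)
```

Using `'replace'` instead of `'ignore'` would keep a `?` placeholder where the surrogate was, which may be preferable if the cleaned name should keep the same length as the original.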

bitsgalore commented 6 years ago

Fixed now: https://github.com/openpreserve/jpylyzer/commit/a3d34455c6c1d63a5834deb125d49bd599d8dfa1