Closed: tw4l closed this issue 7 years ago.
Other useful details:
In the most recent instance, this behavior happened with siegfried 1.6.7 (default.sig, 2016-11-22T20:59:52+11:00; identifiers: pronom: DROID_SignatureFile_V88.xml; container-signature-20160927.xml) on Ubuntu 16.04 LTS (BitCurator).
Maybe 6 months ago, I ran into the same issue with files from a different archive using an older version of siegfried (not sure exactly which version) on OS X 10.9 Mavericks on a 2015 MacBook Pro.
Thanks for this report, Tim. The underlying issue here seems to be zip file name encoding. See this blog post for background on why zip really sucks in this respect: https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ !
The concrete issue for you is that your example file has a really weird UTF-8 name, which SF is incorrectly detecting as Latin / IBM437 (https://en.wikipedia.org/wiki/Code_page_437) and trying to decode that way. That decoding process introduces the NUL value. The culprit code is: https://github.com/richardlehane/characterize/blob/master/zipname.go
I'll tidy up this function to improve encoding detection and possibly also introduce a printable-character check as a fail-safe.
This fix will be in the next release, which I was hoping would be this month but may be next :)
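For readers wondering what "encoding detection" means here, below is a minimal Go sketch of one possible detection order. It assumes only two facts about the zip format: bit 11 (0x800) of the general purpose bit flag declares the name to be UTF-8, and many tools write UTF-8 names without setting that bit. The names `decodeZipName` and `latin1Placeholder` are hypothetical and are not part of characterize; this is not the code in zipname.go. The placeholder decoder uses ISO-8859-1 purely to keep the example short, standing in for a real IBM437 table.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// zipUTF8Flag is bit 11 of the zip general purpose bit flag (the
// "language encoding flag"); when set, the name is declared to be UTF-8.
const zipUTF8Flag = 0x800

// latin1Placeholder is a hypothetical stand-in legacy decoder: each byte
// becomes the code point with the same value (ISO-8859-1). A real
// implementation would map bytes through an IBM437 table instead.
func latin1Placeholder(b []byte) string {
	rs := make([]rune, len(b))
	for i, c := range b {
		rs[i] = rune(c)
	}
	return string(rs)
}

// decodeZipName (hypothetical) keeps the raw bytes as UTF-8 when the flag
// is set or the bytes already form valid UTF-8, and only otherwise hands
// them to the legacy decoder.
func decodeZipName(raw []byte, flags uint16, legacy func([]byte) string) string {
	if flags&zipUTF8Flag != 0 || utf8.Valid(raw) {
		return string(raw)
	}
	return legacy(raw)
}

func main() {
	// "café.txt" encoded as UTF-8 by a tool that did not set the flag.
	raw := []byte("caf\xc3\xa9.txt")
	fmt.Println(decodeZipName(raw, 0, latin1Placeholder)) // prints café.txt
}
```

Valid UTF-8 is only a heuristic: some legacy-encoded byte sequences also happen to be valid UTF-8, so a real detector may need additional checks. Go's standard archive/zip reader exposes the raw flag bits as `FileHeader.Flags` if you want to inspect them directly.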
Hi Tim, this should now be fixed in sf 1.7.0.
Have run across this several times now. Have isolated an example and will send it along by email.
The issue appears to relate to file names and character encoding in some way, at least as one possible cause.