richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

An extra/unnecessary space character in PUID fmt/95 definition file #179

Closed: ZdenekVasek closed 2 years ago

ZdenekVasek commented 2 years ago

When we tried to build the latest PRONOM signatures (v101) for Siegfried with the roy tool, we ran into a problem caused by an extra/unnecessary space character in the definition file for PUID fmt/95.

A short recap of the procedure:

wget https://cdn.nationalarchives.gov.uk/documents/DROID_SignatureFile_V101.xml

wget https://cdn.nationalarchives.gov.uk/documents/container-signature-20211216.xml

roy harvest

roy build

  2022/02/22 15:34:39 parse error fmt/95: Lex error in fmt/95: expecting a closing bracket, got ' '

grep '{ ' fmt95.xml

786D6C6E733A7064666169643D(22|27)687474703A2F2F7777772E6169696D2E6F72672F706466612F6E732F6964*7064666169643A636F6E666F726D616E6365(3E|3D22|3D27)41(22|27|3C2F7064666169643A636F6E666F726D616E63653E){ 0-120}7064666169643A70617274(3D22|3D27|3E)31(22|27|3C2F7064666169643A706172743E)

After its removal, everything works like a charm.

sed -i 's/{ /{/' fmt95.xml

roy build
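
The sed workaround above replaces only the first `{ ` on each line and would also touch any `{ ` outside a signature. As a rough sketch of a narrower cleanup, the Python below strips whitespace only inside brace quantifiers such as `{0-120}`. It is not part of roy or Siegfried. The function names are hypothetical, and the regular expression encodes an assumption about the quantifier syntax (`{n}`, `{n-m}`, with `*` allowed as a bound), not a definition taken from the PRONOM documentation.

```python
import re

# Assumed shape of a gap quantifier: {n}, {n-m}, or with '*' as a bound,
# tolerating stray whitespace anywhere inside the braces.
QUANTIFIER = re.compile(r"\{\s*(\d+|\*)\s*(?:-\s*(\d+|\*)\s*)?\}")


def find_padded_quantifiers(sequence: str) -> list[str]:
    """Return every quantifier in `sequence` that contains whitespace."""
    return [
        m.group(0)
        for m in QUANTIFIER.finditer(sequence)
        if any(ch.isspace() for ch in m.group(0))
    ]


def strip_quantifier_whitespace(sequence: str) -> str:
    """Return `sequence` with whitespace removed from inside quantifiers."""

    def compact(match: re.Match) -> str:
        low, high = match.group(1), match.group(2)
        if high is None:
            return "{" + low + "}"
        return "{" + low + "-" + high + "}"

    return QUANTIFIER.sub(compact, sequence)


if __name__ == "__main__":
    # Shortened fragment of the fmt/95 sequence quoted in this thread.
    fragment = "41(22|27){ 0-120}7064"
    print(find_padded_quantifiers(fragment))
    print(strip_quantifier_whitespace(fragment))
```

Running the finder first shows exactly which quantifiers would change before anything is rewritten, which is useful when deciding whether the definition or the parser is at fault.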

We are not sure whether this is caused by an incorrect definition downloaded from the PRONOM / DROID site (if an extra space character in a definition is prohibited) or by a problem in the roy utility (if an extra space character in a definition is allowed). Could you please check this and, if necessary, fix it?

Dclipsham commented 2 years ago

Sorry about this. It is an issue with PRONOM itself, and we're aiming to put out a fix early to mid next week.

David

Dclipsham commented 2 years ago

This issue should now be resolved by the v104 update, which is live and available to download. Could you please confirm either way whether the issue is now resolved for you?

ZdenekVasek commented 2 years ago

Yes, thank you. Problem solved.

Dclipsham commented 2 years ago

Fab, thank you @ZdenekVasek