richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

sf -update wikidata downloads Archivematica extensions as well #210

Closed ross-spencer closed 1 year ago

ross-spencer commented 1 year ago

I haven't explored why, but sf -update wikidata is downloading Archivematica extensions as well. This in and of itself is not particularly harmful, but does impede testing of the Wikidata sig.

spencer@rvarchivespencer2:/tmp$ sf -update wikidata
... downloading latest signature file ...
... writing /home/spencer/siegfried/default.sig ...
Your signature file has been updated
spencer@rvarchivespencer2:/tmp$ sf -version
siegfried 1.9.4
/home/spencer/siegfried/default.sig (2022-11-06T17:46:49+01:00)
identifiers:
  - archivematica: wikidata-definitions-3.0.0 (2022-11-06); extensions: archivematica-fmt2.xml, archivematica-fmt3.xml, archivematica-fmt4.xml, archivematica-fmt5.xml

ross-spencer commented 1 year ago

NB: this is connected to https://github.com/richardlehane/siegfried/pull/178, which should be the first place to look, i.e. in gen.go.

richardlehane commented 1 year ago

This is the fault of my crappy config.go package: if you build signatures sequentially in the same session, config options set for an earlier build can easily pollute later builds.

In this case the archivematica.sig is being built before the wikidata one and the extend config option is still set. There is a config.Clear() command that you can pass to clear earlier options.

Probably the cleanest way to fix this would be to add config.Clear() at the head of the wikidataOpts slice i.e.:

wikidataOpts := []config.Option{
  config.Clear(),
  config.SetWikidataNamespace(),
  config.SetWikidataNoPRONOM(),
}

config.Clear() doesn't currently wipe the extend option so that would need to be updated as well. config.Clear() is defined in /pkg/config/identifier.go and should be updated like:

func Clear() func() private {
    return func() private {
        identifier.name = ""
        identifier.extend = nil
        loc.fdd = ""
        mimeinfo.mi = ""
        return private{}
    }
}
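
To make the pollution concrete, here is a minimal, self-contained sketch of the mechanism. It is not siegfried's actual code: the Option type, SetName, SetExtend and build are hypothetical stand-ins, and only the shape of Clear() follows the snippet above. It assumes options mutate package-level state and are applied in slice order.

```go
package main

import "fmt"

// private mirrors the unexported return type used by Clear() in the thread.
type private struct{}

// Option is a hypothetical stand-in for siegfried's config.Option.
type Option func() private

// identifier is package-level state shared by every build in the process.
var identifier = struct {
	name   string
	extend []string
}{}

// SetName is an illustrative setter, not part of siegfried's API.
func SetName(n string) Option {
	return func() private {
		identifier.name = n
		return private{}
	}
}

// SetExtend is an illustrative setter, not part of siegfried's API.
func SetExtend(files ...string) Option {
	return func() private {
		identifier.extend = files
		return private{}
	}
}

// Clear resets the shared state, including the extend field.
func Clear() Option {
	return func() private {
		identifier.name = ""
		identifier.extend = nil
		return private{}
	}
}

// build applies the options in order and returns a copy of the
// extensions the resulting identifier would carry.
func build(opts ...Option) []string {
	for _, o := range opts {
		o()
	}
	return append([]string(nil), identifier.extend...)
}

func main() {
	// The first build sets extensions, as the archivematica build does.
	build(SetName("archivematica"), SetExtend("archivematica-fmt2.xml"))

	// A second build without Clear inherits the stale extensions.
	fmt.Println("without Clear:", build(SetName("wikidata")))

	// Repeat the first build, then put Clear at the head of the second.
	build(SetName("archivematica"), SetExtend("archivematica-fmt2.xml"))
	fmt.Println("with Clear:", build(Clear(), SetName("wikidata")))
}
```

Because options run in slice order, Clear() has to come first; placed after the setters it would also wipe the options the current build just applied.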

richardlehane commented 1 year ago

Fixed in v1.10.0.