richardlehane / siegfried

signature-based file format identification
http://www.itforarchivists.com/siegfried
Apache License 2.0

Enable return of revision history via Wikibase (#3) #157

Closed ross-spencer closed 3 years ago

ross-spencer commented 3 years ago

This commit introduces a new version of Spargo, called Wikiprov, which focuses on returning revision history from Wikidata via the Wikidata API.

We also return a permalink for a Wikidata record which represents the status of the data at the time the signature was downloaded from the server.

The permalink for a record is returned with identification results. Inspect will return the history of a Wikidata/Wikibase record.

Wikibase instances should continue to be configurable, though callers will need to specify a Wikidata query service endpoint as well as a URL for Wikibase permalinks to resolve to.
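For reviewers, here is a minimal sketch of how a permalink of the shape shown in the examples in this description could be assembled from a configurable base URL, a record title, and a revision ID. The helper name `buildPermalink` and its parameters are hypothetical and are not taken from Wikiprov; only the resulting URL shape comes from the output in this PR.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildPermalink is a hypothetical helper (not Wikiprov's actual API).
// It appends the format, oldid and title query parameters seen in the
// example output to a caller-supplied base URL, so that a custom
// Wikibase instance can be targeted instead of Wikidata.
func buildPermalink(base, title string, revision int64) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	q := url.Values{}
	q.Set("format", "json")
	q.Set("oldid", strconv.FormatInt(revision, 10))
	q.Set("title", title)
	// Encode sorts parameters by key: format, oldid, title.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	link, err := buildPermalink("https://www.wikidata.org/w/index.php", "Q28205479", 533839428)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(link)
}
```

Pointing the base URL at a different Wikibase host is the only change a caller would need in this sketch; the query service endpoint would be configured separately.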

Integration tests have been added around this work to make sure that compatible signature files are parsed correctly and return the correct identifications, along with the revision history and permalink.

Testing has also been increased to inspect JSON output from the identifier.


Adds a permalink to Siegfried results, and a more comprehensive output from Roy's inspect:

Permalink example

filename : 'testdata/wikidata/wd/Q28205479.info'
filesize : 8
modified : 2020-11-15T09:36:16-05:00
errors   : 
matches  :
  - ns       : 'wikidata'
    id       : 'Q28205479'
    format   : 'Amiga Workbench icon'
    URI      : 'http://www.wikidata.org/entity/Q28205479'
    wikibase : 'https://www.wikidata.org/w/index.php?format=json&oldid=533839428&title=Q28205479'
    mime     : 
    basis    : 'extension match info; byte match at 0, 8'
    source   : 'Gary Kessler''s File Signature Table (source date: 2017-08-08)'
    warning  : 

Inspect example

Format info: Name: 'Envoy'
MIMEType: 'application/x-envoy'
Sources: 'Just Solve the File Format Problem (source date: 2020-08-04) Just Solve the File Format Problem (source date: 2020-08-04)' 
Wikibase History: {
  "Title": "Q5381415",
  "Revision": 1343296571,
  "Modified": "2021-01-18T05:36:32Z",
  "Permalink": "https://www.wikidata.org/w/index.php?format=json&oldid=1343296571&title=Q5381415",
  "History": [
    "2021-01-18T05:36:32Z (oldid: 1343296571): 'Lockal' edited: '/* wbcreateclaim-create:1| */ [[Property:P646]]: /m/0fc557'",
    "2020-08-04T23:41:27Z (oldid: 1247209137): 'Beet keeper' edited: '/* wbsetclaim-update:2||1 */ [[Property:P4152]]: B297E169'",
    "2020-08-04T23:40:10Z (oldid: 1247208427): 'Beet keeper' edited: '/* wbsetclaim-update:2||1 */ [[Property:P4152]]: 325E1010'",
    "2020-02-21T14:40:33Z (oldid: 1120067133): 'YULdigitalpreservation' edited: '/* wbsetaliases-add:3|en */ Envoy Document File, Envoy Document, Envoy 1'",
    "2020-02-21T14:38:57Z (oldid: 1120066909): 'YULdigitalpreservation' edited: '/* wbsetclaim-create:2||1 */ [[Property:P348]]: 1'"
  ]
}
---
QID: (Q5381415)
globs: *.evy
sigs: (B:0 seq b297e169)
      (B:0 seq "2^\x10\x10")
superiors: none
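
The "Wikibase History" block in the inspect output above is plain JSON, so a consumer could decode it along these lines. This is a sketch under assumptions: the type `wikibaseHistory`, the function `parseHistory`, and the `historyLine` pattern are hypothetical names for illustration and are not part of Siegfried or Wikiprov. The field names and the layout of each history entry are taken from the sample output, which is shortened to two entries here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"time"
)

// wikibaseHistory mirrors the keys in the sample output (hypothetical type).
type wikibaseHistory struct {
	Title     string
	Revision  int64
	Modified  time.Time
	Permalink string
	History   []string
}

// historyLine splits one history entry into timestamp, oldid, user and
// edit comment. It assumes user names contain no single quotes.
var historyLine = regexp.MustCompile(`^(\S+) \(oldid: (\d+)\): '([^']*)' edited: '(.*)'$`)

// parseHistory decodes the JSON object printed after "Wikibase History:".
func parseHistory(raw []byte) (wikibaseHistory, error) {
	var h wikibaseHistory
	err := json.Unmarshal(raw, &h)
	return h, err
}

// sample is copied from the inspect example, shortened to two entries.
const sample = `{
  "Title": "Q5381415",
  "Revision": 1343296571,
  "Modified": "2021-01-18T05:36:32Z",
  "Permalink": "https://www.wikidata.org/w/index.php?format=json&oldid=1343296571&title=Q5381415",
  "History": [
    "2021-01-18T05:36:32Z (oldid: 1343296571): 'Lockal' edited: '/* wbcreateclaim-create:1| */ [[Property:P646]]: /m/0fc557'",
    "2020-08-04T23:41:27Z (oldid: 1247209137): 'Beet keeper' edited: '/* wbsetclaim-update:2||1 */ [[Property:P4152]]: B297E169'"
  ]
}`

func main() {
	h, err := parseHistory([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h.Title, h.Revision, h.Modified.Format(time.RFC3339))
	for _, line := range h.History {
		if m := historyLine.FindStringSubmatch(line); m != nil {
			fmt.Printf("oldid %s by %s\n", m[2], m[3])
		}
	}
}
```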