openpreserve / nanite

Nanite - a friendly swarm of format-identifying robots.
openplanets.github.io/nanite/

Support direct Tika integration as a metadata enhancer #41

Open anjackson opened 2 years ago

anjackson commented 2 years ago

As per this tweet (https://twitter.com/_tallison/status/1501584655597850632?s=21), an alternative integration pattern is to register this properly as a Detector, but have it return null and add its results to the Metadata object instead. This puts the results where you can get them, but leaves Tika in charge of the identify-then-parse flow.
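A minimal sketch of that pattern, for illustration only. To stay compilable without Tika on the classpath it uses hypothetical stand-ins: `SketchDetector` loosely mirrors the shape of Tika's `Detector` interface, a plain `Map` stands in for the `Metadata` object, and the `sketch:pronom-puid` key and `placeholder/pdf` value are invented, not names Nanite is known to use. The `identify` driver treats a null result as "no opinion"; whether Tika's own composite detection tolerates a null return is an assumption that would need checking against the Tika version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a detector interface; a Map stands in for Metadata.
interface SketchDetector {
    String detect(InputStream input, Map<String, String> metadata) throws IOException;
}

// Records an identification result as metadata and declines to pick a type itself.
class MetadataOnlyDetector implements SketchDetector {
    // Hypothetical metadata key, chosen for this sketch only.
    static final String PUID_KEY = "sketch:pronom-puid";

    @Override
    public String detect(InputStream input, Map<String, String> metadata) throws IOException {
        if (input == null || !input.markSupported()) {
            return null; // cannot peek safely, so contribute nothing
        }
        // Peek at the first bytes, then rewind so later stages see the whole stream.
        input.mark(4);
        byte[] header = new byte[4];
        int read = input.readNBytes(header, 0, 4);
        input.reset();

        // Toy signature check standing in for a real PRONOM identification.
        if (read == 4 && "%PDF".equals(new String(header, StandardCharsets.US_ASCII))) {
            metadata.put(PUID_KEY, "placeholder/pdf"); // placeholder, not a real PUID
        }
        return null; // no opinion on the type: the host keeps control of the flow
    }
}

class MetadataEnhancerSketch {
    // Mimics a host framework: only non-null results can change the chosen type.
    static String identify(SketchDetector[] detectors, InputStream in,
                           Map<String, String> metadata) throws IOException {
        String type = "application/octet-stream";
        for (SketchDetector detector : detectors) {
            String detected = detector.detect(in, metadata);
            if (detected != null) {
                type = detected;
            }
        }
        return type;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> metadata = new LinkedHashMap<>();
        InputStream in = new ByteArrayInputStream(
                "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
        String type = identify(
                new SketchDetector[] { new MetadataOnlyDetector() }, in, metadata);
        System.out.println(type + " " + metadata);
    }
}
```

In this sketch the chosen type is left entirely to the host's other detectors, while the extra identification result is still available to anything that reads the metadata afterwards.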

anjackson commented 1 year ago

This has been partially implemented: PRONOM-related results are now added as Metadata, but it still returns the combined MIME type (as that's how I'm using it in webarchive-discovery for now).