weblicht / profiler

A Java library that profiles an arbitrary file, i.e. determines its mediatype, language, etc.

Create extension mechanism for recognizing new formats #1

Open emanueldima opened 4 years ago

emanueldima commented 4 years ago

We need a mechanism for describing custom format variants, mainly for the TEI subformats. The mediatype, root element, and schemas (RELAX NG, Schematron, XML Schema, or DTD) must be testable in a common Turing-complete language, for full expressive power. Adding a new profile should require only adding a new file containing a lambda function that takes the list of features and returns a new format (on a match) or an empty result (on no match).
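To make the proposal concrete, here is a minimal sketch of what such a profile could look like in plain Java. Everything in it is an assumption made for illustration and none of it is the library's existing API: the `Feature` and `FormatProfile` types, the feature names (`mediatype`, `rootElement`, `schema`), the example schema URL, and the `format-variant` parameter on the returned format are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ProfileSketch {
    /** Hypothetical feature: a name/value pair extracted from the profiled file. */
    static final class Feature {
        final String name;
        final String value;
        Feature(String name, String value) { this.name = name; this.value = value; }
    }

    /** Hypothetical extension point: one lambda per custom format variant. */
    @FunctionalInterface
    interface FormatProfile {
        /** Returns the recognized format on a match, or an empty result otherwise. */
        Optional<String> match(List<Feature> features);
    }

    /** Placeholder schema location, not a real TEI schema. */
    static final String EXAMPLE_SCHEMA = "http://example.org/schemas/tei-variant.rng";

    /** True if the feature list contains the given name/value pair. */
    static boolean has(List<Feature> features, String name, String value) {
        return features.stream()
                .anyMatch(f -> f.name.equals(name) && f.value.equals(value));
    }

    /** A profile for an imaginary TEI subformat; this is the "new file" content. */
    static final FormatProfile TEI_VARIANT = features -> {
        if (has(features, "mediatype", "application/tei+xml")
                && has(features, "rootElement", "TEI")
                && has(features, "schema", EXAMPLE_SCHEMA)) {
            return Optional.of("application/tei+xml;format-variant=example");
        }
        return Optional.empty();
    };

    /** Registered profiles; a real mechanism might discover these from files. */
    static final List<FormatProfile> PROFILES = Arrays.asList(TEI_VARIANT);

    /** Returns the format of the first profile that matches, if any. */
    static Optional<String> profile(List<Feature> features) {
        for (FormatProfile p : PROFILES) {
            Optional<String> result = p.match(features);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Feature> features = Arrays.asList(
                new Feature("mediatype", "application/tei+xml"),
                new Feature("rootElement", "TEI"),
                new Feature("schema", EXAMPLE_SCHEMA));
        System.out.println(profile(features).orElse("no custom variant matched"));
    }
}
```

Because each profile is an ordinary lambda, it can combine the tests with arbitrary logic (conjunctions, fallbacks, string matching on schema locations), which is the "full power" the issue asks for. Whether profiles should be compiled Java or loaded from a scripting language at runtime is left open here.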