metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Add HTML input support #313

Closed: fsteeg closed this 4 years ago

fsteeg commented 4 years ago

Parse HTML with jsoup, write XML. See example in test.
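To make the idea concrete, here is a minimal, language-neutral sketch of the "lenient HTML in, well-formed XML out" step. It is not the code from this pull request and does not use jsoup: it swaps in Python's standard-library `html.parser`, and the class name `HtmlToXml` and its `to_xml` method are hypothetical names chosen for illustration. It also simplifies HTML's implied-end-tag rules (an unclosed `<p>` is only closed when its parent closes or the input ends), which a real parser such as jsoup handles properly.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# HTML elements that never take a closing tag; emitted as self-closing XML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: turns tag soup into a well-formed XML string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # output fragments, joined at the end
        self.open_tags = []  # stack of elements that still need closing

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "checked") get an empty string value.
        attr_text = "".join(
            f" {name}={quoteattr(value or '')}" for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.parts.append(f"<{tag}{attr_text}/>")
        else:
            self.parts.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray closing tag: ignore it
        # Close any inner elements left open, then the requested one.
        while self.open_tags:
            current = self.open_tags.pop()
            self.parts.append(f"</{current}>")
            if current == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape for XML.
        self.parts.append(escape(data))

    def to_xml(self, html):
        self.feed(html)
        self.close()
        # Close whatever is still open at end of input.
        while self.open_tags:
            self.parts.append(f"</{self.open_tags.pop()}>")
        return "".join(self.parts)


if __name__ == "__main__":
    print(HtmlToXml().to_xml("<p>Hello<br>world &amp; more"))
```

The sketch handles one root element per input and does not deduplicate attributes or emit an XML declaration; those, along with the config options and output format, are exactly the open questions this pull request leaves to be determined.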

See https://github.com/metafacture/metafacture-core/issues/312

Opening this as a draft pull request for some initial discussion. I suggest we only merge once we have successfully used this in our full use case. In particular, the config options and output format are still to be determined.

@dr0i: I've set this up as a separate Gradle project, mostly because it adds the jsoup dependency, and because it fits with the overall structure that we have. What do you think?

fsteeg commented 4 years ago

With the (functionally reviewed) scenarios in https://github.com/hbz/oerindex/issues/2 and https://github.com/hbz/oerindex/issues/3, this could now resolve https://github.com/metafacture/metafacture-core/issues/312. It also contains https://github.com/metafacture/metafacture-core/pull/313/commits/47a5ba79ef72edd8dbc2e2fb58908b26359210ef, which resolves https://github.com/metafacture/metafacture-core/issues/314.