metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

Make DTD loading in XmlDecoder optional #236

Open thomasseidel opened 9 years ago

thomasseidel commented 9 years ago

Currently, the XmlDecoder loads referenced DTDs and fails on broken links. It would be useful if automatic DTD loading were a configurable XmlDecoder option.
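One way such an option could look at the parser level is sketched below. This is not Metafacture code: the class `DtdLoadingOption` and the helper `parses` are hypothetical names, and the sketch assumes the JDK's built-in SAX parser, which recognises the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd`. Other parsers may reject that feature name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DtdLoadingOption {

    // Xerces feature name; assumed to be supported by the parser in use.
    private static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical helper: returns true if the document parses without an exception.
    public static boolean parses(String xml, boolean loadExternalDtd) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // The "configurable option": switch external DTD loading on or off.
            factory.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The DTD link is deliberately broken; the path is made up for this sketch.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE article SYSTEM \"file:///nonexistent-dir/missing.dtd\">\n"
                + "<article/>";
        System.out.println("DTD loading on:  " + parses(xml, true));
        System.out.println("DTD loading off: " + parses(xml, false));
    }
}
```

With loading switched off, the parser is expected to skip the external DTD, so the broken link no longer matters. Entities declared only in that DTD would then be unknown to the parser.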

liowalter commented 8 years ago

I encountered this problem as well! Thanks.

guenterh commented 8 years ago

@liowalter Hi Lionel,

I played around a little and can now provide a solution that works, although I think it does not yet have the status it should have in the end.

You can find the code in [1].

I had to copy and paste the complete implementation of MF core [2] because the type is declared as final.

As you can see in [1], it's really simple. Entities aren't handled at all for now: the code returns an empty string, which has the same effect as what you do right now with sed, where you remove the DTD prolog.
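The idea of answering every DTD request with an empty string can be sketched with the plain SAX API as follows. This is not the code from [1]: `EmptyDtdResolverSketch` and `elementNames` are hypothetical names, and the broken DTD path in `main` is made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyDtdResolverSketch {

    // Answers every external entity request with empty content,
    // so the parser never opens the referenced DTD.
    static final EntityResolver EMPTY =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    // Hypothetical helper: parses the document and collects its element names.
    public static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(EMPTY);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE article SYSTEM \"file:///nonexistent-dir/missing.dtd\">\n"
                + "<article><title>x</title></article>";
        System.out.println(elementNames(xml));
    }
}
```

Because the DTD content is replaced by nothing, any entity that the real DTD would have declared stays undeclared, which matches the "entities aren't handled" caveat above.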

My Idea:

I will send you the jar, which you have to put into the plugins directory of your repository. Then you have to start the Flux script with the property -Dflux.pluginsdir=[absolute path to the plugins dir]. It might be that the MF-Runner repository already does this if it is deployed correctly (I'm not sure).

Use the new Flux command in your Flux script:

    generic-xml-handle-dtd ("article") | //"article" is the record delimiter
    //handle-generic-xml ("article") | //"article" is the record delimiter

Sorry - unfortunately I haven't worked with MF for some weeks, and after a break I always need some time to get back into it... But I want to work with it more steadily in the future!

Günter

[1] https://github.com/guenterh/nlmfCommands/blob/master/src/main/java/org/swissbib/mf/stream/converter/xml/GenericXmlDTDHandler.java#L107
[2] https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/stream/converter/xml/GenericXmlHandler.java#L39

zazi commented 8 years ago

"Currently, the XmlDecoder loads referenced DTDs and fails on broken links."

@thomasseidel could you show me the code where this handling is done? As far as I know, the resolveEntity method implementation in DefaultXmlPipe returns null, see https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/framework/DefaultXmlPipe.java#L114
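For context on the null return value: the SAX `EntityResolver` contract states that returning null asks the parser to open a regular URI connection to the system identifier itself, so null does not mean "skip the DTD". The sketch below illustrates that with plain SAX. `NullResolverSketch`, `RecordingHandler`, and `tryParse` are hypothetical names that do not come from Metafacture, and the DTD path is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NullResolverSketch {

    // Records which system identifiers the parser asks for, then returns null,
    // which hands the loading of the entity back to the parser.
    public static class RecordingHandler extends DefaultHandler {
        public final List<String> requested = new ArrayList<>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            requested.add(systemId);
            return null;
        }
    }

    // Hypothetical helper: returns "ok" or the simple name of the exception thrown.
    public static String tryParse(String xml, RecordingHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return "ok";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE article SYSTEM \"file:///nonexistent-dir/missing.dtd\">\n"
                + "<article/>";
        RecordingHandler handler = new RecordingHandler();
        System.out.println("result:    " + tryParse(xml, handler));
        System.out.println("requested: " + handler.requested);
    }
}
```

With the JDK's default parser settings, the resolver is consulted for the DTD and the parse then fails on the broken link, which would be consistent with the behaviour described at the top of this issue.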