Closed dr0i closed 4 years ago
Here is the example provided by @hagbeck to show the XML should be valid:
In the browser this shows the result of their XSLT transformation, see 'view source' for the XML. However, I think this only works because their XML processor is lenient: the XML actually declares itself as <?xml version="1.0", but arbitrary Unicode characters are only supported in XML 1.1, see https://www.w3.org/TR/xml11/#sec-xml11:
Finally, there is considerable demand to define a standard representation of arbitrary Unicode characters in XML documents. Therefore, XML 1.1 allows the use of character references to the control characters #x1 through #x1F, most of which are forbidden in XML 1.0.
(I tried to update Xalan in metafacture-biblio to 2.7.2 as suggested by @hagbeck, but that makes no difference – which I think makes sense since there is no XSLT involved in the sample Flux above).
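To make the XML 1.0 vs. 1.1 difference quoted above concrete, here is a minimal sketch that feeds the same character reference to the JDK's default SAX parser under both version declarations. The class name XmlVersionCheck and the sample element are made up for illustration, and the reference &#x1; is only a stand-in for a control character, not the reference from the failing record.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of metafacture or the OCLC harvester.
class XmlVersionCheck {

    /** Returns true if the default JAXP SAX parser accepts the document as well-formed. */
    public static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input ends up in the catch block.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative payload: a character reference to the control character U+0001.
        String body = "<record>&#x1;</record>";
        System.out.println("declared as 1.0: " + isWellFormed("<?xml version=\"1.0\"?>" + body));
        System.out.println("declared as 1.1: " + isWellFormed("<?xml version=\"1.1\"?>" + body));
    }
}
```

Note that this only covers the control-character case from the quoted paragraph; it does not by itself show that switching the declaration would fix the harvested record.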
So I see 3 possible approaches:
1. Change the <?xml version="1.0" declaration in the XML to <?xml version="1.1" (I suppose that's not so easy, @hagbeck? It might also have unexpected side effects on other users etc. Also, we should first verify that this actually solves the issue.)
2. […] xmlVersion=1.1), resulting in XML with the <?xml version="1.1" declaration being passed to the Java XML parser
I'll try to update the OCLC library to use 1.1 instead of 1.0.
I made the OCLC library output XML version 1.1. That version was recognized by the AbstractSAXParser, where the exception is thrown, but it didn't change a thing.
I stumbled upon https://stackoverflow.com/questions/15634536/java-sax-parser-mangles-attributes-for-xml-1-1 where it is recommended not to use the JDK XML parser but to switch to the Apache Xerces XML parser.
This should be tried.
I don't understand this:
I made the OCLC library output XML version 1.1. That version was recognized by the AbstractSAXParser, where the exception is thrown, but it didn't change a thing.
It throws the same exception although it's now getting XML 1.1?
Exactly.
Ah, nasty nasty, dependency hell ... Getting rid of xalan-serializer, which is a dependency of xalan but not needed by the OAIPmh, fixes it, see https://stackoverflow.com/questions/11952289/serializing-supplementary-unicode-characters-into-xml-documents-with-java . Will provide a proper config tomorrow.
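The linked Stack Overflow question is about supplementary characters being written as two character references, one per UTF-16 surrogate half. Assuming that is what happened here (the thread does not show the offending reference), the sketch below checks how the JDK parser treats such a pair compared to a single reference to the code point. The class name, element name and the sample character U+1D49C are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.stream.Collectors;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not code from the harvester or from Xalan.
class SurrogateReferenceCheck {

    /** Returns true if the default JAXP SAX parser accepts the document. */
    public static boolean parses(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes one character reference per code point, never per UTF-16 code unit. */
    public static String toCharRefs(String text) {
        return text.codePoints()
                .mapToObj(cp -> "&#x" + Integer.toHexString(cp).toUpperCase() + ";")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        // U+1D49C (a mathematical script letter) is the surrogate pair D835 DC9C in UTF-16.
        String perCodeUnit = "<r>&#55349;&#56476;</r>";
        String perCodePoint = "<r>" + toCharRefs("\uD835\uDC9C") + "</r>";
        System.out.println(perCodeUnit + " parses: " + parses(perCodeUnit));
        System.out.println(perCodePoint + " parses: " + parses(perCodePoint));
    }
}
```

Surrogate code units are not legal XML characters in either XML 1.0 or 1.1, which would be consistent with the observation above that switching the declaration to 1.1 made no difference.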
It's like this:
If xalan:xalan:2.7.2 is declared as an implementation dependency in build.gradle, the IDE lists xalan, xml-apis and serializer among the used libraries. The run fails.
If only the xalan library (shipped with oaiharvester) is used, xalan appears in the IDE, but neither xml-apis nor serializer. The run succeeds, i.e., all data can be retrieved, but only when the encoding is set to ISO-8859-1, which breaks UTF-8 characters (like mathematical symbols). I don't know how to cope with that properly.
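As a small illustration of why an ISO-8859-1 setting damages such characters, the sketch below decodes the UTF-8 bytes of a mathematical symbol with both charsets. It assumes the server sends UTF-8 and the client decodes with the configured encoding; the class name and the sample character are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not touch the harvester's actual encoding option.
class CharsetMismatchDemo {

    /** Decodes the UTF-8 bytes of the given text with some other charset. */
    public static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, charset);
    }

    public static void main(String[] args) {
        String sum = "\u2211"; // N-ARY SUMMATION, three bytes in UTF-8

        String asUtf8 = decodeUtf8BytesAs(sum, StandardCharsets.UTF_8);
        String asLatin1 = decodeUtf8BytesAs(sum, StandardCharsets.ISO_8859_1);

        // ISO-8859-1 maps every byte to exactly one character, so the symbol
        // comes back as three unrelated characters instead of one.
        System.out.println("UTF-8 round trip intact: " + sum.equals(asUtf8));
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```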
As @fsteeg said, your oaipmh-server serves xml version=1.0. This seems to be what the parsers of oclc-oaipmh use (and OaiPmhOpener wraps its own XML header around the output; one could set xml version=1.1 there, but this is not used by the parsers, only as the XML header for the output). I could try to manipulate the XML source to check this, but maybe it would even be possible for you, @hagbeck, to set the XML header to 1.1?
We will check this.
But yesterday I discovered, in another context, that the current version of xmllint in Ubuntu 20.04 doesn't support xml version=1.1. It seems that this solution isn't stable enough for all use cases, is it?
ACK. Also, I just ran the OAIPmh standalone. Surprise: it works perfectly with your OAI server! So it must be some library dependency, and this should somehow be solvable.
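When a run behaves differently standalone and inside a larger build, it can help to print which JAXP implementations actually get picked up from the classpath. The following is a generic diagnostic sketch using only standard JAXP factory lookups; the class name JaxpImplementations is made up, and what it prints depends entirely on the libraries present.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic helper; not part of metafacture.
class JaxpImplementations {

    /** Returns the implementing class name plus the location it was loaded from. */
    public static String describe(Object factory) {
        Class<?> type = factory.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "bootstrap class loader (JDK)"
                : source.getLocation().toString();
        return type.getName() + " from " + location;
    }

    public static void main(String[] args) {
        // Each newInstance() call goes through the JAXP lookup, so a library
        // on the classpath (e.g. Xalan or Xerces) can replace the JDK default.
        System.out.println("SAX parser:  " + describe(SAXParserFactory.newInstance()));
        System.out.println("DOM builder: " + describe(DocumentBuilderFactory.newInstance()));
        System.out.println("Transformer: " + describe(TransformerFactory.newInstance()));
    }
}
```

Running this once standalone and once with the full dependency set on the classpath shows whether a different parser or transformer is selected in the failing setup.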
@hagbeck try branch 334-fixEncoding, this should work. It basically uses some older libraries and excludes some others explicitly.
I've tried it using Flux (open-oaipmh(metadataPrefix="mods", dateUntil="2020-10-23")) and am getting the following error. Changing the date or metadataPrefix results in the same error.
Exception in thread "main" org.metafacture.commons.reflection.ReflectionException: class could not be instantiated: class org.metafacture.metamorph.Metamorph
at org.metafacture.commons.reflection.ConfigurableClass.newInstance(ConfigurableClass.java:105)
at org.metafacture.commons.reflection.ObjectFactory.newInstance(ObjectFactory.java:67)
at org.metafacture.flux.parser.FluxProgramm.createElement(FluxProgramm.java:70)
at org.metafacture.flux.parser.FluxProgramm.addElement(FluxProgramm.java:81)
at org.metafacture.flux.parser.FlowBuilder.pipe(FlowBuilder.java:736)
at org.metafacture.flux.parser.FlowBuilder.flowtail(FlowBuilder.java:514)
at org.metafacture.flux.parser.FlowBuilder.flow(FlowBuilder.java:226)
at org.metafacture.flux.parser.FlowBuilder.flux(FlowBuilder.java:122)
at org.metafacture.flux.FluxCompiler.compileFlow(FluxCompiler.java:54)
at org.metafacture.flux.FluxCompiler.compile(FluxCompiler.java:42)
at org.metafacture.runner.Flux.main(Flux.java:79)
Caused by: java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at org.metafacture.commons.reflection.ConfigurableClass.newInstance(ConfigurableClass.java:101)
... 10 more
Caused by: org.metafacture.metamorph.MetamorphException: Error while building the Metamorph transformation pipeline: javax.xml.transform.TransformerException: org.xml.sax.SAXException: org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at org.metafacture.metamorph.Metamorph.buildPipeline(Metamorph.java:191)
at org.metafacture.metamorph.Metamorph.<init>(Metamorph.java:179)
at org.metafacture.metamorph.Metamorph.<init>(Metamorph.java:126)
at org.metafacture.metamorph.Metamorph.<init>(Metamorph.java:116)
at org.metafacture.metamorph.Metamorph.<init>(Metamorph.java:112)
... 15 more
Caused by: org.metafacture.framework.MetafactureException: javax.xml.transform.TransformerException: org.xml.sax.SAXException: org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at org.metafacture.metamorph.xml.DomLoader.process(DomLoader.java:136)
at org.metafacture.metamorph.xml.DomLoader.parse(DomLoader.java:70)
at org.metafacture.metamorph.AbstractMetamorphDomWalker.walk(AbstractMetamorphDomWalker.java:108)
at org.metafacture.metamorph.AbstractMetamorphDomWalker.walk(AbstractMetamorphDomWalker.java:104)
at org.metafacture.metamorph.Metamorph.buildPipeline(Metamorph.java:187)
... 19 more
Caused by: javax.xml.transform.TransformerException: org.xml.sax.SAXException: org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:449)
at org.metafacture.metamorph.xml.DomLoader.process(DomLoader.java:134)
... 23 more
Caused by: org.xml.sax.SAXException: org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at org.apache.xml.utils.DOMBuilder.startElement(DOMBuilder.java:322)
at org.apache.xalan.transformer.TransformerIdentityImpl.startElement(TransformerIdentityImpl.java:1020)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.startElement(XMLFilterImpl.java:551)
at org.metafacture.metamorph.xml.LocationAnnotator.startElement(LocationAnnotator.java:80)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:510)
at java.xml/com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.startElement(XMLSchemaValidator.java:832)
at java.xml/com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.startElement(XIncludeHandler.java:1001)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:374)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDriver.scanRootElementHook(XMLNSDocumentScannerImpl.java:613)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3063)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:836)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
at org.metafacture.metamorph.xml.LexicalHandlerXmlFilter.parse(LexicalHandlerXmlFilter.java:51)
at java.xml/org.xml.sax.helpers.XMLFilterImpl.parse(XMLFilterImpl.java:357)
at org.metafacture.metamorph.xml.LexicalHandlerXmlFilter.parse(LexicalHandlerXmlFilter.java:51)
at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:432)
... 24 more
Caused by: org.w3c.dom.DOMException: NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces.
at java.xml/com.sun.org.apache.xerces.internal.dom.AttrNSImpl.setName(AttrNSImpl.java:109)
at java.xml/com.sun.org.apache.xerces.internal.dom.AttrNSImpl.<init>(AttrNSImpl.java:78)
at java.xml/com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.createAttributeNS(CoreDocumentImpl.java:2140)
at java.xml/com.sun.org.apache.xerces.internal.dom.ElementImpl.setAttributeNS(ElementImpl.java:652)
at org.apache.xml.utils.DOMBuilder.startElement(DOMBuilder.java:307)
... 52 more
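The innermost frames of the trace show the JDK DOM rejecting a setAttributeNS call made by Xalan's DOMBuilder. The thread does not say which attribute triggered it, so the sketch below only shows one generic way this DOM check can fire: a prefixed attribute name passed without a namespace URI. The class, element and attribute names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo; it does not reproduce the Metamorph pipeline itself.
class NamespaceErrDemo {

    /** Returns the DOMException code raised by setAttributeNS, or 0 if the call succeeds. */
    public static short setAttributeCode(String namespaceUri, String qualifiedName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element element = doc.createElementNS(null, "record");
            element.setAttributeNS(namespaceUri, qualifiedName, "value");
            return 0;
        } catch (DOMException e) {
            return e.code;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xsi = "http://www.w3.org/2001/XMLSchema-instance";
        // A prefix with a matching namespace URI is accepted.
        System.out.println("with URI:    " + setAttributeCode(xsi, "xsi:schemaLocation"));
        // The same prefixed name without a namespace URI violates the DOM namespace rules.
        System.out.println("without URI: " + setAttributeCode(null, "xsi:schemaLocation")
                + " (NAMESPACE_ERR is " + DOMException.NAMESPACE_ERR + ")");
    }
}
```

A mismatch like this typically points at the component that builds the DOM (here Xalan's DOMBuilder) receiving namespace information in a form it does not expect from the parser, which fits a library version conflict.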
@hagbeck I updated the xalan-library. Can you try again please?
:+1: It works fine now!
Resolved by https://github.com/metafacture/metafacture-core/pull/335. Closing.
Reported by @hagbeck :
results in:
SAXParseException; lineNumber: 194; columnNumber: 90; Character reference "�" is an invalid XML character