metafacture / metafacture-core

Core package of the Metafacture tool suite for metadata processing.
https://metafacture.org
Apache License 2.0

FEATURE_SECURE_PROCESSING threshold too low in XmlDecoder #554

Open dr0i opened 1 month ago

dr0i commented 1 month ago

Got:

Exception in thread "main" org.metafacture.framework.MetafactureException: org.xml.sax.SAXParseException; lineNumber: 8496675; columnNumber: 3876; JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
    at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:79)
    at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:44)
    at org.metafacture.io.FileOpener.process(FileOpener.java:158)
    at org.metafacture.io.FileOpener.process(FileOpener.java:41)
    at org.metafacture.flux.parser.StringSender.process(StringSender.java:43)
    at org.metafacture.flux.parser.Flow.start(Flow.java:118)
    at org.metafacture.flux.parser.FluxProgramm.start(FluxProgramm.java:168)
    at org.metafacture.runner.Flux.main(Flux.java:87)
Caused by: org.xml.sax.SAXParseException; lineNumber: 8496675; columnNumber: 3876; JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1243)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at org.metafacture.xml.XmlDecoder.process(XmlDecoder.java:73)

when analyzing Alma Basedump in a single process.

blackwinter commented 1 month ago

How is this a bug in Metafacture? You can adjust the limit if necessary:

https://github.com/hbz/limetrans/blob/c9bde02b7e42680f923232ab24f3134344e0df1a/src/main/java/hbz/limetrans/Limetrans.java#L284-L287
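The linked Limetrans lines are not reproduced in this thread. As a rough sketch of one way to adjust the limit for the whole JVM, assuming the JDK's built-in JAXP parser is in use, the `jdk.xml.totalEntitySizeLimit` system property can be set before the parser is created. The class and method names here (`GlobalEntityLimit`, `disableLimit`, `countElements`) are made up for illustration and are not part of Metafacture or Limetrans:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class GlobalEntityLimit {

    // JDK system property; a value less than or equal to 0 means "no limit".
    static final String TOTAL_ENTITY_SIZE_LIMIT = "jdk.xml.totalEntitySizeLimit";

    // Relaxes the limit JVM-wide and returns the value now stored in the property.
    static String disableLimit() {
        System.setProperty(TOTAL_ENTITY_SIZE_LIMIT, "0");
        return System.getProperty(TOTAL_ENTITY_SIZE_LIMIT);
    }

    // Parses the XML with a default JAXP SAX parser and counts start tags.
    static int countElements(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // Set the property first; parsers created afterwards pick it up.
        disableLimit();
        System.out.println(countElements("<records><record/><record/></records>"));
    }
}
```

Because this is a JVM-wide switch, it relaxes the limit for every XML parser in the process, not only for the one reading the dump.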

dr0i commented 1 month ago

Thx @blackwinter - I had hoped there was a setting :) So, using the runner, do we change its build.gradle to: ~applicationDefaultJvmArgs = ["-agentlib:hprof=heap=sites,cpu=samples,depth=${depth},cutoff=${cutoff},file=${file}.hprof.txt -Dinvokejdk.xml.totalEntitySizeLimit=0"]~? (EDIT dr0i: this does not work. Try JAVA_TOOL_OPTIONS=-Djdk.xml.totalEntitySizeLimit=0 ./gradlew :metafix-runner:run --args="$pathToFlux" instead.)

Could we think about a) making this the default, or b) introducing a parameter for XmlDecoder?

blackwinter commented 1 month ago

Changing the JVM args in the Metafix runner is effectively your option a), right? That's a :-1: from me. Since this is a security-related setting, it should be the user's decision to relax any limits.

We might want to introduce a setter for a more targeted approach - your option b) - instead of requiring users to set the limit globally. But that's just a convenience feature, isn't it?
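To illustrate what a more targeted approach could look like at the JAXP level, here is a minimal sketch that sets the limit on a single `SAXParser` instance instead of the whole JVM. It assumes the JDK's built-in parser, which accepts the `http://www.oracle.com/xml/jaxp/properties/totalEntitySizeLimit` property; the class and method names (`PerParserEntityLimit`, `newParser`, `countElements`) are hypothetical and do not reflect XmlDecoder's actual API:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only; not XmlDecoder's API.
public class PerParserEntityLimit {

    // JAXP property name understood by the JDK's built-in parser.
    static final String TOTAL_ENTITY_SIZE_LIMIT =
        "http://www.oracle.com/xml/jaxp/properties/totalEntitySizeLimit";

    // Creates a parser whose limit applies to this instance only.
    // A limit of "0" (or less) disables the check.
    static SAXParser newParser(String limit) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        parser.setProperty(TOTAL_ENTITY_SIZE_LIMIT, limit);
        return parser;
    }

    // Parses the XML with the given limit and counts start tags.
    static int countElements(String xml, String limit) throws Exception {
        final int[] count = {0};
        newParser(limit).parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<records><record/><record/></records>", "0"));
    }
}
```

A setter on the decoder could pass a user-supplied value through in a similar way, which keeps the decision to relax the limit with the user and leaves other parsers in the same JVM untouched.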

dr0i commented 1 month ago

> But that's just a convenience feature, isn't it?

It's definitely convenient, but not only that: it also makes the restrictions of XmlDecoder more visible to users, so that they can lift this restriction before they run a big ETL job like this one and hit the error.

blackwinter commented 1 month ago

Okay, no objection to making this limit more discoverable.