zach-m / jonix

Commercial-grade library for extracting data from ONIX sources
Apache License 2.0

CDATA section content not parsed #32

Closed rriedeltsi closed 1 year ago

rriedeltsi commented 1 year ago

Content from CDATA sections is not available from the respective Jonix element.

For instance, a SenderName encoded as follows

<x298><![CDATA[Irgendein Verlag]]></x298>

produces an empty string when calling ...header().senderName

The same behaviour occurs for Text

<d104 textformat="06" language="ger"> <![CDATA[<section class="pim-producttext-additional_1"><h2>...]]> </d104>

and for RecordReference.

zach-m commented 1 year ago

It works fine in my tests. Here is where we treat CDATA nodes: https://github.com/zach-m/jonix/blob/master/jonix-common/src/main/java/com/tectonica/jonix/common/JPU.java#L112
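For readers who want to check CDATA handling outside Jonix, here is a minimal sketch using only the JDK's built-in JAXP DOM API. It is not the code from JPU.java; the class name `CdataTextSketch` and the helpers `parseRoot` and `directText` are hypothetical and exist only for this illustration. The idea is that, with a default (non-coalescing) `DocumentBuilderFactory`, CDATA content arrives as `CDATA_SECTION_NODE` children, so code that collects text must accept both node types.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of Jonix.
class CdataTextSketch {

    /** Parses an XML string with the default JAXP DOM parser and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Concatenates the element's direct text children, treating CDATA sections like plain text. */
    static String directText(Element element) {
        StringBuilder sb = new StringBuilder();
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same SenderName fragment as in the report above.
        Element sender = parseRoot("<x298><![CDATA[Irgendein Verlag]]></x298>");
        System.out.println("[" + directText(sender) + "]");
    }
}
```

If this prints empty brackets on a given runtime, the XML parser in that environment is the likely culprit rather than Jonix.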

This could be an XML Java-library issue specific to your setup. Please provide as much information as possible and I'll try to reproduce.
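One piece of information that may help here is which XML parser implementations the runtime actually loads, since application servers can supply their own. The sketch below is only a suggestion: the class name `XmlRuntimeReport` and its `describe` method are hypothetical, and it relies solely on standard JAXP factory lookups and system properties.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;

// Hypothetical diagnostic class for illustration; not part of Jonix.
class XmlRuntimeReport {

    /** Describes a factory: its implementation class and, if known, where it was loaded from. */
    static String implementationOf(Object factory) {
        Class<?> type = factory.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // JDK-internal classes usually have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(JDK built-in or unknown)"
                : source.getLocation().toString();
        return type.getName() + " from " + location;
    }

    /** Builds a short report on the Java version and the active JAXP implementations. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("java.version=").append(System.getProperty("java.version")).append('\n');
        sb.append("java.vendor=").append(System.getProperty("java.vendor")).append('\n');
        sb.append("DocumentBuilderFactory=")
          .append(implementationOf(DocumentBuilderFactory.newInstance())).append('\n');
        sb.append("SAXParserFactory=")
          .append(implementationOf(SAXParserFactory.newInstance())).append('\n');
        sb.append("XMLInputFactory=")
          .append(implementationOf(XMLInputFactory.newInstance())).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Calling `describe()` from inside the deployed application, rather than from a standalone `main`, should reflect the class loader that the application really uses.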

rriedeltsi commented 1 year ago

Thanks @zach-m for your quick reply and the hint! My setup is currently a little outdated: the code runs on Wildfly 23 and OpenJDK 11, and I'm using 2023-05-onix308. I'll investigate that further.

rriedeltsi commented 1 year ago

Hi Zach, I can confirm the issue is related to my setup. With the same code, running on OpenJDK17 and Quarkus 3.0.2 as runtime, everything works fine.

zach-m commented 1 year ago

Thanks for the feedback. In fact, all the problems I've seen with XML parsing occurred on JDK version 11. Something must have been broken at JDK11 and then fixed in JDK12.