pdvrieze / xmlutil

XML Serialization library for Kotlin
https://pdvrieze.github.io/xmlutil/
Apache License 2.0
377 stars 30 forks source link

Namespaces on Attributes go missing? #167

Closed StefanOltmann closed 1 year ago

StefanOltmann commented 1 year ago

I try to figure out why the code I have at hand stops working if I switch from org.xml.sax to XmlUtils.

This is the code that is used to create the Document using SAX:

import com.ashampoo.xmp.XMPError
import com.ashampoo.xmp.XMPException
import nl.adaptivity.xmlutil.core.impl.multiplatform.StringReader
import nl.adaptivity.xmlutil.dom.Document
import org.xml.sax.InputSource
import javax.xml.parsers.DocumentBuilderFactory

object SaxDomParser : DomParser {

    /**
     * the DOM Parser Factory, options are set
     */
    private val factory = DocumentBuilderFactory.newInstance().apply {
        isNamespaceAware = true
    }

    override fun parseDocumentFromString(input: String): Document {

        try {

            val inputSource = InputSource(StringReader(input))

            val builder = factory.newDocumentBuilder()

            builder.setErrorHandler(null)

            return builder.parse(inputSource)

        } catch (ex: Exception) {
            throw XMPException("Error reading the XML-file", XMPError.BADSTREAM, ex)
        }
    }
}

This is the code that is used to create the Document using XmlUtils:

import com.ashampoo.xmp.XMPError
import com.ashampoo.xmp.XMPException
import nl.adaptivity.xmlutil.DomWriter
import nl.adaptivity.xmlutil.EventType
import nl.adaptivity.xmlutil.XmlStreaming
import nl.adaptivity.xmlutil.dom.Document
import nl.adaptivity.xmlutil.writeCurrent

object XmlUtilsDomParser : DomParser {

    override fun parseDocumentFromString(input: String): Document {

        try {

            val writer = DomWriter()

            val reader = XmlStreaming.newReader(input)

            do {
                val event = reader.next()
                reader.writeCurrent(writer)
            } while (event != EventType.END_DOCUMENT)

            return writer.target

        } catch (ex: Exception) {
            throw XMPException("Error reading the XML-file", XMPError.BADSTREAM, ex)
        }
    }
}

Everything else is identical by now.

So I figured that the differences must be in the produced Document.

I found this helpful code to write DOM Document back to a String representation:

import java.io.File
import java.io.StringWriter
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

fun getDocumentAsString(document: Document): String {
    val transformer = TransformerFactory.newInstance().newTransformer()
    val source = DOMSource(document)
    val writer = StringWriter()
    val result = StreamResult(writer)
    transformer.transform(source, result)
    return writer.toString()
}

This is the original: original_xmp.txt This is produced by the SAX parser: sample_1_sax.txt This is produced by XmlUtils: sample_1_xmlutil.txt

Between the original and the SAX version double quotes changed to single quotes and a additional header was added. Everything else is identical.

But in the XmlUtils version namespaces to attributes are lost.

In this sample rdf:about changed to about, rdf:parseType to parseType and xml:lang changed to lang.

sceeen

Is this a bug or intentional? Can this be configured?

pdvrieze commented 1 year ago

This was a nasty one. When writing attributes from events this would drop the prefix (but not the namespace). Combine that with a dom implementation that doesn't add the applicable prefix it breaks stuff. I've fixed it.

StefanOltmann commented 1 year ago

@pdvrieze Thank you for promptly investigating this issue. I truly appreciate your efforts. :)

I'm eager to try the fix, but it seems that the publish task was unsuccessful.

pdvrieze commented 1 year ago

@StefanOltmann It should now be fixed. Gets me for not running the full test suite first. The fixes exposed a but in the serialization code for attributes.

StefanOltmann commented 1 year ago

Thank you! :) It's fixed and works for JVM. I will now check the other platforms.

StefanOltmann commented 1 year ago

@pdvrieze It looks like that there is a very similar problem for the native/macOS implementation.

My code complains that about="" has no namespace.

pdvrieze commented 1 year ago

I've looked at things further. While that error is not in the system I found various edge cases with attribute handling. It will now correctly force a prefix for a qualified attribute (attributes without prefix always map to the null namespace). This pushed a lot of edge cases in the code (most people don't seem to use this stuff, or don't file bugs).

StefanOltmann commented 1 year ago

Thank you for investigating this. Personally I ignored things like that before, too. In my case it’s Adobes XMP Core library that seems to be quite strict about the input and requires namespaces and prefixes for everything.

Do you know when time is due for a new stable release?

StefanOltmann commented 1 year ago

I checked the current 0.86.1-SNAPSOT and on my side it looks all good now. πŸ‘

If you could publish a new stable version to Maven Central I would like to depend on that with my new lib. :)

pdvrieze commented 1 year ago

The release should be on its way

StefanOltmann commented 1 year ago

@pdvrieze Thank you so much! I conducted extensive tests on multiple XMP files in my test set across various platforms, and I'm happy to report that I didn't encounter any issues.

As a result, I have integrated your library into the recently released https://github.com/Ashampoo/xmpcore, which is now an integral part of my company's photo organization app, Ashampoo Photos.

I am truly grateful for your prompt fixes, as they enabled me to deliver this feature on schedule. πŸ‘