pdvrieze / xmlutil

XML Serialization library for Kotlin
https://pdvrieze.github.io/xmlutil/
Apache License 2.0

Unable to parse attributes if root has namespace and attributes don't #236

Open: Bluexin opened this issue 2 months ago

Bluexin commented 2 months ago

I am facing an issue, and I'm not sure whether it is caused by a bug, a missing option, a misconfiguration, or some other user error.

I need to deserialize XML files whose root tag declares a namespace, something like this:

<test:HelloWorld xmlns:test="https://test.local">
    <user>You!</user>
</test:HelloWorld>

Note that this namespace can be declared as either http (legacy) or https.

I am reading it with code inspired by the README:

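import kotlinx.serialization.SerialName
import kotlinx.serialization.Serializable
import nl.adaptivity.xmlutil.serialization.XML
import nl.adaptivity.xmlutil.serialization.XmlElement
import nl.adaptivity.xmlutil.serialization.XmlNamespaceDeclSpec
import nl.adaptivity.xmlutil.xmlStreaming

fun main() {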
    val result = HelloWorld::class.java.classLoader.getResourceAsStream("test.xml")?.use {
        XML {
            defaultPolicy {
                pedantic = false
                isStrictAttributeNames = false
            }
        }.decodeFromReader(HelloWorld.serializer(), xmlStreaming.newReader(it.reader()))
    } ?: error("Couldn't read file")
    println(result)
}

@Serializable
@SerialName("test:HelloWorld")
@XmlNamespaceDeclSpec("test=https://test.local")
data class HelloWorld(
    @XmlElement
    val user: String
)

The problem is that this snippet fails with the following error:

Exception in thread "main" nl.adaptivity.xmlutil.serialization.UnknownXmlFieldException: Could not find a field for name (test:HelloWorld) {https://test.local}HelloWorld/user (Element)
  candidates: {https://test.local}user (Element) at position 2:11
    at nl.adaptivity.xmlutil.serialization.XmlConfig.DEFAULT_UNKNOWN_CHILD_HANDLER$lambda$6(XmlConfig.kt:469)
    at nl.adaptivity.xmlutil.serialization.DefaultXmlSerializationPolicy.handleUnknownContentRecovering(XmlSerializationPolicy.kt:633)
    at nl.adaptivity.xmlutil.serialization.XmlDecoderBase$TagDecoderBase.indexOf(XMLDecoder.kt:1007)
    at nl.adaptivity.xmlutil.serialization.XmlDecoderBase$TagDecoderBase.decodeElementIndex(XMLDecoder.kt:1197)
    at HelloWorld$$serializer.deserialize(Main.kt:20)
    at HelloWorld$$serializer.deserialize(Main.kt:20)
    at nl.adaptivity.xmlutil.serialization.XmlDecoderBase.deserializeSafe(XMLDecoder.kt:80)
    at nl.adaptivity.xmlutil.serialization.XmlDecoderBase.deserializeSafe$default(XMLDecoder.kt:74)
    at nl.adaptivity.xmlutil.serialization.XmlDecoderBase$XmlDecoder.decodeSerializableValue(XMLDecoder.kt:267)
    at nl.adaptivity.xmlutil.serialization.XML.decodeFromReader(XML.kt:443)
    at nl.adaptivity.xmlutil.serialization.XML.decodeFromReader$default(XML.kt:410)
    at MainKt.main(Main.kt:15)
    at MainKt.main(Main.kt)

As you can see in the example above, I have tried disabling both the pedantic and the strict attribute names settings. I have also tried KtXmlReader(..., relaxed = true), and both KtXmlReader and StAXReader, with the same results.

Removing the following annotations leads to the same error when parsing. They are there so that serialization produces the shared XML format:

@SerialName("test:HelloWorld")
@XmlNamespaceDeclSpec("test=https://test.local")

I also tried these annotations instead:

@XmlSerialName(
    value = "HelloWorld",
    namespace = "https://test.local",
    prefix = "test",
)

However, this causes the serialized XML to have namespaced member tags (which my source XML does not have). It also behaves oddly when value is not specified:

<test:https://test.local xmlns:test="HelloWorld">
    <test:user>World</test:user>
</test:https://test.local>

The snippet works as expected with 0.86.2 (regardless of reader or settings) but stopped working in 0.86.3. I have tried 0.90.1 and 0.90.2-beta1, with the same error.

If I am doing something wrong, or should be using some kind of filter or something else, please advise. I have added the build script, main class and test XML in a gist for convenience.

pdvrieze commented 2 months ago

A number of points:

Bluexin commented 2 months ago

Thanks for the fast response!

> You are expected to set the value of an XmlSerialName annotation. The behaviour in its absence is a bit surprising, but may be caused by some other annotations (there is no check that the name is actually a valid cname).

Makes sense; however, right now it has a default value, which suggests that it might not be required.

> There is support for custom root tag names (and namespaces), and by default member tags inherit their namespace from the owner tag.

I could not quite figure out how to do this. In particular, I did not find how to put the root tag in a namespace but not the member tags (without specifying @XmlSerialName on every single member field). Is there a configuration option I'm missing?

> Namespaces just happen to use URLs but are not actually resolved. As such, http is perfectly valid, and https is an entirely different namespace.

I know they aren't; however, I would like to host the schema at that location, so I prefer it to be serialized with https, while still being able to load either variant.

> @XmlNamespaceDeclSpec is only relevant when writing/serializing, and it just ensures that the specific namespace is declared on the root tag.

Indeed, this bit seems to work fine -- however it doesn't seem to work.

> If, when writing, you don't want the namespace to be visible, use prefix="", which puts it in the default prefix. However, if you want to put user in the default (empty) namespace, just use @XmlSerialName("user","","") @XmlElement val user:String
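The per-member workaround suggested here can be sketched as follows. This is a minimal, untested sketch assuming the xmlutil serialization artifact and the kotlinx.serialization compiler plugin are on the classpath; it only uses the annotations already shown in this thread plus `XML.decodeFromString`/`XML.encodeToString`. The root class carries the namespace and the `test` prefix, while `user` is pinned to the empty namespace.

```kotlin
import kotlinx.serialization.Serializable
import nl.adaptivity.xmlutil.serialization.XML
import nl.adaptivity.xmlutil.serialization.XmlElement
import nl.adaptivity.xmlutil.serialization.XmlSerialName

// Root tag {https://test.local}HelloWorld, written with the "test" prefix.
@Serializable
@XmlSerialName("HelloWorld", "https://test.local", "test")
data class HelloWorld(
    // Pinned to the empty namespace, so it does not inherit the root's namespace.
    @XmlSerialName("user", "", "")
    @XmlElement
    val user: String
)

fun main() {
    val input = """<test:HelloWorld xmlns:test="https://test.local"><user>You!</user></test:HelloWorld>"""
    val decoded = XML.decodeFromString(HelloWorld.serializer(), input)
    println(decoded)
    // Serializing again should keep `user` unprefixed, in no namespace.
    println(XML.encodeToString(HelloWorld.serializer(), decoded))
}
```

This keeps the default policy untouched, at the cost of one annotation per member.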

The problem here might be that I want the namespace on the root tag (i.e. test:HelloWorld) but not on the member tags (so user, not test:user). Specifying @XmlSerialName with an empty namespace on every single member is not ideal in my case, because there are hundreds of different members in the hierarchy that I would need to annotate. If I can avoid this via configuration or something similar, that would be great.

> If the custom root tag thing doesn't work (note that this may require you not to specify the serial name on class HelloWorld, or to specify it explicitly), you can use a filtering XML reader to rewrite the namespace.

How would I do this? By implementing XmlDelegatingReader and overriding the methods to get tags?


I still find it unfortunate that this behaviour changed in such a breaking way between 0.86.2 and 0.86.3. I am not sure how unusual my use case (namespaced root, no namespace on the member tags) is, but it seems fairly common to me. If it is unusual, I can update the new version of the schema accordingly and load the legacy files via this filtering option.

pdvrieze commented 2 months ago

> Thanks for the fast response!

>> You are expected to set the value of an XmlSerialName annotation. The behaviour in its absence is a bit surprising, but may be caused by some other annotations (there is no check that the name is actually a valid cname).

> Makes sense; however, right now it has a default value, which suggests that it might not be required.

The "default" behaviour is to use the SerialName for the context (annotated or derived), so it sort of works. However, for types it does not strip the package name the way it does when the annotation is omitted entirely. I consider that a bug and will fix it. It should work, but doesn't quite (also, before Kotlin 2.0, default annotation parameters didn't work reliably).

>> There is support for custom root tag names (and namespaces), and by default member tags inherit their namespace from the owner tag.

> I could not quite figure out how to do this. In particular, I did not find how to put the root tag in a namespace but not the member tags (without specifying @XmlSerialName on every single member field). Is there a configuration option I'm missing?

There is a strong assumption that members live in the same namespace as the containing tag (this is also how XML schemas work). As such, if you want to use different namespaces for those members, you have to specify that namespace explicitly (either on the declaration or at the use site). Note, however, that this is just how the default policy works; you can use or override the policy to implement your own behaviour.

>> Namespaces just happen to use URLs but are not actually resolved. As such, http is perfectly valid, and https is an entirely different namespace.

> I know they aren't; however, I would like to host the schema at that location, so I prefer it to be serialized with https, while still being able to load either variant.

I understand that. I may add a default "normalizer" at some point (for now, you can do this by hand).
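Doing it by hand could start from an explicit lookup table, sketched below in plain Kotlin. `legacyNamespaces` and `normalizeNamespace` are hypothetical names, not xmlutil API. The table is deliberately explicit rather than a blanket http-to-https rewrite, since namespace URIs are opaque identifiers; the function would then be applied wherever namespace URIs are read, for example in a filtering reader.

```kotlin
// Hypothetical table: legacy namespace URI -> the URI it should be read as.
val legacyNamespaces: Map<String, String> = mapOf(
    "http://test.local" to "https://test.local",
)

// Returns the canonical namespace URI; unknown URIs pass through unchanged.
fun normalizeNamespace(uri: String): String = legacyNamespaces[uri] ?: uri

fun main() {
    println(normalizeNamespace("http://test.local"))  // https://test.local
    println(normalizeNamespace("https://test.local")) // https://test.local
    println(normalizeNamespace("urn:other"))          // urn:other
}
```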

>> @XmlNamespaceDeclSpec is only relevant when writing/serializing, and it just ensures that the specific namespace is declared on the root tag.

> Indeed, this bit seems to work fine -- however it doesn't seem to work.

It is important to note that it doesn't put the tag in the given namespace. It is only intended to provide prefix/namespace mapping.

>> If, when writing, you don't want the namespace to be visible, use prefix="", which puts it in the default prefix. However, if you want to put user in the default (empty) namespace, just use @XmlSerialName("user","","") @XmlElement val user:String

> The problem here might be that I want the namespace on the root tag (i.e. test:HelloWorld) but not on the member tags (so user, not test:user). Specifying @XmlSerialName with an empty namespace on every single member is not ideal in my case, because there are hundreds of different members in the hierarchy that I would need to annotate. If I can avoid this via configuration or something similar, that would be great.

If there are hundreds, then it is probably best to override the naming policy (make a subtype of DefaultXmlSerializationPolicy).

>> If the custom root tag thing doesn't work (note that this may require you not to specify the serial name on class HelloWorld, or to specify it explicitly), you can use a filtering XML reader to rewrite the namespace.

> How would I do this? By implementing XmlDelegatingReader and overriding the methods to get tags?

Yes, that is how you do it.
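A sketch of such a filtering reader follows. It assumes that `XmlDelegatingReader` can be subclassed and exposes its `delegate` to subclasses, and that overriding `namespaceURI` and `name` is enough for the decoder; both are assumptions, and the sketch is untested. `NamespaceRewritingReader`, `decodeHelloWorld` and the mapping lambda are hypothetical names. Only element names are rewritten: unqualified or legacy `http://` elements are reported in `https://test.local`, which is what the default policy expects for members of a namespaced root.

```kotlin
import kotlinx.serialization.Serializable
import nl.adaptivity.xmlutil.EventType
import nl.adaptivity.xmlutil.QName
import nl.adaptivity.xmlutil.XmlDelegatingReader
import nl.adaptivity.xmlutil.XmlReader
import nl.adaptivity.xmlutil.serialization.XML
import nl.adaptivity.xmlutil.serialization.XmlElement
import nl.adaptivity.xmlutil.serialization.XmlSerialName
import nl.adaptivity.xmlutil.xmlStreaming

// Hypothetical filtering reader: element names are reported with their namespace
// URI passed through mapNamespace; attributes and other events are untouched.
class NamespaceRewritingReader(
    delegate: XmlReader,
    private val mapNamespace: (String) -> String,
) : XmlDelegatingReader(delegate) {

    private val isElementEvent: Boolean
        get() = eventType == EventType.START_ELEMENT || eventType == EventType.END_ELEMENT

    override val namespaceURI: String
        get() = if (isElementEvent) mapNamespace(delegate.namespaceURI) else delegate.namespaceURI

    // Overridden too, in case the delegating base forwards `name` directly.
    override val name: QName
        get() = if (isElementEvent) QName(namespaceURI, localName, prefix) else delegate.name
}

// Unannotated members inherit the root namespace under the default policy.
@Serializable
@XmlSerialName("HelloWorld", "https://test.local", "test")
data class HelloWorld(@XmlElement val user: String)

// Accepts either namespace variant, with or without namespaced member tags.
fun decodeHelloWorld(xmlText: String): HelloWorld {
    val reader = NamespaceRewritingReader(xmlStreaming.newReader(xmlText.reader())) { ns ->
        if (ns == "" || ns == "http://test.local") "https://test.local" else ns
    }
    return XML {}.decodeFromReader(HelloWorld.serializer(), reader)
}

fun main() {
    println(decodeHelloWorld("""<test:HelloWorld xmlns:test="http://test.local"><user>You!</user></test:HelloWorld>"""))
}
```

If decoding still reports the original namespace, other members of the reader interface may need overriding as well for the xmlutil version in use.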

> I still find it unfortunate that this behaviour changed in such a breaking way between 0.86.2 and 0.86.3. I am not sure how unusual my use case (namespaced root, no namespace on the member tags) is, but it seems fairly common to me. If it is unusual, I can update the new version of the schema accordingly and load the legacy files via this filtering option.

Your usage, omitting namespaces on member tags, is not typical and is not in line with the way schemas work. (Attributes are different: unprefixed attributes are in the default "empty" namespace, even if the default prefix is mapped to a different namespace.)

I'm not quite sure what behaviour change you refer to, but it was probably in response to actually incorrect naming/parsing.

In your case, the best approach is probably to override the policy that determines the tag names used (so that member tags do not inherit the namespace of the containing tag).
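To make the two naming rules concrete, here is a plain-Kotlin model of them; it is not xmlutil's API, and `TagName` and `memberElementName` are hypothetical. In an actual `DefaultXmlSerializationPolicy` subtype, the equivalent decision would live in the name-resolution override (in recent versions this appears to be `effectiveName`, but its parameters differ between releases, so check the version in use).

```kotlin
// Simplified model of an element name: namespace URI plus local name.
data class TagName(val namespace: String, val localName: String)

// Effective name of a member element. An explicit namespace (as set via
// @XmlSerialName) always wins; otherwise inheritNamespace = true models the
// default policy and inheritNamespace = false models the requested override.
fun memberElementName(
    parent: TagName,
    memberName: String,
    annotatedNamespace: String? = null,
    inheritNamespace: Boolean = true,
): TagName {
    val namespace = annotatedNamespace ?: if (inheritNamespace) parent.namespace else ""
    return TagName(namespace, memberName)
}

fun main() {
    val root = TagName("https://test.local", "HelloWorld")
    println(memberElementName(root, "user"))                           // TagName(namespace=https://test.local, localName=user)
    println(memberElementName(root, "user", inheritNamespace = false)) // TagName(namespace=, localName=user)
}
```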

Bluexin commented 2 months ago

Thanks, I will try that out when I have more time (I will stay on 0.86.2 for now).