pdvrieze / xmlutil

XML Serialization library for Kotlin
https://pdvrieze.github.io/xmlutil/
Apache License 2.0

How to simply parse/serialize a Map to/from XML that has keys as tag names wrapping text values. #192

Open gladapps opened 6 months ago

gladapps commented 6 months ago

I'm trying to simply parse a format like this:

<metadata>
  <fieldA>this is the text value for fieldA</fieldA>
  <fieldB>this is the text value for fieldB</fieldB>
  <fieldC>
    <fieldD>
      <fieldE>this is the text value for fieldE</fieldE>
    </fieldD>
  </fieldC>
</metadata>

into a Map<String, String> with the entry keys being the tag names:

mapOf(
  "fieldA" to "this is the text value for fieldA",
  "fieldB" to "this is the text value for fieldB",
  "fieldE" to "this is the text value for fieldE"
)

I've made my own policy that overrides handleUnknownContentRecovering and puts the keys and values into a single Map, like this: dataMap[input.name.toCName()] = input.elementContentToFragment().contentString. It then returns the Map for elementIndex = 0. But I'm not sure what to do about the nested tags.

I also need to serialize such a Map.

I've tried using the existing MapEncoder, but I don't need keys and values to have their own tags. Maybe it can work if the entry name could use the key name and omit the key element, with the value collapsed. But I couldn't figure out how to get it to do that.

Any help would be greatly appreciated.

pdvrieze commented 6 months ago

The challenge is that this is not quite valid XML. Tag names are intended to be well-defined. I would consider custom parsing the best solution (if you do this in a custom serializer, you can still parse the values using serialization). However, there are other options:

gladapps commented 6 months ago

Thank you! I've actually already started implementing the dynamic tag names approach, serializing a Map instead of a List. Serializing the Map without nesting worked straightaway, but I have not successfully parsed out values with MapEntrySerializer and DynamicTagReader. However, I'm coming to the realization that this approach is probably more work than it is worth, given that it is a minor part of the overall data model (the XML is the metadata of one object type in a vast sea of JSON). Since our structure is not quite valid XML, I'm thinking maybe I should just revert to doing dumb string building and parsing, and use expect/actuals for the parts that don't have pure Kotlin solutions (StringEscapeUtils.escapeXml11 in Apache Commons Text, for example).
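For illustration, the string-building fallback described above could look roughly like the following. This is a minimal sketch in pure Kotlin, not code from this thread: the names escapeXmlText and mapToXml are hypothetical, the escaping covers only &, < and > in text content, and the keys are assumed to already be valid XML element names.

```kotlin
// Hypothetical helper: escapes the characters that are unsafe in XML text content.
fun escapeXmlText(s: String): String = buildString {
    for (c in s) {
        when (c) {
            '&' -> append("&amp;")
            '<' -> append("&lt;")
            '>' -> append("&gt;")
            else -> append(c)
        }
    }
}

// Hypothetical helper: writes each map entry as <key>value</key> inside a root element.
// Assumption: every key is already a valid XML element name (no validation is done here).
fun mapToXml(root: String, data: Map<String, String>): String = buildString {
    append("<").append(root).append(">\n")
    for ((key, value) in data) {
        append("  <").append(key).append(">")
        append(escapeXmlText(value))
        append("</").append(key).append(">\n")
    }
    append("</").append(root).append(">")
}

fun main() {
    println(mapToXml("metadata", mapOf("fieldA" to "a & b", "fieldB" to "plain text")))
}
```

This only produces the flat form; the nested fieldC/fieldD wrappers from the original example would need extra information that a flat Map does not carry.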

pdvrieze commented 6 months ago

@gladapps You don't need to go to raw parsing with regexes or something similar. You can use the (separate) XML parsing support from the core library. You just create your parser and then read events. If an event is a tag, handle it: read the value (perhaps recursively), then add it to your list. It is serialization that doesn't like this format (it makes too many assumptions), not the XML parser. However, parsing a list of Nodes "should" work (but it doesn't, due to a bug).

susrisha commented 5 months ago

@pdvrieze Is there an example of creating your own parser that reads events and handles tags? I would love to have a look at it.

pdvrieze commented 5 months ago

@susrisha The way to go is to use the object XmlStreaming (I will be transitioning to an accessor function xmlStreaming due to changes in multiplatform expect/actual). This object allows you to create instances of parsers/serializers in a platform-independent way (the "generic" variants create platform-independent implementations, so you get consistent behaviour; those can also be created directly as KtXmlReader and KtXmlWriter). Then you can use next to get the next event and nextTag to get the next tag event (note that the latter will verify that there is only ignorable content in between). You can retrieve the current event as eventType, and depending on the event type you can retrieve name, attributes, text content, etc. Have a look at the documentation: https://pdvrieze.github.io/xmlutil/core/nl.adaptivity.xmlutil/-xml-reader/index.html

Note that you don't need the serialization part of the library for this, only core.
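As a rough illustration of the event loop described above, the sketch below flattens every leaf element into a Map keyed by tag name. It is not code from the library or from this thread: parseLeafElements is a hypothetical name, and the sketch assumes that XmlStreaming.newReader accepts a CharSequence and that the reader exposes hasNext, next, localName and text as in the linked XmlReader documentation (newer releases may require the xmlStreaming accessor instead).

```kotlin
import nl.adaptivity.xmlutil.EventType
import nl.adaptivity.xmlutil.XmlStreaming

// Hypothetical helper: collects the text of every element that has no child elements.
// Assumption: tag names are unique; a repeated name overwrites the earlier entry.
fun parseLeafElements(xml: String): Map<String, String> {
    val result = LinkedHashMap<String, String>()
    val reader = XmlStreaming.newReader(xml)
    val text = StringBuilder()
    // True while no child element has been seen since the most recent start tag.
    var isLeaf = false

    while (reader.hasNext()) {
        when (reader.next()) {
            EventType.START_ELEMENT -> {
                // A new element begins: discard any whitespace collected from the parent.
                text.clear()
                isLeaf = true
            }
            EventType.TEXT, EventType.CDSECT -> text.append(reader.text)
            EventType.END_ELEMENT -> {
                // Only an element closed directly after its own start tag is a leaf.
                if (isLeaf) result[reader.localName] = text.toString()
                isLeaf = false
            }
            else -> Unit // comments, processing instructions, whitespace events, etc.
        }
    }
    return result
}
```

Applied to the metadata example from the first comment, this should produce entries for fieldA, fieldB and fieldE, while the wrapper elements metadata, fieldC and fieldD are skipped because they contain child elements. An empty root element would be treated as a leaf, so real code may want to handle that case explicitly.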