redundent / kotlin-xml-builder

A lightweight type safe builder to build xml documents in Kotlin
Apache License 2.0

Unnecessary string encoding, need option to disable encoding. #45

Closed bajabob closed 2 years ago

bajabob commented 2 years ago

Hello,

I stumbled across this wonderful project while building out a sitemap. As you may know, a sitemap is typically constructed in XML and is of the (basic) form:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
             xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
    <url>
      <loc>http://example.com/sample.html</loc>
      <image:image>
        <image:loc>http://example.com/image.jpg</image:loc>
      </image:image>
    </url>
</urlset>

I have found no issues using this framework to generate this file, except one. When placing my image URLs, I am seeing string encoding that is not needed.

Original string (example, not actually working):

https://storage.googleapis.com/download/storage/v1/b/my-server-dev.appspot.com/o/rec%2FwPxDElgqMEHC5TfiNzSr%2FrbbVvqxzCA8NQyp_SM_SQ.jpg?generation=1626614089842042&alt=media

Output with Kotlin XML Builder:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>https://mysite.com/rec/wPxDElgqMEHC5TfiNzSr/user/WpfoEr5JrZvqVOKAYD6c</loc>
        <lastmod>2021-07-18</lastmod>
        <image:image>
            <image:loc>https://storage.googleapis.com/download/storage/v1/b/my-server-dev.appspot.com/o/rec%2FwPxDElgqMEHC5TfiNzSr%2FrbbVvqxzCA8NQyp_SM_SQ.jpg?generation=1626614089842042&#38;alt=media</image:loc>
        </image:image>
    </url>
    <url>
        <loc>https://mysite.com/rec/MZarzTZDokrwjnYhyaOF/user/WpfoEr5JrZvqVOKAYD6c</loc>
        <lastmod>2021-07-16</lastmod>
    </url>
</urlset>

Using this print option:

sitemap.toString(
    PrintOptions(
        singleLineTextElements = true,
        useCharacterReference = false
    )
)

Adds &#38; to the URL.

And using this print option:

sitemap.toString(
    PrintOptions(
        singleLineTextElements = true,
        useCharacterReference = true
    )
)

Adds &amp; to the URL.

Neither of these is valid for my use, since both keep the URL from working properly. I have also tried workarounds, such as URL encoding/decoding the string before passing it to the XML builder. Ultimately, the URL needs to be represented in its original form, unchanged. If this option already exists in the framework, I have missed it. Please let me know your thoughts.

Regards,

Bob

redundent commented 2 years ago

Hi Bob, I believe you can achieve output similar to your PR changes by wrapping your image URL in the existing cdata function. Depending on how you are adding the <image:loc> element, you should be able to do something like this:

"image:loc" {
  cdata("https://unencodedurl.org?var1=test&var2=othertest")
}

That will wrap the text in a CDATA section, which stops the encoding.
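
For anyone comparing the options in this thread, here is a minimal sketch of how an XML parser reads the three forms discussed above (&amp;, &#38;, and CDATA). It uses the DOM parser that ships with the JDK rather than kotlin-xml-builder, the helper name parsedText is made up for illustration, and the URL is the placeholder from the snippet above. It assumes the consumer of the file runs it through a standard XML parser.

```kotlin
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// Hypothetical helper: parse a one-element XML document and return
// the element's text as a consumer of the file would see it.
fun parsedText(xml: String): String {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val document = builder.parse(ByteArrayInputStream(xml.toByteArray(Charsets.UTF_8)))
    return document.documentElement.textContent
}

fun main() {
    // Placeholder URL from the reply above, not a real endpoint.
    val url = "https://unencodedurl.org?var1=test&var2=othertest"

    // The three serialized forms discussed in this thread.
    val entity = "<loc>" + url.replace("&", "&amp;") + "</loc>"
    val charRef = "<loc>" + url.replace("&", "&#38;") + "</loc>"
    val cdata = "<loc><![CDATA[$url]]></loc>"

    // Each form should decode to the original, unescaped URL.
    for (xml in listOf(entity, charRef, cdata)) {
        check(parsedText(xml) == url) { "unexpected text for $xml" }
    }
    println("all three forms parse back to the original URL")
}
```

If the checks pass, the three forms differ only in how the file looks on disk. A consumer that parses the XML receives the same URL either way, so CDATA mainly matters when the raw file text must contain a literal &.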

Let me know if that works for you.

-Jason

bajabob commented 2 years ago

That will work, @redundent! Thanks. Closing this.