smithy-lang / smithy-kotlin

Smithy code generator for Kotlin (in development)
Apache License 2.0

refactor XML deserialization #1042

Closed aajtodd closed 4 months ago

aajtodd commented 4 months ago

Issue

aws-sdk-kotlin#1220

Description of changes

The context for this issue is in aws-sdk-kotlin#1220. Essentially, we made a bad assumption that flat collections would always be serialized sequentially, with all members of a list adjacent to one another. In reality, services are returning flat collections interspersed with other XML elements.
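To make the failure mode concrete, here is a minimal, self-contained sketch. It is a simplified model, not the actual runtime: `Element`, `collectAssumingContiguous`, and `collectInterspersed` are hypothetical names invented for illustration. The first function mirrors the old generated code's behavior of assigning the builder field from each contiguous run, so a later run replaces an earlier one; the second accumulates matches wherever they appear.

```kotlin
// Hypothetical, simplified stand-in for a stream of child elements: (tag name, text).
data class Element(val name: String, val text: String)

// Assumes all members of a flat list are adjacent. Each contiguous run is
// assigned to the result, so a later run replaces an earlier one.
fun collectAssumingContiguous(elements: List<Element>, tag: String): List<String> {
    var result = emptyList<String>()
    var i = 0
    while (i < elements.size) {
        if (elements[i].name == tag) {
            val run = mutableListOf<String>()
            while (i < elements.size && elements[i].name == tag) {
                run.add(elements[i].text)
                i++
            }
            result = run // overwrites anything collected from a previous run
        } else {
            i++
        }
    }
    return result
}

// Appends every matching element regardless of where it appears.
fun collectInterspersed(elements: List<Element>, tag: String): List<String> =
    elements.filter { it.name == tag }.map { it.text }

// Flat "Version" entries interspersed with other elements.
val interspersed = listOf(
    Element("Version", "v1"),
    Element("DeleteMarker", "d1"),
    Element("Version", "v2"),
    Element("IsTruncated", "false"),
)
```

With the `interspersed` input, the contiguous-run variant keeps only the last run of `Version` entries, while the accumulating variant keeps both.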

Our original approach to deserialization closely followed kotlinx.serialization: we have common Serializer and Deserializer interfaces, each format we want to support (XML, JSON, etc.) implements them, and codegen is the same across all types. The issues are that (1) we end up duplicating information already in the model (field traits) and (2) we have to bend over backwards to make the format work within the interface instead of just creating a runtime type that more closely matches the medium.

We discussed our options for addressing this as a team and decided to refactor the way we do XML deserialization to more closely match that of Go and Rust. This was something we had discussed prior to GA and just didn't have time to do. Rather than implement a one-off workaround tailored specifically to this issue, we're going to move toward the desired end state, which is to generate serializers/deserializers specific to each format (starting with just XML deserialization).

This is a large PR, so I'm going to try to summarize the important bits for easier review, particularly because a lot of this PR is net new test code.

Codegen Output

For the S3 ListObjectVersions output type:

Previously:

private fun deserializeListObjectVersionsOperationBody(builder: ListObjectVersionsResponse.Builder, payload: ByteArray) {
    val deserializer = XmlDeserializer(payload)
    val COMMONPREFIXES_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("CommonPrefixes"), Flattened)
    val DELETEMARKERS_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("DeleteMarker"), Flattened)
    val DELIMITER_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("Delimiter"))
    val ENCODINGTYPE_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Enum, XmlSerialName("EncodingType"))
    val ISTRUNCATED_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Boolean, XmlSerialName("IsTruncated"))
    val KEYMARKER_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("KeyMarker"))
    val MAXKEYS_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Integer, XmlSerialName("MaxKeys"))
    val NAME_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("Name"))
    val NEXTKEYMARKER_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("NextKeyMarker"))
    val NEXTVERSIONIDMARKER_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("NextVersionIdMarker"))
    val PREFIX_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("Prefix"))
    val VERSIONIDMARKER_DESCRIPTOR = SdkFieldDescriptor(SerialKind.String, XmlSerialName("VersionIdMarker"))
    val VERSIONS_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("Version"), Flattened)
    val OBJ_DESCRIPTOR = SdkObjectDescriptor.build {
        trait(XmlSerialName("ListVersionsResult"))
        trait(XmlNamespace("http://s3.amazonaws.com/doc/2006-03-01/"))
        field(COMMONPREFIXES_DESCRIPTOR)
        field(DELETEMARKERS_DESCRIPTOR)
        field(DELIMITER_DESCRIPTOR)
        field(ENCODINGTYPE_DESCRIPTOR)
        field(ISTRUNCATED_DESCRIPTOR)
        field(KEYMARKER_DESCRIPTOR)
        field(MAXKEYS_DESCRIPTOR)
        field(NAME_DESCRIPTOR)
        field(NEXTKEYMARKER_DESCRIPTOR)
        field(NEXTVERSIONIDMARKER_DESCRIPTOR)
        field(PREFIX_DESCRIPTOR)
        field(VERSIONIDMARKER_DESCRIPTOR)
        field(VERSIONS_DESCRIPTOR)
    }

    deserializer.deserializeStruct(OBJ_DESCRIPTOR) {
        loop@while (true) {
            when (findNextFieldIndex()) {
                COMMONPREFIXES_DESCRIPTOR.index -> builder.commonPrefixes =
                    deserializer.deserializeList(COMMONPREFIXES_DESCRIPTOR) {
                        val col0 = mutableListOf<CommonPrefix>()
                        while (hasNextElement()) {
                            val el0 = if (nextHasValue()) { deserializeCommonPrefixDocument(deserializer) } else { deserializeNull(); continue }
                            col0.add(el0)
                        }
                        col0
                    }
                DELETEMARKERS_DESCRIPTOR.index -> builder.deleteMarkers =
                    deserializer.deserializeList(DELETEMARKERS_DESCRIPTOR) {
                        val col0 = mutableListOf<DeleteMarkerEntry>()
                        while (hasNextElement()) {
                            val el0 = if (nextHasValue()) { deserializeDeleteMarkerEntryDocument(deserializer) } else { deserializeNull(); continue }
                            col0.add(el0)
                        }
                        col0
                    }
                DELIMITER_DESCRIPTOR.index -> builder.delimiter = deserializeString()
                ENCODINGTYPE_DESCRIPTOR.index -> builder.encodingType = deserializeString().let { EncodingType.fromValue(it) }
                ISTRUNCATED_DESCRIPTOR.index -> builder.isTruncated = deserializeBoolean()
                KEYMARKER_DESCRIPTOR.index -> builder.keyMarker = deserializeString()
                MAXKEYS_DESCRIPTOR.index -> builder.maxKeys = deserializeInt()
                NAME_DESCRIPTOR.index -> builder.name = deserializeString()
                NEXTKEYMARKER_DESCRIPTOR.index -> builder.nextKeyMarker = deserializeString()
                NEXTVERSIONIDMARKER_DESCRIPTOR.index -> builder.nextVersionIdMarker = deserializeString()
                PREFIX_DESCRIPTOR.index -> builder.prefix = deserializeString()
                VERSIONIDMARKER_DESCRIPTOR.index -> builder.versionIdMarker = deserializeString()
                VERSIONS_DESCRIPTOR.index -> builder.versions =
                    deserializer.deserializeList(VERSIONS_DESCRIPTOR) {
                        val col0 = mutableListOf<ObjectVersion>()
                        while (hasNextElement()) {
                            val el0 = if (nextHasValue()) { deserializeObjectVersionDocument(deserializer) } else { deserializeNull(); continue }
                            col0.add(el0)
                        }
                        col0
                    }
                null -> break@loop
                else -> skipValue()
            }
        }
    }
}

After:

private fun deserializeListObjectVersionsOperationBody(builder: ListObjectVersionsResponse.Builder, payload: ByteArray) {
    val root = xmlTagReader(payload)

    loop@while(true) {
        val curr = root.nextTag() ?: break@loop
        when(curr.tag.name) {
            // CommonPrefixes smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$CommonPrefixes
            "CommonPrefixes" -> builder.commonPrefixes = run {
                val el = deserializeCommonPrefixDocument(curr)
                createOrAppend(builder.commonPrefixes, el)
            }
            // DeleteMarkers smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$DeleteMarkers
            "DeleteMarker" -> builder.deleteMarkers = run {
                val el = deserializeDeleteMarkerEntryDocument(curr)
                createOrAppend(builder.deleteMarkers, el)
            }
            // Delimiter smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$Delimiter
            "Delimiter" -> builder.delimiter = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#Delimiter`)" }
            // EncodingType smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$EncodingType
            "EncodingType" -> builder.encodingType = curr.tryData()
                .parse { EncodingType.fromValue(it) }
                .getOrDeserializeErr { "expected (enum: `com.amazonaws.s3#EncodingType`)" }
            // IsTruncated smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$IsTruncated
            "IsTruncated" -> builder.isTruncated = curr.tryData()
                .parseBoolean()
                .getOrDeserializeErr { "expected (boolean: `com.amazonaws.s3#IsTruncated`)" }
            // KeyMarker smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$KeyMarker
            "KeyMarker" -> builder.keyMarker = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#KeyMarker`)" }
            // MaxKeys smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$MaxKeys
            "MaxKeys" -> builder.maxKeys = curr.tryData()
                .parseInt()
                .getOrDeserializeErr { "expected (integer: `com.amazonaws.s3#MaxKeys`)" }
            // Name smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$Name
            "Name" -> builder.name = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#BucketName`)" }
            // NextKeyMarker smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$NextKeyMarker
            "NextKeyMarker" -> builder.nextKeyMarker = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#NextKeyMarker`)" }
            // NextVersionIdMarker smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$NextVersionIdMarker
            "NextVersionIdMarker" -> builder.nextVersionIdMarker = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#NextVersionIdMarker`)" }
            // Prefix smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$Prefix
            "Prefix" -> builder.prefix = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#Prefix`)" }
            // VersionIdMarker smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$VersionIdMarker
            "VersionIdMarker" -> builder.versionIdMarker = curr.tryData()
                .getOrDeserializeErr { "expected (string: `com.amazonaws.s3#VersionIdMarker`)" }
            // Versions smithy.kotlin.synthetic.s3#ListObjectVersionsResponse$Versions
            "Version" -> builder.versions = run {
                val el = deserializeObjectVersionDocument(curr)
                createOrAppend(builder.versions, el)
            }
            else -> {}
        }
        curr.drop()
    }
}
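For readers unfamiliar with the helpers used in the new output, the following is a rough sketch of the two ideas involved: appending to a possibly-null list, and a `Result`-based parse chain that fails with a descriptive message. Everything here is an assumption made for illustration. The names carry a `Sketch` suffix to make clear they are not the runtime's real `createOrAppend`, `parseInt`, or `getOrDeserializeErr`, whose signatures and behavior may differ, and the sketch assumes the tag's text arrives as a `Result<String>`.

```kotlin
// Hypothetical exception type used only by this sketch.
class DeserializationExceptionSketch(message: String, cause: Throwable?) :
    RuntimeException(message, cause)

// Start a new list on the first element, otherwise return a list with the
// element appended. This lets flat list members arrive in any order.
fun <T> createOrAppendSketch(dest: List<T>?, element: T): List<T> =
    if (dest == null) listOf(element) else dest + element

// Parse step: turns a successful string into an Int, capturing parse failures.
fun Result<String>.parseIntSketch(): Result<Int> = mapCatching { it.toInt() }

// Terminal step: unwrap the value or throw with a message naming what was expected.
fun <T> Result<T>.getOrDeserializeErrSketch(errorMessage: () -> String): T =
    getOrElse { throw DeserializationExceptionSketch(errorMessage(), it) }

// Mimics the shape of the generated loop over (tag name, text) pairs.
fun collectVersionsSketch(tags: List<Pair<String, String>>): List<String>? {
    var versions: List<String>? = null
    for ((name, text) in tags) {
        when (name) {
            "Version" -> versions = createOrAppendSketch(versions, text)
            else -> {} // unrelated or unknown tags are ignored
        }
    }
    return versions
}
```

Because each matching tag appends to whatever the builder already holds, the order in which the service emits list members no longer matters.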

Effect on Artifact Sizes

The 1.0.64 S3 release was 5,072,329 bytes. Local builds are coming in at 5,039,276 bytes (~0.6% smaller).

Benchmarks

I've updated the benchmarks and included them inline here for easy review. The tl;dr is that the generated deserializers add less overhead on top of raw token lexing than before and, as a result, are faster.

jvm summary:
Benchmark                                                         (sourceFilename)  Mode  Cnt   Score   Error  Units
a.s.k.b.s.xml.XmlDeserializerBenchmark.deserializeBenchmark                    N/A  avgt    5  33.566 ± 0.074  ms/op
a.s.k.b.s.xml.XmlLexerBenchmark.deserializeBenchmark          countries-states.xml  avgt    5  25.200 ± 0.079  ms/op
a.s.k.b.s.xml.XmlLexerBenchmark.deserializeBenchmark            kotlin-article.xml  avgt    5   0.846 ± 0.003  ms/op
a.s.k.b.s.xml.XmlSerializerBenchmark.serializeBenchmark                        N/A  avgt    5  21.714 ± 0.385  ms/op

The lexer internals didn't change, so those results are nearly the same as the prior baseline. The deserialize benchmark came in at 33.566 ms/op compared to the prior 90.697 ms/op (about 63% less time per operation, or roughly 2.7x faster).
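The size and timing comparisons in this description can be rechecked with a couple of one-line helpers. This is only a convenience sketch; `percentReduction` and `speedupFactor` are hypothetical names, and the inputs are the figures quoted above.

```kotlin
// Relative reduction from `before` to `after`, as a percentage of `before`.
fun percentReduction(before: Double, after: Double): Double =
    (before - after) / before * 100.0

// How many times faster `after` is than `before` (ratio of elapsed times).
fun speedupFactor(before: Double, after: Double): Double = before / after

// Figures quoted in this description.
val artifactReduction = percentReduction(5_072_329.0, 5_039_276.0) // S3 artifact bytes
val timeReduction = percentReduction(90.697, 33.566)               // deserialize ms/op
val deserializeSpeedup = speedupFactor(90.697, 33.566)
```

Note that "percent less time" and "times faster" are different measures of the same change, so it helps to state which one a quoted number refers to.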

Binary Compatibility

This change intentionally breaks binary compatibility on a few @InternalApi APIs:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sonarcloud[bot] commented 4 months ago

Quality Gate failed

Failed conditions
3.6% Duplication on New Code (required ≤ 3%)
