smithy-lang / smithy-kotlin

Smithy code generator for Kotlin (in development)
Apache License 2.0

refactor: decrease generated artifact size #1057

Closed aajtodd closed 6 months ago

aajtodd commented 6 months ago

Issue #

see https://github.com/awslabs/aws-sdk-kotlin/issues/411

Description of changes

This PR attempts to decrease the generated artifact size of service clients by doing the following:

- Inlining higher-order functions
- Removing most suspend points from the generated HttpSerde

The changes and results for each of these are detailed in the sections below.

Inline higher order functions

You might consider this a bug, since it was introduced by a refactor. Either way, the generated serializers and deserializers contain a lot of code that looks something like this:

internal class PutBucketLifecycleConfigurationOperationSerializer: HttpSerialize<PutBucketLifecycleConfigurationRequest> {
    override suspend fun serialize(context: ExecutionContext, input: PutBucketLifecycleConfigurationRequest): HttpRequestBuilder {
        val builder = HttpRequestBuilder()
        builder.method = HttpMethod.PUT

        builder.url {
            path.trailingSlash = true
            parameters.decodedParameters {
                add("lifecycle", "")
            }
        }

        builder.headers {
            if (input.checksumAlgorithm != null) append("x-amz-sdk-checksum-algorithm", input.checksumAlgorithm.value)
            if (input.expectedBucketOwner?.isNotEmpty() == true) append("x-amz-expected-bucket-owner", input.expectedBucketOwner)
        }

        if (input.lifecycleConfiguration != null) {
            val payload = serializeBucketLifecycleConfigurationPayloadWithXmlNameLifecycleConfiguration(input.lifecycleConfiguration)
            builder.body = HttpBody.fromBytes(payload)
        }
        if (builder.body !is HttpBody.Empty) {
            builder.headers.setMissing("Content-Type", "application/xml")
        }
        return builder
    }
}

All of the invocations like builder.url {...}, builder.headers {...}, parameters.decodedParameters {...}, etc. take a lambda argument. Because these functions are not inline, each lambda results in a backing class that holds the state captured from the outer context (e.g. input), and there are a lot of them.
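To illustrate the difference, here is a minimal sketch. The types `HeadersBuilder` and `RequestBuilder` and the functions `headers`/`headersInline` are hypothetical stand-ins, not the actual runtime API, and whether a lambda becomes a separate class also depends on the Kotlin compiler version and settings.

```kotlin
// Hypothetical, simplified stand-ins for the runtime builder types.
class HeadersBuilder {
    val entries = mutableListOf<Pair<String, String>>()
    fun append(name: String, value: String) { entries += name to value }
}

class RequestBuilder {
    val headerBuilder = HeadersBuilder()

    // Not inline: the lambda is passed as a function object, so the compiler
    // must materialize it (and whatever it captures) at each call site.
    fun headers(block: HeadersBuilder.() -> Unit) { headerBuilder.block() }

    // Inline: the lambda body is copied into the caller, so no function
    // object is needed to carry the captured state.
    inline fun headersInline(block: HeadersBuilder.() -> Unit) { headerBuilder.block() }
}

// Captures `checksum` in a lambda handed to a non-inline function.
fun serializeWithLambdaObject(checksum: String?): RequestBuilder {
    val builder = RequestBuilder()
    builder.headers { if (checksum != null) append("x-amz-sdk-checksum-algorithm", checksum) }
    return builder
}

// Same logic, but the lambda is inlined into this function's body.
fun serializeInlined(checksum: String?): RequestBuilder {
    val builder = RequestBuilder()
    builder.headersInline { if (checksum != null) append("x-amz-sdk-checksum-algorithm", checksum) }
    return builder
}

fun main() {
    // Both variants build the same headers; only the compiled output differs.
    println(serializeWithLambdaObject("CRC32").headerBuilder.entries)
    println(serializeInlined("CRC32").headerBuilder.entries)
}
```

Both functions behave identically at runtime; the size difference only shows up in the compiled output, which is what the jar listings here measure.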

main

> ls -lsa services/*/build/libs/*-jvm*.jar
4196 -rw-r--r-- 1 todaaron staff 3652000 Mar 20 09:06 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
5768 -rw-r--r-- 1 todaaron staff 5083203 Mar 20 09:06 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1080 -rw-r--r-- 1 todaaron staff 1101995 Mar 20 09:05 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar

with inlining

> ls -lsa services/*/build/libs/*-jvm*.jar
4448 -rw-r--r-- 1 todaaron staff 3601011 Mar 20 09:12 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4860 -rw-r--r-- 1 todaaron staff 4794421 Mar 20 09:13 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1072 -rw-r--r-- 1 todaaron staff 1096939 Mar 20 09:12 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar
Delta after inlining

Artifact     Delta %
Dynamodb     -1.39%
S3           -5.68%
aws-config   -0.46%

Remove most suspend points for generated HttpSerde

The only serializers and deserializers that need to suspend are the ones that deal with streaming types, but we generate all operation serializers and deserializers as if they will suspend. Deserializers that just read the payload only suspend to pull the payload into memory so that the format deserializer (e.g. JSON, XML) can be invoked on it. This suspension point can be lifted into the runtime by providing separate interfaces for the suspending and non-suspending cases.
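A rough sketch of what such a split could look like. All names here (`Response`, `Deserializer`, `execute`, `TextDeserializer`) are made up for illustration and are not the interfaces added by this PR.

```kotlin
// Hypothetical response type; readAll() stands in for the one step that
// genuinely suspends: pulling the payload into memory.
class Response(private val bytes: ByteArray) {
    suspend fun readAll(): ByteArray = bytes
}

sealed interface Deserializer<T> {
    // Streaming outputs: the implementation itself has to suspend.
    interface Streaming<T> : Deserializer<T> {
        suspend fun deserialize(response: Response): T
    }

    // Everything else: receives an already-buffered payload, so the
    // generated code needs no suspend function (and no state machine).
    interface NonStreaming<T> : Deserializer<T> {
        fun deserialize(payload: ByteArray): T
    }
}

// Runtime-side dispatch: the suspension point lives here once, instead of
// being repeated in every generated deserializer.
suspend fun <T> Deserializer<T>.execute(response: Response): T = when (this) {
    is Deserializer.Streaming<T> -> this.deserialize(response)
    is Deserializer.NonStreaming<T> -> this.deserialize(response.readAll())
}

// Example "generated" deserializer that never suspends.
object TextDeserializer : Deserializer.NonStreaming<String> {
    override fun deserialize(payload: ByteArray): String = payload.decodeToString()
}

suspend fun main() {
    val response = Response("hello".encodeToByteArray())
    println(TextDeserializer.execute(response))
}
```

The design choice is that only the shared runtime function is compiled as a suspend function, while the many generated non-streaming implementations stay plain functions.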

> ls -lsa services/*/build/libs/*-jvm*.jar
3284 -rw-r--r-- 1 todaaron staff 3359574 Mar 20 11:53 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4740 -rw-r--r-- 1 todaaron staff 4490532 Mar 20 11:54 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar

> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1024 -rw-r--r-- 1 todaaron staff 1046552 Mar 20 11:52 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar

Delta relative to the inlining-only build

Artifact     Delta %
Dynamodb     -6.70%
S3           -6.34%
aws-config   -4.59%
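For reference, these percentages follow directly from the byte sizes in the two `ls` listings (inlining-only build vs. inlining plus HTTP serde changes). A small sketch of the arithmetic, where `deltaPercent` is a helper name introduced only for this example:

```kotlin
import java.util.Locale

// Percent change from `before` to `after`, formatted to two decimals.
fun deltaPercent(before: Long, after: Long): String =
    String.format(Locale.ROOT, "%.2f%%", (after - before) * 100.0 / before)

fun main() {
    // Byte sizes copied from the listings: inlining-only jar, then inlining + serde jar.
    println("Dynamodb   " + deltaPercent(3_601_011, 3_359_574))
    println("S3         " + deltaPercent(4_794_421, 4_490_532))
    println("aws-config " + deltaPercent(1_096_939, 1_046_552))
}
```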

Totals after inlining + http serde changes

Total delta with both inlining and HTTP serde changes compared to original (JVM) artifact sizes

Artifact     Original Size (bytes)   After Size (bytes)   Delta %
Dynamodb     3652000                 3359574              -8.01%
S3           5083203                 4490532              -11.66%
aws-config   1101995                 1046552              -5.03%

Appendix

The extracted artifacts before and after changes:

Latest S3 JVM jar:

> du -h
476K    ./endpoints/internal
596K    ./endpoints
 80K    ./paginators
140K    ./express
 48K    ./auth
 52K    ./internal
 60K    ./waiters
 52K    ./presigners
8.5M    ./model
6.7M    ./serde
 17M    .

After inlining + HTTP serde

> du -h
476K    ./endpoints/internal
596K    ./endpoints
 80K    ./paginators
140K    ./express
 48K    ./auth
 52K    ./internal
 60K    ./waiters
 36K    ./presigners
8.5M    ./model
4.9M    ./serde
 15M    .

For comparison with Java v2 SDK:

Java S3 latest:

s3-2.25.9.jar                                     2024-03-13 22:15   3572387      

Java DDB latest:

dynamodb-2.25.9.jar                               2024-03-13 22:17   2744634  

Next Steps


SdkSerializable

As noted in https://github.com/awslabs/aws-sdk-kotlin/issues/411#issuecomment-1011641463, the way we generate nested struct/union serialization causes backing classes to be generated to hold the required state. I looked for ways to remove this, but none are easy or clean. The best solution here is to revisit serialization and make it format-specific, like we did for XML deserialization. I'd imagine this would remove quite a bit of size from artifacts, as we have a lot of these in practice.

/**
 * Payload serializer for WebsiteConfiguration with a different XML name trait (WebsiteConfiguration)
 */
internal fun serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration(input: WebsiteConfiguration): ByteArray {
    val serializer = XmlSerializer()
    val ERRORDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("ErrorDocument"))
    val INDEXDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("IndexDocument"))
    val REDIRECTALLREQUESTSTO_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("RedirectAllRequestsTo"))
    val ROUTINGRULES_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("RoutingRules"), XmlCollectionName("RoutingRule"))
    val OBJ_DESCRIPTOR = SdkObjectDescriptor.build {
        trait(XmlSerialName("WebsiteConfiguration"))
        trait(XmlNamespace("http://s3.amazonaws.com/doc/2006-03-01/"))
        field(ERRORDOCUMENT_DESCRIPTOR)
        field(INDEXDOCUMENT_DESCRIPTOR)
        field(REDIRECTALLREQUESTSTO_DESCRIPTOR)
        field(ROUTINGRULES_DESCRIPTOR)
    }

    serializer.serializeStruct(OBJ_DESCRIPTOR) {
        input.errorDocument?.let { field(ERRORDOCUMENT_DESCRIPTOR, it, ::serializeErrorDocumentDocument) }
        input.indexDocument?.let { field(INDEXDOCUMENT_DESCRIPTOR, it, ::serializeIndexDocumentDocument) }
        input.redirectAllRequestsTo?.let { field(REDIRECTALLREQUESTSTO_DESCRIPTOR, it, ::serializeRedirectAllRequestsToDocument) }
        if (input.routingRules != null) {
            listField(ROUTINGRULES_DESCRIPTOR) {
                for (el0 in input.routingRules) {
                    serializeSdkSerializable(asSdkSerializable(el0, ::serializeRoutingRuleDocument))
                }
            }
        }
    }
    return serializer.toByteArray()
}

All of the field(<DESCRIPTOR>, T, ::serializeFoo) calls and serializeSdkSerializable(...) calls generate an additional backing class.

> javap WebsiteConfigurationPayloadSerializerKt*

Compiled from "WebsiteConfigurationPayloadSerializer.kt"
final class aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 extends kotlin.jvm.internal.FunctionReferenceImpl implements kotlin.jvm.functions.Function2<aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule, kotlin.Unit> {
  public static final aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 INSTANCE;
  aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1();
  public final void invoke(aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule);
  public java.lang.Object invoke(java.lang.Object, java.lang.Object);
  static {};
}
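As a rough illustration of what "format-specific" could mean here, the sketch below contrasts passing a nested serializer as a function reference with calling it directly. Everything in it (`XmlWriter`, `field`, the model classes, the serializer functions) is hypothetical and heavily simplified; it is not the smithy-kotlin `XmlSerializer` and it does not escape XML text.

```kotlin
// Hypothetical minimal writer; no escaping, no attributes, no namespaces.
class XmlWriter {
    private val sb = StringBuilder()
    fun startTag(name: String) { sb.append('<').append(name).append('>') }
    fun endTag(name: String) { sb.append("</").append(name).append('>') }
    fun text(value: String) { sb.append(value) }
    override fun toString(): String = sb.toString()
}

data class ErrorDocument(val key: String)
data class WebsiteConfiguration(val errorDocument: ErrorDocument?)

fun serializeErrorDocument(writer: XmlWriter, input: ErrorDocument) {
    writer.startTag("Key"); writer.text(input.key); writer.endTag("Key")
}

// Generic style: the nested serializer travels as a function value, so each
// `::serializeFoo` argument needs a function object at the call site.
fun <T> XmlWriter.field(name: String, value: T, serializeFn: (XmlWriter, T) -> Unit) {
    startTag(name); serializeFn(this, value); endTag(name)
}

fun serializeGeneric(input: WebsiteConfiguration): String {
    val writer = XmlWriter()
    writer.startTag("WebsiteConfiguration")
    input.errorDocument?.let { writer.field("ErrorDocument", it, ::serializeErrorDocument) }
    writer.endTag("WebsiteConfiguration")
    return writer.toString()
}

// Format-specific style: plain calls only, nothing is passed as a function value.
fun serializeDirect(input: WebsiteConfiguration): String {
    val writer = XmlWriter()
    writer.startTag("WebsiteConfiguration")
    val doc = input.errorDocument
    if (doc != null) {
        writer.startTag("ErrorDocument")
        serializeErrorDocument(writer, doc)
        writer.endTag("ErrorDocument")
    }
    writer.endTag("WebsiteConfiguration")
    return writer.toString()
}

fun main() {
    val input = WebsiteConfiguration(ErrorDocument("error.html"))
    println(serializeGeneric(input))
    println(serializeDirect(input))
}
```

Both functions produce the same document; the direct version simply gives the compiler no function references to materialize.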

Reduce operation error handling overhead

throwFooOperationError is a top-level function that gets generated into a separate .class file. Class files carry overhead, though, so it may be smaller to either encode this into the operation error deserializer so they share the same class file, or, for AWS protocols at least, combine all operation error handlers into a single function like throwS3Error(...). This should work because AWS protocols all have the type of the error in the response, so having lots of separate functions is unnecessary. They would behave the same if combined into one.

> javap PutBucketLifecycleConfigurationOperationDeserializerKt.class
Compiled from "PutBucketLifecycleConfigurationOperationDeserializer.kt"
public final class aws.sdk.kotlin.services.s3.serde.PutBucketLifecycleConfigurationOperationDeserializerKt {
  public static final java.lang.Void access$throwPutBucketLifecycleConfigurationError(aws.smithy.kotlin.runtime.operation.ExecutionContext, aws.smithy.kotlin.runtime.http.HttpCall, byte[]);
}
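A minimal sketch of the combined-handler idea, assuming the error code has already been extracted from the response. The exception classes and `throwServiceError` are hypothetical names for illustration, not generated SDK code.

```kotlin
// Hypothetical exception hierarchy standing in for a service's modeled errors.
open class ServiceException(message: String) : RuntimeException(message)
class NoSuchBucket(message: String) : ServiceException(message)
class NoSuchKey(message: String) : ServiceException(message)

// One shared handler for every operation: dispatch on the error code that
// the protocol carries in the response, falling back to the base exception.
fun throwServiceError(errorCode: String?, message: String): Nothing = throw when (errorCode) {
    "NoSuchBucket" -> NoSuchBucket(message)
    "NoSuchKey" -> NoSuchKey(message)
    else -> ServiceException(message)
}

fun main() {
    val error = runCatching { throwServiceError("NoSuchKey", "key not found") }.exceptionOrNull()
    println(error?.javaClass?.simpleName)
}
```

A per-operation deserializer would then call the shared function instead of its own generated `throw...Error` function.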

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.