This PR attempts to decrease the generated artifact size of service clients by doing the following:
- Inline several higher-order function calls that are used heavily in generated serialization code
- Remove suspend from most HTTP operation serializers and deserializers
The changes and results for each of these are detailed in the sections below.
Inline higher order functions
You might consider this a bug, since it was introduced with a refactor. Either way, we have a lot of generated code
in serializers and deserializers that looks something like:
internal class PutBucketLifecycleConfigurationOperationSerializer: HttpSerialize<PutBucketLifecycleConfigurationRequest> {
    override suspend fun serialize(context: ExecutionContext, input: PutBucketLifecycleConfigurationRequest): HttpRequestBuilder {
        val builder = HttpRequestBuilder()
        builder.method = HttpMethod.PUT
        builder.url {
            path.trailingSlash = true
            parameters.decodedParameters {
                add("lifecycle", "")
            }
        }
        builder.headers {
            if (input.checksumAlgorithm != null) append("x-amz-sdk-checksum-algorithm", input.checksumAlgorithm.value)
            if (input.expectedBucketOwner?.isNotEmpty() == true) append("x-amz-expected-bucket-owner", input.expectedBucketOwner)
        }
        if (input.lifecycleConfiguration != null) {
            val payload = serializeBucketLifecycleConfigurationPayloadWithXmlNameLifecycleConfiguration(input.lifecycleConfiguration)
            builder.body = HttpBody.fromBytes(payload)
        }
        if (builder.body !is HttpBody.Empty) {
            builder.headers.setMissing("Content-Type", "application/xml")
        }
        return builder
    }
}
All of the invocations like builder.url {...}, builder.headers {...}, parameters.decodedParameters {...}, etc. take
a lambda argument. Because these functions were not inline, each call site results in a backing class to hold the captured state (e.g. input) from the outer context.
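To illustrate the difference, here is a minimal sketch with simplified builders. RequestBuilder, HeadersBuilder, headersNotInlined and headers are hypothetical names for illustration only and are not the real HttpRequestBuilder API. Whether a non-inline capturing lambda becomes a separate class file also depends on the Kotlin compiler version and settings.

```kotlin
// Hypothetical, simplified builders; the real runtime types differ.
class HeadersBuilder {
    val entries = mutableListOf<Pair<String, String>>()
    fun append(name: String, value: String) {
        entries += name to value
    }
}

class RequestBuilder {
    val headersBuilder = HeadersBuilder()

    // Not inline: a capturing lambda passed here needs an object at the call
    // site to carry the captured values (e.g. `input`).
    fun headersNotInlined(block: HeadersBuilder.() -> Unit) {
        headersBuilder.block()
    }

    // Inline: the lambda body is copied into the caller, so no function
    // object is needed to hold the captured state.
    inline fun headers(block: HeadersBuilder.() -> Unit) {
        headersBuilder.block()
    }
}

fun buildExample(expectedBucketOwner: String?): RequestBuilder {
    val builder = RequestBuilder()
    builder.headers {
        // `expectedBucketOwner` is captured from the enclosing function.
        if (expectedBucketOwner?.isNotEmpty() == true) {
            append("x-amz-expected-bucket-owner", expectedBucketOwner)
        }
    }
    return builder
}
```

Both variants behave identically at runtime; the only intended difference is what the compiler has to emit per call site.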
main
> ls -lsa services/*/build/libs/*-jvm*.jar
4196 -rw-r--r-- 1 todaaron staff 3652000 Mar 20 09:06 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
5768 -rw-r--r-- 1 todaaron staff 5083203 Mar 20 09:06 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar
> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1080 -rw-r--r-- 1 todaaron staff 1101995 Mar 20 09:05 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar
with inlining
> ls -lsa services/*/build/libs/*-jvm*.jar
4448 -rw-r--r-- 1 todaaron staff 3601011 Mar 20 09:12 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4860 -rw-r--r-- 1 todaaron staff 4794421 Mar 20 09:13 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar
> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1072 -rw-r--r-- 1 todaaron staff 1096939 Mar 20 09:12 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar
Delta after inlining (relative to main)

| Artifact | Delta % |
| --- | --- |
| DynamoDB | -1.39% |
| S3 | -5.68% |
| aws-config | -0.46% |
Remove most suspend points for generated HttpSerde
The only serializers and deserializers that actually suspend are the ones that deal with streaming types, but we generate all operation serializers and deserializers as if they will suspend. Deserializers that just read the payload only suspend to pull the payload into memory so that the format (e.g. JSON, XML) deserializer can be invoked on it. This suspension point can be lifted into the runtime by providing separate interfaces for suspending and non-suspending serde.
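A minimal sketch of what such a split could look like. The names OperationDeserializer, NonStreaming, Streaming, Response and execute are assumptions made for illustration and are not taken from the runtime.

```kotlin
// Hypothetical stand-in for an HTTP response whose body is read asynchronously.
class Response(private val bytes: ByteArray?) {
    suspend fun readAll(): ByteArray? = bytes
}

sealed interface OperationDeserializer<T> {
    // For operations whose payload is fully read into memory first.
    // No suspend modifier, so generated implementations carry no
    // coroutine state machine.
    interface NonStreaming<T> : OperationDeserializer<T> {
        fun deserialize(response: Response, payload: ByteArray?): T
    }

    // For operations with streaming members, which still need to suspend.
    interface Streaming<T> : OperationDeserializer<T> {
        suspend fun deserialize(response: Response): T
    }
}

// The runtime owns the single suspension point that reads the payload.
suspend fun <T> OperationDeserializer<T>.execute(response: Response): T = when (this) {
    is OperationDeserializer.NonStreaming<T> -> this.deserialize(response, response.readAll())
    is OperationDeserializer.Streaming<T> -> this.deserialize(response)
}

// Example of a generated non-streaming deserializer: a plain function.
object PayloadLengthDeserializer : OperationDeserializer.NonStreaming<Int> {
    override fun deserialize(response: Response, payload: ByteArray?): Int = payload?.size ?: 0
}
```

With this shape, only the runtime helper and the streaming implementations are compiled as suspend functions.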
> ls -lsa services/*/build/libs/*-jvm*.jar
3284 -rw-r--r-- 1 todaaron staff 3359574 Mar 20 11:53 services/dynamodb/build/libs/dynamodb-jvm-1.1.1-SNAPSHOT.jar
4740 -rw-r--r-- 1 todaaron staff 4490532 Mar 20 11:54 services/s3/build/libs/s3-jvm-1.1.1-SNAPSHOT.jar
> ls -lsa aws-runtime/aws-config/build/libs/*-jvm*.jar
1024 -rw-r--r-- 1 todaaron staff 1046552 Mar 20 11:52 aws-runtime/aws-config/build/libs/aws-config-jvm-1.1.1-SNAPSHOT.jar
Delta from inlining (HTTP serde changes relative to the inlined build)

| Artifact | Delta % |
| --- | --- |
| DynamoDB | -6.70% |
| S3 | -6.34% |
| aws-config | -4.59% |
Totals after inlining + http serde changes
Total delta with both inlining and HTTP serde changes compared to original (JVM) artifact sizes
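The totals can be recomputed from the byte sizes in the ls listings above. The following is a small sketch that does so; the function and variable names are made up for illustration, and the sizes are copied from the main listing and the listing after both changes.

```kotlin
// Byte sizes from the listings above: (main, after inlining + HTTP serde changes).
val sizes = linkedMapOf(
    "dynamodb" to (3_652_000L to 3_359_574L),
    "s3" to (5_083_203L to 4_490_532L),
    "aws-config" to (1_101_995L to 1_046_552L),
)

// Relative change in percent; negative means the artifact got smaller.
fun deltaPercent(before: Long, after: Long): Double =
    (after - before) * 100.0 / before

fun printTotals() {
    for ((name, pair) in sizes) {
        val (before, after) = pair
        println("%-10s %.2f%%".format(name, deltaPercent(before, after)))
    }
}
```

Derived this way, the totals work out to roughly -8.0% for DynamoDB, -11.7% for S3, and -5.0% for aws-config.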
Next Steps
SdkSerializable
As noted in https://github.com/awslabs/aws-sdk-kotlin/issues/411#issuecomment-1011641463, the way we generate nested struct/union serialization causes backing classes to be generated to hold the required state. I looked for ways to remove this but none are easy or clean. The best solution here is to revisit serialization and make it format specific, like we did for XML deserialization. I'd expect this to remove quite a bit of size from artifacts, as we have a lot of these in practice.
/**
 * Payload serializer for WebsiteConfiguration with a different XML name trait (WebsiteConfiguration)
 */
internal fun serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration(input: WebsiteConfiguration): ByteArray {
    val serializer = XmlSerializer()
    val ERRORDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("ErrorDocument"))
    val INDEXDOCUMENT_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("IndexDocument"))
    val REDIRECTALLREQUESTSTO_DESCRIPTOR = SdkFieldDescriptor(SerialKind.Struct, XmlSerialName("RedirectAllRequestsTo"))
    val ROUTINGRULES_DESCRIPTOR = SdkFieldDescriptor(SerialKind.List, XmlSerialName("RoutingRules"), XmlCollectionName("RoutingRule"))
    val OBJ_DESCRIPTOR = SdkObjectDescriptor.build {
        trait(XmlSerialName("WebsiteConfiguration"))
        trait(XmlNamespace("http://s3.amazonaws.com/doc/2006-03-01/"))
        field(ERRORDOCUMENT_DESCRIPTOR)
        field(INDEXDOCUMENT_DESCRIPTOR)
        field(REDIRECTALLREQUESTSTO_DESCRIPTOR)
        field(ROUTINGRULES_DESCRIPTOR)
    }
    serializer.serializeStruct(OBJ_DESCRIPTOR) {
        input.errorDocument?.let { field(ERRORDOCUMENT_DESCRIPTOR, it, ::serializeErrorDocumentDocument) }
        input.indexDocument?.let { field(INDEXDOCUMENT_DESCRIPTOR, it, ::serializeIndexDocumentDocument) }
        input.redirectAllRequestsTo?.let { field(REDIRECTALLREQUESTSTO_DESCRIPTOR, it, ::serializeRedirectAllRequestsToDocument) }
        if (input.routingRules != null) {
            listField(ROUTINGRULES_DESCRIPTOR) {
                for (el0 in input.routingRules) {
                    serializeSdkSerializable(asSdkSerializable(el0, ::serializeRoutingRuleDocument))
                }
            }
        }
    }
    return serializer.toByteArray()
}
All of the field(<DESCRIPTOR>, T, ::serializeFoo) calls and serializeSdkSerializable(...) calls generate an additional backing class.
> javap WebsiteConfigurationPayloadSerializerKt*
Compiled from "WebsiteConfigurationPayloadSerializer.kt"
final class aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 extends kotlin.jvm.internal.FunctionReferenceImpl implements kotlin.jvm.functions.Function2<aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule, kotlin.Unit> {
  public static final aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1 INSTANCE;
  aws.sdk.kotlin.services.s3.serde.WebsiteConfigurationPayloadSerializerKt$serializeWebsiteConfigurationPayloadWithXmlNameWebsiteConfiguration$1$4$1();
  public final void invoke(aws.smithy.kotlin.runtime.serde.Serializer, aws.sdk.kotlin.services.s3.model.RoutingRule);
  public java.lang.Object invoke(java.lang.Object, java.lang.Object);
  static {};
}
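For contrast, here is a rough sketch of the format-specific direction, where nested structs are written through direct calls instead of function references. XmlWriter and the two model classes are toy stand-ins invented for this example, not the smithy-kotlin XmlSerializer or the S3 model types, and the writer does no escaping.

```kotlin
// Toy XML writer, for illustration only (no escaping, no namespaces).
class XmlWriter {
    private val sb = StringBuilder()
    fun open(tag: String) { sb.append('<').append(tag).append('>') }
    fun close(tag: String) { sb.append("</").append(tag).append('>') }
    fun text(value: String) { sb.append(value) }
    override fun toString(): String = sb.toString()
}

// Simplified stand-ins for generated model types.
data class IndexDocument(val suffix: String)
data class WebsiteConfiguration(val indexDocument: IndexDocument?)

fun writeIndexDocument(writer: XmlWriter, value: IndexDocument) {
    writer.open("Suffix")
    writer.text(value.suffix)
    writer.close("Suffix")
}

fun writeWebsiteConfiguration(writer: XmlWriter, value: WebsiteConfiguration) {
    writer.open("WebsiteConfiguration")
    // Direct call to the nested writer: no ::function reference is passed
    // around, and `let` is an inline function, so nothing here needs a
    // backing class.
    value.indexDocument?.let {
        writer.open("IndexDocument")
        writeIndexDocument(writer, it)
        writer.close("IndexDocument")
    }
    writer.close("WebsiteConfiguration")
}
```

The trade-off is that each format needs its own generated writers, which is the same trade already made for XML deserialization.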
Reduce operation error handling overhead
throwFooOperationError is a top-level function that gets generated into a separate .class file. Every class file carries
a fixed overhead, so it may be smaller to encode this into the operation error deserializer interface so that they share the same class file. Alternatively, for AWS protocols at least, we could combine all operation handlers into a single function like throwS3Error(...). This should work because AWS protocols all carry the type of the error in the response, so having many separate functions is unnecessary. They would behave the same if combined into one.
> javap PutBucketLifecycleConfigurationOperationDeserializerKt.class
Compiled from "PutBucketLifecycleConfigurationOperationDeserializer.kt"
public final class aws.sdk.kotlin.services.s3.serde.PutBucketLifecycleConfigurationOperationDeserializerKt {
public static final java.lang.Void access$throwPutBucketLifecycleConfigurationError(aws.smithy.kotlin.runtime.operation.ExecutionContext, aws.smithy.kotlin.runtime.http.HttpCall, byte[]);
}
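A combined handler could look roughly like the sketch below. The exception classes and throwServiceError are simplified placeholders written for this example; a generated throwS3Error(...) would take the real context, call, and payload arguments and cover every modeled error.

```kotlin
// Simplified placeholder exception hierarchy, for illustration only.
open class ServiceException(message: String?) : RuntimeException(message)
class NoSuchBucket(message: String?) : ServiceException(message)
class NoSuchKey(message: String?) : ServiceException(message)

// One service-wide function that dispatches on the error code from the
// response, instead of one throwFooOperationError per operation.
fun throwServiceError(errorCode: String?, message: String?): Nothing = throw when (errorCode) {
    "NoSuchBucket" -> NoSuchBucket(message)
    "NoSuchKey" -> NoSuchKey(message)
    // Unmodeled or missing codes fall back to the generic service exception.
    else -> ServiceException("unmodeled error code $errorCode: $message")
}
```

Since the dispatch only depends on the error code carried in the response, every operation deserializer could call the same function.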
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Issue #
see https://github.com/awslabs/aws-sdk-kotlin/issues/411
Appendix
The extracted artifacts before and after changes:
Latest S3 JVM jar:
After inlining + HTTP serde
For comparison with Java v2 SDK:
Java S3 latest:
Java DDB latest: