smithy-lang / smithy-rs

Code generation for the AWS SDK for Rust, as well as server and generic smithy client generation.
Apache License 2.0

[Server] S3 operations require ?x-id=OPNAME which is not compatible with existing clients #1012

Open guymguym opened 2 years ago

guymguym commented 2 years ago

There are several operations in the S3 model whose HTTP route spec doesn't seem to match existing client SDK behavior.

One example is GetObject, which has this trait:

                "smithy.api#http": {
                    "method": "GET",
                    "uri": "/{Bucket}/{Key+}?x-id=GetObject",
                    "code": 200
                }

This spec requires ?x-id=GetObject, and codegen-server will not match this operation unless that query is provided. This is not compatible with standard AWS clients (such as the AWS CLI, which uses the Python SDK), and x-id is also not a documented query param in the REST API docs (link).
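
To make the mismatch concrete, here is a small sketch (using the http crate directly, not smithy-rs code; bucket and key names are made up) of the request that existing clients send versus the request form described by the modeled URI:

    use http::Request;

    fn main() {
        // What existing clients (e.g. the AWS CLI) actually send:
        let from_existing_client = Request::builder()
            .method("GET")
            .uri("/my-bucket/my-key")
            .body(())
            .unwrap();

        // What the modeled URI "/{Bucket}/{Key+}?x-id=GetObject" describes:
        let modeled_request = Request::builder()
            .method("GET")
            .uri("/my-bucket/my-key?x-id=GetObject")
            .body(())
            .unwrap();

        // The generated server routing only matches the second form.
        println!("{} vs {}", from_existing_client.uri(), modeled_request.uri());
    }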

Any idea how this model definition is compatible with existing clients?

CC @crisidev

crisidev commented 2 years ago

I honestly do not know. What I believe is that the S3 model is not the best example of standard practices, and we will have to review the behaviour. You are already providing a ton of help with these bug reports, BTW :)

I am currently on holiday until the 10th of January, and I will have to reconvene with my teammates on this before coming back with a real answer. I believe we will have to implement some special handling for this: the codegen is doing the right thing on paper and following the model's directives, but, as you noticed, the API is documented differently.

rcoh commented 2 years ago

Also worth noting: since the Rust S3 client is generated from exactly these models, it should be compatible, which might make testing easier for the moment.

david-perez commented 2 years ago

@guymguym Thanks for reporting this.

@rcoh If I understand correctly, the S3 service is accepting requests without x-id=GetObject in the query string, yet their Smithy model is marking that param as required. What are our options here? We're supposed to follow the spec and reject invalid requests. Should we contact S3 to amend their Smithy model? Or, if this modeling inaccuracy is widespread among AWS services, do we need to provide an escape hatch in the form of a codegen setting to disable this kind of rejection, for people wanting to build servers adhering to AWS APIs?

rcoh commented 2 years ago

To be honest, I actually don't know. @adamthom-amzn, who worked on the TypeScript server codegen, may know better. My guess is that an S3 server that works with all clients may need some hand-written overrides.

guymguym commented 2 years ago

@david-perez Since this seems like a gap in the spec that will not close soon, and the server routing cannot differentiate between "similar" S3 operations without it (another example is PutObject vs. CopyObject, which differ only in the HTTP header x-amz-copy-source ...), perhaps we could expose named RequestSpecs from the operation_registry code, and then I would be able to code these rules manually. WDYT?

david-perez commented 2 years ago

@guymguym I don't understand what you mean by named RequestSpecs and coding the rules manually. Can you be more specific?

guymguym commented 2 years ago

@david-perez Let me rephrase -

Currently, in operation_registry, the server returns an aws_smithy_http_server::routing::Router instance from an OperationRegistry.

I have several problems extending this model:

  1. The routes vec in the router is private, so I can't modify it to tweak its order.
  2. The request_spec.match() function is not customizable so I cannot affect how matching is done per route.
  3. Last, for an API like S3, there are ~200 operations, and most of them have the same uri_path_regex (all bucket operations, all object operations, ...), so what actually happens is that we iterate and make a bunch of identical match calls for every operation, which could easily be optimized.

So I can easily write the S3 routing logic by hand - it's simple and not too long to maintain - but I would want access to the generated request specs. I thought of adding them as fields to the OperationRegistry struct, where each field is named after its operation and holds that operation's RequestSpec, so I can reuse the generated information.
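
To illustrate, here is a rough sketch of the idea with made-up type names (this RequestSpec is only a stand-in, not the real aws-smithy-http-server type): a registry that exposes one named request spec per operation, so hand-written routing can reuse the generated matching data.

    // Stand-in for the generated request-matching data of one operation.
    struct RequestSpec {
        uri_path_regex: String,
    }

    // Hypothetical shape of the idea: one named field per operation.
    struct OperationRegistry {
        get_object: RequestSpec,
        put_object: RequestSpec,
        // ... one field per S3 operation ...
    }

    fn main() {
        let registry = OperationRegistry {
            get_object: RequestSpec { uri_path_regex: String::from("^/[^/]+/.+$") },
            put_object: RequestSpec { uri_path_regex: String::from("^/[^/]+/.+$") },
        };
        // A hand-written router could now reuse the generated matching data
        // and apply its own ordering and S3-specific tie-breaking.
        println!("{}", registry.get_object.uri_path_regex);
        println!("{}", registry.put_object.uri_path_regex);
    }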

BTW was there a reason not to generate a single trait for the registry where each route is an async function?

Maybe there are other ways to achieve the same through tower::Layer but I am not sure how...

Would be great to hear your thoughts! Thanks!!

david-perez commented 2 years ago

@guymguym

Currently, in operation_registry, the server returns an aws_smithy_http_server::routing::Router instance from an OperationRegistry.

I have several problems extending this model:

  1. The routes vec in the router is private, so I can't modify it to tweak its order.
  2. The request_spec.match() function is not customizable so I cannot affect how matching is done per route.
  3. Last, for an API like S3, there are ~200 operations, and most of them have the same uri_path_regex (all bucket operations, all object operations, ...), so what actually happens is that we iterate and make a bunch of identical match calls for every operation, which could easily be optimized.

(1) and (2) are by design. The user should only have to define their Smithy model and not worry about the internal routing implementation.

For (3), I'm not sure I follow. Routing early-returns when the first RequestSpec matches the incoming request, so there are no unnecessary matches.

https://github.com/awslabs/smithy-rs/blob/f4d3910f227ceb4962eadac994540edb45137709/rust-runtime/aws-smithy-http-server/src/routing/mod.rs#L153-L162
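
For reference, a simplified sketch of first-match linear routing (not the actual aws-smithy-http-server implementation; the route names and matchers are made up):

    use http::Request;

    struct Route {
        name: &'static str,
        matches: fn(&Request<()>) -> bool,
    }

    fn is_list_buckets(req: &Request<()>) -> bool {
        req.uri().path() == "/"
    }

    fn is_get_object(req: &Request<()>) -> bool {
        // crude stand-in for the real "/{Bucket}/{Key+}" matcher
        req.uri().path().matches('/').count() >= 2
    }

    // Early-returns on the first route whose matcher accepts the request,
    // so later routes are never evaluated.
    fn route<'a>(routes: &'a [Route], req: &Request<()>) -> Option<&'a Route> {
        routes.iter().find(|r| (r.matches)(req))
    }

    fn main() {
        let routes = [
            Route { name: "ListBuckets", matches: is_list_buckets },
            Route { name: "GetObject", matches: is_get_object },
        ];
        let req = Request::builder().uri("/my-bucket/my-key").body(()).unwrap();
        println!("{:?}", route(&routes, &req).map(|r| r.name)); // Some("GetObject")
    }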

If what you mean is that you're worried that "common" routes are at the end of the vector, then you're right that most requests will do a bunch of unnecessary regex matches before being routed, and that it'd be more efficient to sort routes by how frequently they'll be hit. I don't think iterating over a 200-item vector is a performance issue; the business logic is likely orders of magnitude more time-consuming than the routing. Time incurred by matching with regexes can definitely be improved though: since the routes are static, we can use code-generated nom parsers, which are faster according to this benchmark, and which we're already using to do label extraction.
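
As a rough illustration of that last point, this is what a hand-written nom parser for the /{Bucket}/{Key+} pattern could look like (only a sketch, not the code smithy-rs actually generates):

    use nom::bytes::complete::{tag, take_till1};
    use nom::combinator::rest;
    use nom::IResult;

    // Split off the bucket label, then treat the remainder (the greedy label)
    // as the key.
    fn parse_bucket_and_key(path: &str) -> IResult<&str, (&str, &str)> {
        let (path, _) = tag("/")(path)?;
        let (path, bucket) = take_till1(|c| c == '/')(path)?;
        let (path, _) = tag("/")(path)?;
        let (path, key) = rest(path)?;
        Ok((path, (bucket, key)))
    }

    fn main() {
        // Prints Ok(("", ("my-bucket", "some/nested/key")))
        println!("{:?}", parse_bucket_and_key("/my-bucket/some/nested/key"));
    }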

Ultimately though, routing would continue to be linear in the number of routes. Routing is linear in most web frameworks I know of; the only exception is Axum, which uses the matchit router, which internally builds a radix tree, thus making routing sublinear. I briefly looked into using matchit in smithy-rs, but it wasn't easy to adapt it to Smithy's routing requirements (for instance, routing based on query parameters). I think when Smithy publishes their URI conflict resolution proposal, we should revisit the routing implementation and give sublinear routing another try, although I suspect the performance benefits will only be noticeable for a huge number of routes.

If you run any benchmarks comparing current routing performance with any of these ideas, I'd be interested in the numbers, to see which one we should implement.

BTW was there a reason not to generate a single trait for the registry where each route is an async function?

The generated operation registry requires that when you register an operation implementation (the async function or closure), it conforms to the specific operation's input and output shapes. So the function's input and output have to be of certain types, namely, the input has to be the operation's input, and the output has to be the operation's output, or a Result in case the operation is fallible. If all you'd require in the operation registry were for operation implementations to implement a certain trait, you'd lose this type-safety.
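
To illustrate the type-safety point with a toy example (the types and the registration function below are made-up stand-ins, not the generated code): registration only compiles if the handler's signature matches the operation's modeled input and output.

    use std::future::Future;

    // Made-up stand-ins for generated operation shapes.
    struct GetObjectInput { bucket: String, key: String }
    struct GetObjectOutput { body: Vec<u8> }
    struct GetObjectError;

    struct Registry<F> { get_object: F }

    // Only accepts handlers whose signature matches the modeled shapes.
    fn register_get_object<F, Fut>(handler: F) -> Registry<F>
    where
        F: Fn(GetObjectInput) -> Fut,
        Fut: Future<Output = Result<GetObjectOutput, GetObjectError>>,
    {
        Registry { get_object: handler }
    }

    async fn get_object(input: GetObjectInput) -> Result<GetObjectOutput, GetObjectError> {
        let _ = (input.bucket, input.key);
        Ok(GetObjectOutput { body: Vec::new() })
    }

    fn main() {
        // Compiles because `get_object` has exactly the expected signature;
        // a handler with a different input or output type is rejected at compile time.
        let registry = register_get_object(get_object);
        let _ = &registry.get_object;
    }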


Routing performance aside, this GitHub issue is about the ambiguous routing that arises if you remove ?x-id=OPNAME from the S3 model, and about your wanting a temporary workaround so that you can remove that query param, still build the server SDK, and keep routing unambiguous. We should come up with ideas to tackle that. If you'd like to discuss or improve routing performance, please open another issue.

guymguym commented 2 years ago

Thanks @david-perez!! Routing performance aside, I don't know yet what to suggest to resolve the x-id ambiguity; it seems the S3 API is a hybrid of operation-oriented and resource-oriented API design... Perhaps looking at the manual routing I wrote for s3d would help ignite the discussion, see here:

Parsing the HTTP path & query into an op-kind enum: https://github.com/guymguym/s3d/blob/518b19d33b63b3a6040279009564247fa3453a64/src/http.rs#L107-L173

This is the manual mapping code: https://github.com/guymguym/s3d/blob/518b19d33b63b3a6040279009564247fa3453a64/src/resources.rs

david-perez commented 2 years ago

@guymguym Can you provide a list of conflicting routes in the S3 API you'd like to see disambiguated?

guymguym commented 2 years ago

@david-perez Sure, I already extracted this list. Here are all the S3 operations with x-id in their URI:

$ node -e 'm = require("./aws/sdk/aws-models/s3.json"); for (s of Object.values(m.shapes)) { t = s.traits?.["smithy.api#http"]; if (t?.uri.includes("x-id=")) console.log(t.method, t.uri); }' | sort

DELETE /{Bucket}/{Key+}?x-id=AbortMultipartUpload
DELETE /{Bucket}/{Key+}?x-id=DeleteObject

GET /{Bucket}/{Key+}?x-id=GetObject
GET /{Bucket}/{Key+}?x-id=ListParts
GET /{Bucket}?analytics&x-id=GetBucketAnalyticsConfiguration
GET /{Bucket}?analytics&x-id=ListBucketAnalyticsConfigurations
GET /{Bucket}?intelligent-tiering&x-id=GetBucketIntelligentTieringConfiguration
GET /{Bucket}?intelligent-tiering&x-id=ListBucketIntelligentTieringConfigurations
GET /{Bucket}?inventory&x-id=GetBucketInventoryConfiguration
GET /{Bucket}?inventory&x-id=ListBucketInventoryConfigurations
GET /{Bucket}?metrics&x-id=GetBucketMetricsConfiguration
GET /{Bucket}?metrics&x-id=ListBucketMetricsConfigurations

POST /WriteGetObjectResponse?x-id=WriteGetObjectResponse
POST /{Bucket}/{Key+}?restore&x-id=RestoreObject
POST /{Bucket}/{Key+}?select&select-type=2&x-id=SelectObjectContent
POST /{Bucket}/{Key+}?uploads&x-id=CreateMultipartUpload
POST /{Bucket}/{Key+}?x-id=CompleteMultipartUpload
POST /{Bucket}?delete&x-id=DeleteObjects

PUT /{Bucket}/{Key+}?x-id=CopyObject
PUT /{Bucket}/{Key+}?x-id=PutObject
PUT /{Bucket}/{Key+}?x-id=UploadPart
PUT /{Bucket}/{Key+}?x-id=UploadPartCopy

Case 1 - route based on the existence of a required HTTP header

The header in these cases is x-amz-copy-source, which helps separate copy from put (see the sketch after this list):

PUT /{Bucket}/{Key+}?x-id=CopyObject  (+x-amz-copy-source)
PUT /{Bucket}/{Key+}?x-id=PutObject

PUT /{Bucket}/{Key+}?x-id=UploadPart
PUT /{Bucket}/{Key+}?x-id=UploadPartCopy  (+x-amz-copy-source)
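
A minimal hand-routing sketch of this case (a made-up helper using the http crate; the query check is naive string matching, just to show the idea): with ?x-id removed, the four PUT object routes above can be told apart by the x-amz-copy-source header plus the upload-related query keys.

    use http::Request;

    fn classify_put_object(req: &Request<()>) -> &'static str {
        let has_copy_source = req.headers().contains_key("x-amz-copy-source");
        // Naive check; a real router would parse the query string properly.
        let is_part_upload = req.uri().query().map_or(false, |q| q.contains("partNumber="));
        match (is_part_upload, has_copy_source) {
            (false, false) => "PutObject",
            (false, true) => "CopyObject",
            (true, false) => "UploadPart",
            (true, true) => "UploadPartCopy",
        }
    }

    fn main() {
        let req = Request::builder()
            .method("PUT")
            .uri("/my-bucket/my-key")
            .header("x-amz-copy-source", "/src-bucket/src-key")
            .body(())
            .unwrap();
        assert_eq!(classify_put_object(&req), "CopyObject");
        println!("ok");
    }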

Case 2 - route based on the existence of a required query key which is also an input field

These cases were added because a required query param cannot be specified in the route URI when its value is an actual input field (like ?uploadId=123 or ?partNumber=456) rather than a fixed API value (like ?list-type=2). In the Smithy spec you cannot specify just a key like ?uploadId in the URI to separate those cases, because that is treated like ?uploadId= and will fail to route when a value is provided. Maybe the spec could allow it with ?uploadId=* ?
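
A small sketch of the key-presence matching this case would need (a made-up helper, not Smithy spec or smithy-rs behaviour): match on whether a query key exists at all, regardless of its value.

    // Returns true if the query string contains `key`, whatever its value.
    fn query_has_key(query: &str, key: &str) -> bool {
        query.split('&').any(|pair| {
            let mut parts = pair.splitn(2, '=');
            parts.next() == Some(key)
        })
    }

    fn main() {
        assert!(query_has_key("uploadId=123&partNumber=4", "uploadId"));
        assert!(!query_has_key("x-id=GetObject", "uploadId"));
        println!("ok");
    }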

I extracted this list of all httpQuery & required cases. Notice that not all of these required adding x-id, because for some of them there is no other conflicting operation; but the ones that do appear in the list above, like GetObject vs. ListParts, or GetBucketMetricsConfiguration vs. ListBucketMetricsConfigurations, indeed fall into this case.

$ node -e 'm = require("./aws/sdk/aws-models/s3.json"); for ([n,s] of Object.entries(m.shapes)) if (s.type == "structure") for ([k,v] of Object.entries(s.members)) if (v?.traits?.["smithy.api#httpQuery"] && v?.traits?.["smithy.api#required"]) console.log(k, n);' | sort

Id com.amazonaws.s3#DeleteBucketAnalyticsConfigurationRequest
Id com.amazonaws.s3#DeleteBucketIntelligentTieringConfigurationRequest
Id com.amazonaws.s3#DeleteBucketInventoryConfigurationRequest
Id com.amazonaws.s3#DeleteBucketMetricsConfigurationRequest
Id com.amazonaws.s3#GetBucketAnalyticsConfigurationRequest
Id com.amazonaws.s3#GetBucketIntelligentTieringConfigurationRequest
Id com.amazonaws.s3#GetBucketInventoryConfigurationRequest
Id com.amazonaws.s3#GetBucketMetricsConfigurationRequest
Id com.amazonaws.s3#PutBucketAnalyticsConfigurationRequest
Id com.amazonaws.s3#PutBucketIntelligentTieringConfigurationRequest
Id com.amazonaws.s3#PutBucketInventoryConfigurationRequest
Id com.amazonaws.s3#PutBucketMetricsConfigurationRequest

UploadId com.amazonaws.s3#AbortMultipartUploadRequest
UploadId com.amazonaws.s3#CompleteMultipartUploadRequest
UploadId com.amazonaws.s3#ListPartsRequest

# these have 2 required query params each
UploadId com.amazonaws.s3#UploadPartCopyRequest
UploadId com.amazonaws.s3#UploadPartRequest
PartNumber com.amazonaws.s3#UploadPartCopyRequest
PartNumber com.amazonaws.s3#UploadPartRequest

Case 3 - other POST cases

I am not sure exactly about the POST cases - some might require the x-id because of a conflict with case 1/2, but there are some for which I didn't understand why it was needed, such as WriteGetObjectResponse, RestoreObject, SelectObjectContent, and DeleteObjects.