softwaremill / elasticmq

In-memory message queue with an Amazon SQS-compatible interface. Runs stand-alone or embedded.
https://softwaremill.com/open-source/
Apache License 2.0

x-amz-json-1.0 not properly serializing error responses #903

Open btalbot opened 10 months ago

btalbot commented 10 months ago

The errors returned by ElasticMQ (1.5.1) when using the application/x-amz-json-1.0 content-type are incomplete. The error type seems to always resolve to just Sender, which is not really helpful and causes the AWS SDK (the Ruby version at least) to raise an undocumented exception such as Aws::SQS::Errors::Sender instead of the documented errors.

According to the smithy.io specs (the best docs for the protocol I can find), the __type field of the response body is expected to contain the name of the error shape.

https://smithy.io/1.0/spec/aws/aws-json-1_0-protocol.html#operation-error-serialization

The response body from real SQS when attempting to delete a non-existent queue with the x-amz-json-1.0 content-type is

{"__type":"com.amazonaws.sqs#QueueDoesNotExist","message":"The specified queue does not exist."}

while the response body from ElasticMQ 1.5.1 is

{"Message":"AWS.SimpleQueueService.NonExistentQueue; see the SQS docs.","__type":"Sender"}

The smithy.io docs add more details, including options to carry error details in a response header, and I do not know which is the proper solution. I have also not attempted to use updated protocols like x-amz-json-1.1.
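To make the difference between the two bodies concrete, here is a minimal sketch of how a client might reduce `__type` to an error code, following the rule in the smithy.io spec linked above (drop anything after the first `:`, then keep what follows the first `#`). The helper name `errorCodeFromType` is hypothetical and is not taken from any AWS SDK; real SDKs may also consult a response header, which this sketch ignores.

```javascript
// Hypothetical helper: derive an error code from the "__type" field of an
// awsJson1_0 error body. Not part of any AWS SDK.
function errorCodeFromType(body) {
  const raw = JSON.parse(body).__type;
  if (typeof raw !== "string") return undefined;
  let code = raw;
  // Keep only the part before the first ":" if one is present.
  const colon = code.indexOf(":");
  if (colon !== -1) code = code.slice(0, colon);
  // Keep only the part after the first "#" (shape name without its namespace).
  const hash = code.indexOf("#");
  if (hash !== -1) code = code.slice(hash + 1);
  return code;
}

// The two response bodies quoted in this issue.
const fromSqs = '{"__type":"com.amazonaws.sqs#QueueDoesNotExist","message":"The specified queue does not exist."}';
const fromElasticMq = '{"Message":"AWS.SimpleQueueService.NonExistentQueue; see the SQS docs.","__type":"Sender"}';

console.log(errorCodeFromType(fromSqs));       // QueueDoesNotExist
console.log(errorCodeFromType(fromElasticMq)); // Sender
```

With the real SQS body the code matches a documented error shape, while the ElasticMQ body only yields Sender, so a client has nothing specific to map it to.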

btalbot commented 10 months ago

The impact of this issue is that SQS clients do not get proper errors and seem to always get a generic, undocumented error instead. Below is a simple JS script that attempts the AmazonSQS.GetQueueUrl action for a queue that does not exist. The AWS SQS endpoint returns the documented error type, but the ElasticMQ endpoint does not.

import { SQSClient, GetQueueUrlCommand, QueueDoesNotExist } from "@aws-sdk/client-sqs";

const emq_client = new SQSClient({ region: "us-east-1", endpoint: "http://localhost:9324" });
const aws_client = new SQSClient({ region: "us-east-1" });

const params = { QueueName: "no-such-queue" };
const command = new GetQueueUrlCommand(params);

async function getQueueUrl(client) {

  try {
    const data = await client.send(command);
    console.log(data);
  } catch (error) {
    if (error instanceof QueueDoesNotExist) {
      console.log("Got QueueDoesNotExist as expected");
    } else {
      console.log(error);
    }
  }

}

console.log("GetQueueUrl with AWS SQS");
await getQueueUrl(aws_client);
console.log("GetQueueUrl with ElasticMQ");
await getQueueUrl(emq_client);

The output for me, running ElasticMQ 1.5.1, is:

GetQueueUrl with AWS SQS
Got QueueDoesNotExist as expected
GetQueueUrl with ElasticMQ
Sender: AWS.SimpleQueueService.NonExistentQueue; see the SQS docs.
    at throwDefaultError (/Users/btalbot/tester/sqs-errors/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:8:22)
    at /Users/btalbot/tester/sqs-errors/node_modules/@smithy/smithy-client/dist-cjs/default-error-handler.js:18:39
    at de_GetQueueUrlCommandError (/Users/btalbot/tester/sqs-errors/node_modules/@aws-sdk/client-sqs/dist-cjs/protocols/Aws_json1_0.js:662:20)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /Users/btalbot/tester/sqs-errors/node_modules/@smithy/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
    at async /Users/btalbot/tester/sqs-errors/node_modules/@aws-sdk/middleware-signing/dist-cjs/awsAuthMiddleware.js:14:20
    at async /Users/btalbot/tester/sqs-errors/node_modules/@smithy/middleware-retry/dist-cjs/retryMiddleware.js:27:46
    at async /Users/btalbot/tester/sqs-errors/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:7:26
    at async getQueueUrl (file:///Users/btalbot/tester/sqs-errors/tester.mjs:12:18)
    at async file:///Users/btalbot/tester/sqs-errors/tester.mjs:27:1 {
  '$fault': 'client',
  '$metadata': {
    httpStatusCode: 400,
    requestId: undefined,
    extendedRequestId: undefined,
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0
  },
  __type: 'Sender'
}

micossow commented 10 months ago

I fixed the NonExistentQueue error as a hotfix. However, to completely solve the issue, the remaining errors need to be reviewed and adjusted for the new format.

btalbot commented 10 months ago

Thank you. It does look like a fair amount of work to support the updated protocol.

johncoleman83 commented 7 months ago

@micossow, I don't think this issue applies only to errors. I'm seeing it in both successes and errors.

Are there plans for elasticmq to return HTTP responses with content-type: x-amz-json-<version> per the AWS SQS docs, or are there any workarounds to change the content-type?

I believe elasticmq can handle requests with this content-type, but the app still returns content-type: text/xml. I'm using Terraform to provision SQS against the mock elasticmq, and I think current versions of the hashicorp/aws Terraform provider do not handle the text/xml content type. Terraform appears to expect a JSON response, so it breaks when it receives text/xml while fetching state.

micossow commented 7 months ago

@johncoleman83 can you provide an example of a specific request that produces an invalid response, along with the expected response?
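A raw request along the following lines could serve as such an example. It is only a sketch under stated assumptions: ElasticMQ is listening on http://localhost:9324 (the endpoint used in the script earlier in this thread), Node.js 18+ provides a global fetch, and the request is sent unsigned on the assumption that a local ElasticMQ accepts it. The helpers buildSqsJsonRequest and showResponse are hypothetical names. No output is shown because it depends on the ElasticMQ version.

```javascript
// Assumed local ElasticMQ endpoint, as used earlier in this thread.
const ENDPOINT = "http://localhost:9324";

// Hypothetical helper: build fetch options for an SQS action sent with the
// application/x-amz-json-1.0 content-type. The request is not signed.
function buildSqsJsonRequest(action, payload) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/x-amz-json-1.0",
      "X-Amz-Target": `AmazonSQS.${action}`,
    },
    body: JSON.stringify(payload),
  };
}

// Hypothetical helper: send the request and print the status, the response
// content-type, and the raw body so they can be pasted into the issue.
async function showResponse(action, payload) {
  const response = await fetch(ENDPOINT, buildSqsJsonRequest(action, payload));
  console.log(response.status, response.headers.get("content-type"));
  console.log(await response.text());
}

// Example call (needs a running ElasticMQ instance):
// await showResponse("GetQueueUrl", { QueueName: "no-such-queue" });
```

Printing the content-type header next to the raw body makes it easy to see whether a JSON request was answered with JSON or with text/xml.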