lumigo-io / lumigo-CLI

Open source CLI tool to help you develop and manage serverless applications.
https://lumigo.io
Apache License 2.0

Replaying from SQS to SNS fails with "Invalid parameter: message too long" #126

Status: Open, opened by iainelder 2 years ago


I tried a command like this today to test the replay feature.

$ lumigo-cli replay-sqs-dlq --region eu-west-1 --dlqQueueName=org-trail-20211206_DLQ --targetType=SNS --targetName=org-trail-20211206 
finding the SQS DLQ [org-trail-20211206_DLQ] in [eu-west-1]
finding the SNS topic [org-trail-20211206] in [eu-west-1]
replaying events from [https://sqs.eu-west-1.amazonaws.com/345132479590/org-trail-20211206_DLQ] to [SNS:org-trail-20211206] with 10 concurrent pollers
    InvalidParameter: Invalid parameter: Message too long
    Code: InvalidParameter

I was testing it with an SNS topic whose subscribers were deliberately broken, so the replayed messages would immediately re-enter the DLQ.

The normal publications to the SNS topic are S3 event notifications like this one (shown inside its SNS envelope):

{
  "Type" : "Notification",
  "MessageId" : "133aab10-1820-55cc-9aa6-f13006209e7a",
  "TopicArn" : "arn:aws:sns:eu-west-1:345132479590:OrgTrailLogFiles",
  "Subject" : "Amazon S3 Notification",
  "Message" : "{\"Records\":[{\"eventVersion\":\"2.1\",\"eventSource\":\"aws:s3\",\"awsRegion\":\"eu-west-1\",\"eventTime\":\"2021-11-30T12:40:32.032Z\",\"eventName\":\"ObjectCreated:Put\",\"userIdentity\":{\"principalId\":\"AWS:AROAINPZFXXCK5LJPKQMW:regionalDeliverySession\"},\"requestParameters\":{\"sourceIPAddress\":\"34.247.43.27\"},\"responseElements\":{\"x-amz-request-id\":\"FY2AG1GWFKX885V1\",\"x-amz-id-2\":\"mUL1UlsZusSY0vncwEYBk3SleVJNvx2avEAxMM/hnWFkbsO9PSpycQrf4CY9c3iCl0hhGOqR/ubJci8jZOePj//wn2xLJ1fO\"},\"s3\":{\"s3SchemaVersion\":\"1.0\",\"configurationId\":\"PutTrailLogFile\",\"bucket\":{\"name\":\"org-trail-20211116\",\"ownerIdentity\":{\"principalId\":\"A3MS67IJ3RE5FL\"},\"arn\":\"arn:aws:s3:::org-trail-20211116\"},\"object\":{\"key\":\"CloudTrail/AWSLogs/o-webyrpj5yp/749430203777/CloudTrail/eu-west-1/2021/11/30/749430203777_CloudTrail_eu-west-1_20211130T1240Z_M8riaqFpZp9DSf6o.json.gz\",\"size\":781,\"eTag\":\"c0aa6633d2c87a64d3b4e7fea5a4dc07\",\"versionId\":\"cePUODghK2PWwA6Hl0tS_J96cCPPXftN\",\"sequencer\":\"0061A61BBFF058F4FF\"}}}]}",
  "Timestamp" : "2021-11-30T12:40:32.734Z",
  "SignatureVersion" : "1",
  "Signature" : "STrXeQMUVEo3uPfPEZBz4Ttx4oXSV+QlxwqD05qnnajB6dUcjwK88BqnKGa7/gowSdVW1FUN/i/ls31YaUKtwI4WR9MXCKV8OYCK7Yos3Leg12IpupI5GGmUpwoWlrGgGIxVpU7YP1pKSH7+jjyQD4Aoo4i5nPBvfJAEhOTxHpdp4mIWuGPH9PrjXjK5AYLMw6BzpSKTkfjqbTyFCn/F057xWkQYs77rIqJLsnhuFWC95/exjamlrmhJ2l8NMIGiXpWqyTo1k/xFJR74jUKS00jlcNO+V247V1FrL84mJirP4AKwb556HjDcCkBKtHV1n5XSAjnMSEPJh/4EF/3zsQ==",
  "SigningCertURL" : "https://sns.eu-west-1.amazonaws.com/SimpleNotificationService-7ff5318490ec183fbaddaa2a969abfda.pem",
  "UnsubscribeURL" : "https://sns.eu-west-1.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:eu-west-1:345132479590:OrgTrailLogFiles:25f1223b-423d-43f7-98e4-2a658f8521ec"
}

It seems that Lumigo is not correctly handling the escaping of the embedded JSON.
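
For reference, the embedded JSON in the first sample is escaped by design: SNS delivers the published payload as a string in the Message field, so every quote inside it appears as \". Getting the original S3 event back therefore takes two decodes. A minimal Python sketch, using a trimmed stand-in for the sample body (the names dlq_body and s3_event are hypothetical, and only a few fields are kept):

```python
import json

# Trimmed stand-in for the DLQ body shown above: an SNS envelope whose
# "Message" field holds the S3 event as a JSON-encoded *string*.
s3_event = {"Records": [{"eventSource": "aws:s3",
                         "eventName": "ObjectCreated:Put",
                         "s3": {"object": {"size": 781}}}]}
dlq_body = json.dumps({"Type": "Notification",
                       "Subject": "Amazon S3 Notification",
                       "Message": json.dumps(s3_event)})

# Layer 1: decode the SNS envelope. "Message" is still plain text here.
envelope = json.loads(dlq_body)
assert isinstance(envelope["Message"], str)

# Layer 2: decode the embedded JSON to recover the original S3 event.
event = json.loads(envelope["Message"])
print(event["Records"][0]["eventName"])  # ObjectCreated:Put
```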

The correct messages in the DLQ are around 2 KB in size.

After trying this replay command a few times, I read some of the messages from the DLQ and found messages of 14 KB, 40 KB, 74 KB, and 142 KB!

Here's a partial sample that shows the escapes are being duplicated:

{
  "Type" : "Notification",
  "MessageId" : "c3d91ba6-7aa9-5b15-9114-4194c363ef48",
  "TopicArn" : "arn:aws:sns:eu-west-1:345132479590:org-trail-20211206",
  "Message" : "{\n  \"Type\" : \"Notification\",\n  \"MessageId\" : \"51cb57e2-accc-5cd8-9d6a-0dd01555364b\",\n  \"TopicArn\" : \"arn:aws:sns:eu-west-1:345132479590:org-trail-20211206\",\n  \"Message\" : \"{\\n  \\\"Type\\\" : \\\"Notification\\\",\\n  \\\"MessageId\\\" : \\\"335974e3-66ee-53ed-bc90-2ed1df7d99de\\\",\\n  \\\"TopicArn\\\" : \\\"arn:aws:sns:eu-west-1:345132479590:org-trail-20211206\\\",\\n  \\\"Message\\\" : \\\"{\\\\n  \\\\\\\"Type\\\\\\\" : \\\\\\\"Notification\\\\\\\",\\\\n  \\\\\\\"MessageId\\\\\\\" : \\\\\\\"da8b11a1-9fee-56ca-80a8-4b93266b521c\\\\\\\",\\\\n  \\\\\\\"TopicArn\\\\\\\" : \\\\\\\"arn:aws:sns:eu-west-1:345132479590:org-trail-20211206\\\\\\\",\\\\n  \\\\\\\"Message\\\\\\\" : \\\\\\\"{\\\\\\\\n  \\\\\\\\\\\\\\\"Type\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\"Notification\\\\\\\\\\\\\\\",\\\\\\\\n  \\\\\\\\\\\\\\\"MessageId\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\"53b9bf67-1c53-5a54-9ba6-e555a5d30dac\\\\\\\\\\\\\\\",\\\\\\\\n  \\\\\\\\\\\\\\\"TopicArn\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\"arn:aws:sns:eu-west-1:345132479590:org-trail-20211206\\\\\\\\\\\\\\\",\\\\\\\\n  \\\\\\\\\\\\\\\"Message\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\"{\\\\\\\\\\\\\\\\n  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"Type\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"Notification\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\",\\\\\\\\\\\\\\\\n  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"MessageId\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"a7fc0494-d72a-5bf9-8b62-5ea66497034f\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\",\\\\\\\\\\\\\\\\n  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"TopicArn\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"arn:aws:sns:eu-west-1:345132479590:org-trail-20211206\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\",\\\\\\\\\\\\\\\\n  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"Subject\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" : \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"Amazon S3 Notification\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\",\\\\\\\\\\\\\\\\n  \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"Message\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\" 
:
...
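
One mechanism that would produce exactly this pattern is nesting: each level of the sample holds a complete earlier notification as its Message. The sketch below assumes (not confirmed from the lumigo-cli source) that a replay publishes the entire DLQ body, envelope included, as the new message, and that SNS wraps it in a fresh envelope on its way back to the DLQ. wrap_in_sns_envelope is a hypothetical, simplified stand-in for that SNS step.

```python
import json

def wrap_in_sns_envelope(message: str) -> str:
    """Hypothetical, simplified stand-in for the envelope SNS delivers to an
    SQS subscriber when raw message delivery is off: the published text
    becomes the "Message" string of a new JSON document. Real envelopes also
    carry MessageId, Signature, etc., which only add to the size."""
    return json.dumps({"Type": "Notification",
                       "TopicArn": "arn:aws:sns:eu-west-1:345132479590:org-trail-20211206",
                       "Message": message},
                      indent=2, separators=(",", " : "))

# Stand-in for the first message that lands in the DLQ.
body = wrap_in_sns_envelope('{"Records": []}')
sizes = [len(body.encode("utf-8"))]

for _ in range(6):
    # Assumed replay behaviour: publish the whole DLQ body as the new message,
    # which then returns to the DLQ inside yet another envelope.
    body = wrap_in_sns_envelope(body)
    sizes.append(len(body.encode("utf-8")))

print(sizes)  # byte size of the DLQ body after each simulated replay
```

Under that assumption, each wrap turns a run of k backslashes before a quote into 2k + 1, so the innermost quotes are preceded by 1, 3, 7, 15, ... backslashes (2**n - 1 after n wraps), matching the runs visible in the sample, and the body grows faster with every round trip.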

My guess is that the escapes somehow multiply each time the message is read and replayed, until there are so many of them that the message exceeds the 256 KB SNS message size limit.
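
If the nesting explanation is right, one possible fix is to republish only the Message of an SNS envelope rather than the whole body, and to check the size before calling SNS. This is a hypothetical sketch, not lumigo-cli's actual code: payload_to_republish and SNS_MAX_BYTES are illustrative names, 256 KB is the limit mentioned above, and the check counts only the message body (it ignores message attributes).

```python
import json

SNS_MAX_BYTES = 256 * 1024  # 262,144 bytes: the 256 KB SNS limit assumed here

def payload_to_republish(dlq_body: str) -> str:
    """Hypothetical fix sketch: if the DLQ body is an SNS envelope, return
    only its "Message" (the originally published payload) so replays do not
    nest envelopes. Raise if the payload would still exceed the limit."""
    payload = dlq_body
    try:
        doc = json.loads(dlq_body)
    except json.JSONDecodeError:
        doc = None  # not JSON at all: republish the body unchanged
    if (isinstance(doc, dict) and doc.get("Type") == "Notification"
            and isinstance(doc.get("Message"), str)):
        payload = doc["Message"]
    size = len(payload.encode("utf-8"))
    if size > SNS_MAX_BYTES:
        raise ValueError(f"payload is {size} bytes, over the {SNS_MAX_BYTES}-byte SNS limit")
    return payload

# Usage: an envelope is stripped; a plain payload passes through unchanged.
original = '{"Records": []}'
envelope = json.dumps({"Type": "Notification", "Message": original})
print(payload_to_republish(envelope) == original)  # True
print(payload_to_republish(original) == original)  # True
```

Bodies that are not SNS envelopes, for example from a subscription with raw message delivery enabled, pass through unchanged, so the same replay loop would work either way.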

I don't know any other tool that offers this replay-to-SNS feature, so I'd really appreciate it if you could fix this :-)