triggermesh / aws-event-sources

Knative event sources for AWS services
Apache License 2.0
60 stars 14 forks source link

Message attributes aren't propagated to CloudEvents #330

Closed antoineco closed 3 years ago

antoineco commented 3 years ago

While receiving messages, we don't request MessageAttributeNames, therefore part of the information contained in the original message might be lost.

noahw3 commented 3 years ago

I'm curious if it might make sense to include the attributes as CloudEvent attributes so they're filterable downstream with triggers?

antoineco commented 3 years ago

That's what I had in mind yes. We just have to be careful not to overlap with core CloudEvents attributes (type, source, id, and all other optional ones defined in the CloudEvents spec). Maybe the solution would be to "namespace" SQS message attributes by prepending them with sqsmsg or another short prefix while we process them in the receive adapter.

antoineco commented 3 years ago

I enabled MessageAttributeNames: All locally, sent a message with 2 String attributes, and here is the shape of the message I got:

☁  cloudevents.Event
Context Attributes,
  specversion: 1.0
  type: com.amazon.sqs.message
  source: arn:aws:sqs:us-west-1:123456789012:test
  id: 5f501589-ec15-4010-b3a2-e9255efc0909
  time: 2021-09-21T14:22:22.611826Z
  datacontenttype: application/json
Data,
  {
    "Attributes": {
      "ApproximateFirstReceiveTimestamp": "1632234139106",
      "ApproximateReceiveCount": "1",
      "SenderId": "AIDAQUHRFMYWSR2PNVTVZ",
      "SentTimestamp": "1632234139093"
    },
    "Body": "{\"msg\": \"somejson\"}",
    "MD5OfBody": "9bdd069d39200fde3b455c42d4a6e052",
    "MD5OfMessageAttributes": "9813f6c83f40d94a47d0dd262c42f2f3",
    "MessageAttributes": {
      "AnAttribute": {
        "BinaryListValues": null,
        "BinaryValue": null,
        "DataType": "String.Foo",
        "StringListValues": null,
        "StringValue": "The value of AnAttribute"
      },
      "OtherAttribute": {
        "BinaryListValues": null,
        "BinaryValue": null,
        "DataType": "String",
        "StringListValues": null,
        "StringValue": "The value of OtherAttribute"
      }
    },
    "MessageId": "5f501589-ec15-4010-b3a2-e9255efc0909",
    "ReceiptHandle": "AQEBaolxWjg97PPyjKQl+Hd5kPM4VUGhz45Y37kO2Nnepc5p3jZx9+9oWukD2DBmMqbuejyJld+iK+r7Oq/ElSxHZ0LZVIT/o2LmlB6LU4QBWPDBW0g52qxq3y/4R6unxDEVB0aEZ5LIjRcfxNlLVSctqNu9gmslcqewyl5b+OrnKreUZ8PmBEbUV+dODfAtGYtB990pwUzkQEfXjNK/9zab1qWoF+0W8WLNJl50lm1FsTf+MCtROaMVjt5oo1WXo4xTg8VWrM/wGo/7mFEr9De6+0dEVJNYqU0fgucp5Jtglf6BKjcI0Zd1qiDdFjrskfG22bAZZTe6go3xXW29mGEfOqRKOSgxHPCqFJhrXeBYAkD+3ba4MrUM6IaXoTgF5j5R"
  }

While system attributes (Attributes) could easily be moved to CloudEvents extension attributes (they are all plain Strings), message attributes (MessageAttributes) carry extra information which I'm not sure how to propagate correctly as CloudEvents extension attributes:

Also the attribute name may contain non-alphanumeric characters, whereas CloudEvents context attributes may not. This could create conflicts if those characters get discarded (what if a message has both "MyAttribute" and "My_Attribute" set?).

-> Values could be encoded using a format similar to Data URLs (data:String,Hello%2C%20World%2 or data:String;base64,SGVsbG8sIFdvcmxkIQ==). -> I'm not sure how to handle underscore, hyphens and periods, need to think about it.

For the time being, it might be safer to just pass message attributes untouched inside the event's data, as shown in my example.

@noahw3 any thoughts? ☝️

noahw3 commented 3 years ago

It seems entirely reasonable to me to just pick a convention and document it.

Only including string and number data types (or their custom labels - it appears that this is just for annotation and doesn't meaningfully change their valid values), and not binary types, seems reasonable. I can't think of a use case where someone would want to do exact string matching in a trigger on a base64 binary blob.

Adding a prefix and mapping other special characters onto the valid set also sounds fine. Frankly I don't think it matters too much exactly how that mapping works (within reason) as long as it's documented. Like you said the main concern would be maintaining key uniqueness.

noahw3 commented 3 years ago

Looking at the cloudevent spec it actually looks like it does support binary attributes so maybe that's not a concern.

Perhaps more problematic though is the 20 character name limit.

Maybe it's as simple as stating what the requirements are and dropping anything that doesn't meet those requirements?

antoineco commented 3 years ago

I like that 👍 Only set attributes which are either string or number (including custom) and discard binary. The main goal is to enable filtering/routing of CloudEvents, and I'm not sure whether binary extensions help with that, even if they're technically supported. The full set of message attributes remains part of the event data anyway. Happy to also include binary if you think there is a case for it that justifies increasing the overall size of events.

Regarding keys, I think we can consider conflicting names an edge case. I'm yet to see a scenario like the one I mentioned in the real world... The 20 characters limit is a SHOULD, there is no enforcement of such limit so we should be fine.

I just opened #335