Closed by timoguin 1 year ago
After discussion, the consensus is that we don't want to add this complexity to the current transformer logic, especially for a use case that is specific to AWS.
Instead, we can use Firehose to deliver directly to S3 without transformation. That will keep things simple and still allow us to achieve the goal of ingesting data from Firehose.
Any source that ingests data from Firehose will need to decode the records from base64, which is straightforward to do with VRL.
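For illustration, here is a minimal Python sketch of the decoding step. In VRL itself this would typically be done with the built-in `decode_base64` function; the Python version only shows the shape of the operation. The function name `decode_firehose_record` and the assumption that each payload is a JSON object are hypothetical and not taken from this project.

```python
import base64
import json


def decode_firehose_record(data_b64: str) -> dict:
    """Decode one base64-encoded record payload into a JSON object.

    Assumes the decoded bytes are UTF-8 JSON; real payloads may differ.
    """
    raw = base64.b64decode(data_b64)
    return json.loads(raw)


# Build a sample payload the same way a producer would encode it.
sample = base64.b64encode(
    json.dumps({"message": "hello"}).encode("utf-8")
).decode("ascii")

decoded = decode_firehose_record(sample)
```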
Allow running transformations as part of Kinesis Firehose delivery streams.
Considerations
Many AWS services still deliver logs only to CloudWatch Logs. Some can send directly to Kinesis Firehose.
Anything sent to CloudWatch Logs can be streamed from there to Kinesis Firehose.
The transformation features of Firehose are quite nice to work with, and they would allow skipping the data batcher logic required for processing files from S3. Firehose also manages retries and other delivery logic, and it adds several points of observability.
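As a rough sketch of what a Firehose transformation Lambda looks like: Firehose invokes the function with a batch of base64-encoded records, and expects each one back with the same `recordId`, a `result` of `Ok`, `Dropped`, or `ProcessingFailed`, and re-encoded `data`. The handler below follows that contract. The transformation itself (adding a `transformed` field) is a hypothetical placeholder, and the assumption that records are JSON objects is not from this project.

```python
import base64
import json


def handler(event, context):
    """Firehose data-transformation handler (illustrative sketch only)."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: tag the record as processed.
            payload["transformed"] = True
            line = json.dumps(payload) + "\n"
            data = base64.b64encode(line.encode("utf-8")).decode("ascii")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Return the original data and mark the record as failed so
            # Firehose can apply its own failure handling.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Because Firehose owns batching and retries, the handler only has to map each input record to exactly one output record.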
This will require planning and design work. I'm not sure if we'd want to make the current transformer Lambda more generic so it can process more triggers, or if we'd want to implement something slightly different for the Firehose use case. My guess is it'd be better to support more triggers / sources for the current transformer.
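To make the "more triggers" option concrete, one possible approach is to inspect the shape of the incoming event and dispatch on it. The sketch below relies on two details of the AWS event formats: S3 notifications carry a top-level `Records` list whose entries have `eventSource` set to `aws:s3`, while Firehose transformation events carry a top-level `records` list alongside `deliveryStreamArn`. The function name `detect_trigger` and the returned labels are hypothetical, and this is not how the current transformer is implemented.

```python
def detect_trigger(event: dict) -> str:
    """Classify a Lambda event by its trigger type (illustrative sketch)."""
    # Firehose transformation events: lowercase "records" plus the stream ARN.
    if "deliveryStreamArn" in event and "records" in event:
        return "firehose"
    # S3 notifications: capitalized "Records" with an aws:s3 event source.
    records = event.get("Records") or []
    if records and records[0].get("eventSource") == "aws:s3":
        return "s3"
    return "unknown"
```

A handler could call `detect_trigger(event)` first and then route to the existing S3 batch logic or to a Firehose-specific path.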
Tasks