Closed marcin-kwasnicki closed 1 year ago
We should support this. We can either make the parsing more advanced and figure out the compression from magic bytes (I believe there's a crate for this), or simply allow the user to explicitly specify the compression for a log source in the configuration. @shaeqahmed will look into this.
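A minimal sketch of what magic-byte inference could look like (illustrative only, not Matano's actual implementation; the byte signatures below are the standard ones for each format):

```rust
// Infer a compression format from the first bytes of a stream.
// Signatures: gzip = 1f 8b, zstd = 28 b5 2f fd, xz = fd '7' 'z' 'X' 'Z' 00.
fn detect_compression(bytes: &[u8]) -> Option<&'static str> {
    match bytes {
        [0x1f, 0x8b, ..] => Some("gzip"),
        [0x28, 0xb5, 0x2f, 0xfd, ..] => Some("zstd"),
        [0xfd, b'7', b'z', b'X', b'Z', 0x00, ..] => Some("xz"),
        _ => None, // fall back to treating the input as uncompressed
    }
}

fn main() {
    assert_eq!(detect_compression(&[0x1f, 0x8b, 0x08, 0x00]), Some("gzip"));
    assert_eq!(detect_compression(b"plain text"), None);
    println!("ok");
}
```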
I have added https://github.com/matanolabs/matano/commit/c7e58d7e2e21d407bc997bc55e557b6b3a01309b .
So you can now add to your `log_source.yml`:

```yaml
# log_source.yml
ingest:
  compression: "gzip"
```

and Matano will use that compression.
Leaving issue open if we want to add the more advanced auto compression inference.
This configuration has been replaced with compression auto-inference, so manually specifying the compression format in the log source is no longer necessary 💯
Confirmed that this issue has been fixed. Also added an automated method to parse CloudWatch logs written to S3 from a subscription for line-by-line consumption in Matano, using the flag:

```yaml
ingest:
  s3_source:
    is_from_cloudwatch_log_subscription: true
```
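For context, a decompressed CloudWatch Logs subscription payload is a JSON object whose `logEvents` array holds the individual log lines. The struct names below are illustrative (not Matano's actual types), though the field shape follows the documented subscription filter format (`messageType`, `logGroup`, `logEvents` with `id`/`timestamp`/`message`):

```rust
// Hypothetical, simplified shape of a decompressed subscription payload.
struct LogEvent {
    id: String,
    timestamp: u64,
    message: String, // one original log line, e.g. a single-line JSON record
}

struct SubscriptionPayload {
    message_type: String, // "DATA_MESSAGE" for actual log data
    log_group: String,
    log_events: Vec<LogEvent>,
}

// Flatten the payload back into line-by-line records for downstream parsing.
fn to_lines(p: &SubscriptionPayload) -> Vec<String> {
    p.log_events.iter().map(|e| e.message.clone()).collect()
}

fn main() {
    let payload = SubscriptionPayload {
        message_type: "DATA_MESSAGE".into(),
        log_group: "/aws/lambda/example".into(),
        log_events: vec![
            LogEvent { id: "0".into(), timestamp: 0, message: "{\"a\":1}".into() },
            LogEvent { id: "1".into(), timestamp: 1, message: "{\"b\":2}".into() },
        ],
    };
    let lines = to_lines(&payload);
    assert_eq!(lines.len(), 2);
    println!("{}", lines.join("\n"));
}
```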
Hello,
I ran into this issue while testing matano on some sample log files. TransformerLambda fails with the message:
```
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: stream did not contain valid UTF-8', transformer/src/main.rs:538:58
```
The file that I want to parse is delivered by Kinesis Firehose and contains CloudTrail logs streamed from CloudWatch to S3. It doesn't have an extension and its content type is marked as `application/octet-stream`. Inside there is a JSON file representing a CloudWatch event. An important note on that type of file can be found here: https://docs.aws.amazon.com/firehose/latest/dev/writing-with-cloudwatch-logs.html:

> "CloudWatch log events are compressed with gzip level 6. If you want to specify OpenSearch Service or Splunk as the destination for the delivery stream, use a Lambda function to uncompress the records to UTF-8 and single-line JSON."

I suspect that maybe some additional parsing is required for that type of file.
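The panic is consistent with gzip-compressed bytes being read as text: a gzip header is not valid UTF-8, so a UTF-8 conversion fails before any JSON parsing happens. A quick stdlib-only illustration (not Matano's code path, just the failure mode):

```rust
fn main() {
    // 1f 8b is the gzip magic number; 0x8b is a lone UTF-8 continuation
    // byte, so this sequence can never decode as valid UTF-8.
    let gzip_header = [0x1fu8, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00];
    assert!(std::str::from_utf8(&gzip_header).is_err());
    println!("gzip bytes are not valid UTF-8");
}
```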