matanolabs / matano

Open source security data lake for threat hunting, detection & response, and cybersecurity analytics at petabyte scale on AWS
https://matano.dev
Apache License 2.0

Cloudflare HTTP Event Log Source Schema is incorrect for `BotTags` #186

Open deeso opened 10 months ago

deeso commented 10 months ago

There is a bug in the Cloudflare HTTP event schema. The schema defines cloudflare.http_event.bot.tag as a string, but the actual value is an array of strings; see: https://github.com/matanolabs/matano/blob/b9975f5e92a3c9aedca2e8879bb4b81f6861eb97/data/managed/log_sources/cloudflare/tables/http_request.yml#L60

When the VRL transform parses the log, the value at this location is either null or an array of strings: https://github.com/matanolabs/matano/blob/b9975f5e92a3c9aedca2e8879bb4b81f6861eb97/data/managed/log_sources/cloudflare/tables/http_request.yml#L457

This causes any JSON log line containing a BotTags array to fail and be sidelined by the transform. The failure produces the following error message in the CloudWatch logs for the Data Transformer Lambda:

ERROR transformer: Line error: Line err: SchemaMismatchError, msg: Failed to resolve schema for due to schema mismatch for table cloudflare_http_request. (log source: tablename)
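
To illustrate why the line is rejected, here is a minimal Python sketch of a type check against the old and new field definitions. This is not Matano's actual schema resolver; the `conforms` helper, the simplified type representation, and the sample tag values are all assumptions made up for illustration.

```python
from typing import Any

# Hypothetical, simplified stand-ins for the schema's field types.
OLD_TAG_TYPE = "string"
NEW_TAG_TYPE = {"type": "list", "element": "string"}


def conforms(value: Any, field_type: Any) -> bool:
    """Return True if value matches the simplified field type (null always allowed)."""
    if value is None:
        return True
    if field_type == "string":
        return isinstance(value, str)
    if isinstance(field_type, dict) and field_type.get("type") == "list":
        return isinstance(value, list) and all(
            conforms(item, field_type["element"]) for item in value
        )
    raise ValueError(f"unsupported type in this sketch: {field_type!r}")


bot_tags = ["tag_a", "tag_b"]  # illustrative values, not real Cloudflare tags

print(conforms(bot_tags, OLD_TAG_TYPE))  # False: an array does not fit a string field
print(conforms(bot_tags, NEW_TAG_TYPE))  # True: an array of strings fits list<string>
print(conforms(None, OLD_TAG_TYPE))      # True: null fits either definition
```

Under these assumptions, only lines where the value is null pass the old definition, which matches the observed behaviour that lines with a BotTags array are sidelined.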

To fix this issue, the following snippet needs to change from:

          - name: bot
            type:
              type: struct
              fields:
              - name: score
                type:
                  type: struct
                  fields:
                  - name: src
                    type: string
                  - name: value
                    type: long
              - name: tag
                type: string

To:

          - name: bot
            type:
              type: struct
              fields:
              - name: score
                type:
                  type: struct
                  fields:
                  - name: src
                    type: string
                  - name: value
                    type: long
              - name: tag
                type:
                  type: list
                  element: string

Samrose-Ahmed commented 10 months ago

That looks correct, happy to accept a PR.

If you wish to continue using the existing table, you will have to manually drop or rename the column in your table via Spark or the API, since this is a breaking schema change (if you're only testing, you can just recreate the table).
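
For the Spark route, a rough sketch of the statements involved might look like the following. It only builds the SQL strings, using the `ALTER TABLE ... DROP COLUMN` and `RENAME COLUMN` syntax from Iceberg's Spark DDL. The catalog and database names (`my_catalog.my_db`) are placeholders, the column path is taken from the issue text and may not match the real table layout, and `tag_old` is a made-up name, so check all of them against your deployment before running anything.

```python
def drop_column_sql(table: str, column_path: str) -> str:
    """Build an Iceberg Spark DDL statement that drops a (possibly nested) column."""
    return f"ALTER TABLE {table} DROP COLUMN {column_path}"


def rename_column_sql(table: str, column_path: str, new_name: str) -> str:
    """Build an Iceberg Spark DDL statement that renames a (possibly nested) column.

    new_name is the new leaf name only, not a full dotted path.
    """
    return f"ALTER TABLE {table} RENAME COLUMN {column_path} TO {new_name}"


# Placeholder identifiers (assumptions, not taken from a real deployment).
TABLE = "my_catalog.my_db.cloudflare_http_request"
COLUMN = "cloudflare.http_event.bot.tag"

print(drop_column_sql(TABLE, COLUMN))
print(rename_column_sql(TABLE, COLUMN, "tag_old"))
```

Each string could then be passed to `spark.sql(...)` in a Spark session configured for the table's catalog. Renaming keeps the old string data around under a different name, while dropping discards it.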