open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.89k stars 2.27k forks

Replace pattern Issues with Transform processor #34123

Open kumachop2 opened 1 month ago

kumachop2 commented 1 month ago

Component(s)

processor/transform

Describe the issue you're reporting

I have a requirement to mask the fullName value in application logs. Since we are migrating from O11y to Splunk Cloud, I thought of doing the transformation on the client side (Splunk agent). I created a regex that works fine in regex101, but for some reason it does not work in the processor. I have not noticed any processing errors in the agent logs. Here are the Splunk agent config and a sample raw message.

Helm: 0.103.0 Azure AKS Clusters

    agent:
      enabled: true
      securityContext:
        runAsUser: 20000
        runAsGroup: 20000
      config:
        processors:
          transform:
            log_statements:

Tried with below regex as well:

Raw Message:

{"timestamp":"2024-07-16T17:51:27.133135994Z","level":"DEBUG","trace_id":"c06882917a367265fea5042cbce4b632","span_id":"0d31813def1db136","message":"our JSON body: \"{\\"query\\":\\"mutation InitiateDepositmoney_movement_microservice0($input:InitiateDepositInput!){initiateDeposit(input:$input){depositId}}\\",\\"operationName\\":\\"InitiateDepositmoney_movement_microservice0\\",\\"variables\\":{\\"input\\":{\\"accountId\\":\\"lHmY2QJKoESJyaRfvzVV4DPG6mp18yt3fPsBsoJHoYn7BzSo39yoGw==\\",\\"fullName\\":\\"John Moore\\"}}}\"","target":"apollo_router::services::subgraph_service","spans":[{"http.method":"POST","http.request.method":"POST","http.route":"/gateway","http.flavor":"HTTP/1.1","name":"request"},{"http.method":"POST","http.request.method":"POST","http.route":"/gateway","http.flavor":"HTTP/1.1","trace_id":"c06882917a367265fea5042cbce4b632","url.path":"/gateway","client.name":"","client.version":"","name":"router"},{"graphql.document":"mutation InitiateDeposit($input: InitiateDepositInput!) {\n initiateDeposit(input: $input) {\n depositId\n }\n}","graphql.operation.name":"InitiateDeposit","graphql.operation.name":"InitiateDeposit","name":"supergraph"},{"graphql.operation.type":"mutation","name":"execution"},{"apollo.subgraph.name":"money-movement-microservice","name":"fetch"},{"apollo.subgraph.name":"money-movement-microservice","graphql.document":"mutation InitiateDepositmoney_movement_microservice0($input:InitiateDepositInput!){initiateDeposit(input:$input){depositId}}","graphql.operation.name":"InitiateDepositmoney_movement_microservice0","subgraph.name":"money-movement-microservice","name":"subgraph"}],"resource":{"deployment.environment":"dev-qa","service.name":"**-federated-gateway","service.version":"1.45.1","process.executable.name":"router"}}

The fullName value is expected to be masked in the Splunk logs:

{"timestamp":"2024-07-16T17:51:27.133135994Z","level":"DEBUG","trace_id":"c06882917a367265fea5042cbce4b632","span_id":"0d31813def1db136","message":"our JSON body: \"{\\"query\\":\\"mutation InitiateDepositmoney_movement_microservice0($input:InitiateDepositInput!){initiateDeposit(input:$input){depositId}}\\",\\"operationName\\":\\"InitiateDepositmoney_movement_microservice0\\",\\"variables\\":{\\"input\\":{\\"accountId\\":\\"lHmY2QJKoESJyaRfvzVV4DPG6mp18yt3fPsBsoJHoYn7BzSo39yoGw==\\",\\"fullName\\":\\"xxx\\"}}}\"","target":"apollo_router::services::subgraph_service","spans":[{"http.method":"POST","http.request.method":"POST","http.route":"/gateway","http.flavor":"HTTP/1.1","name":"request"},{"http.method":"POST","http.request.method":"POST","http.route":"/gateway","http.flavor":"HTTP/1.1","trace_id":"c06882917a367265fea5042cbce4b632","url.path":"/gateway","client.name":"","client.version":"","name":"router"},{"graphql.document":"mutation InitiateDeposit($input: InitiateDepositInput!) {\n initiateDeposit(input: $input) {\n depositId\n }\n}","graphql.operation.name":"InitiateDeposit","graphql.operation.name":"InitiateDeposit","name":"supergraph"},{"graphql.operation.type":"mutation","name":"execution"},{"apollo.subgraph.name":"money-movement-microservice","name":"fetch"},{"apollo.subgraph.name":"money-movement-microservice","graphql.document":"mutation InitiateDepositmoney_movement_microservice0($input:InitiateDepositInput!){initiateDeposit(input:$input){depositId}}","graphql.operation.name":"InitiateDepositmoney_movement_microservice0","subgraph.name":"money-movement-microservice","name":"subgraph"}],"resource":{"deployment.environment":"dev-qa","service.name":"***-federated-gateway","service.version":"1.45.1","process.executable.name":"router"}}

Thank you in advance.

evan-bradley commented 1 month ago

Could you try enabling debug logging to look at where your data is located inside the payload? Instructions to enable debug logging can be found here: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/ottl#troubleshooting. I suspect the body may not exist at the OTTL path you are specifying, and the regex isn't working because it isn't running on any data.

In particular, I'm not clear that this is the right path: attributes["body.message"]. The right path is likely just body if you haven't parsed the logs inside the Collector. If it has been parsed and the body is in the attributes, the right path would be attributes["body"]["message"].
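To illustrate the path distinction, here is a minimal Python sketch. It is not OTTL and not the Collector's data model: the `record` dict, the `lookup` helper, and the sample values are hypothetical stand-ins, assuming an unparsed record whose body is still the raw JSON string and whose attributes hold only metadata.

```python
import json

# Hypothetical, simplified model of an unparsed log record: the raw line is a
# string in "body", and "attributes" holds only metadata.
record = {
    "body": '{"level":"DEBUG","message":"fullName: John Moore"}',
    "attributes": {"log.iostream": "stdout"},
}


def lookup(data, *keys):
    """Walk nested dict keys; return None as soon as a step is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# No attribute named "body.message" exists, so a statement aimed at it has
# nothing to operate on.
missing = lookup(record, "attributes", "body.message")

# The data lives in the top-level body, still as a string.
raw_body = lookup(record, "body")

# Only if the body were parsed into the attributes would a nested path resolve.
parsed = dict(
    record,
    attributes={**record["attributes"], "body": json.loads(record["body"])},
)
message = lookup(parsed, "attributes", "body", "message")
```

In this model a replacement aimed at a missing path is silently a no-op, which would be consistent with seeing no errors in the agent logs.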

kumachop2 commented 1 month ago

@evan-bradley Thank you very much for your prompt response. I learned that the path is body. I turned on debug-level logging, and according to the Collector logs the condition matched, but the value is not being redacted. I also noticed issues with the conditions. Please review and advise.

Agent Config:

    log_statements:
      - context: log
        conditions:
          - attributes["k8s.container.name"] == "*-federated-gateway"     
        statements:
          - replace_pattern(attributes["body"], "(.*)function(.*)", "$1--$2")

Collector Log:

2024-07-18T23:53:37.153Z    debug   ottl@v0.103.0/parser.go:268 TransformContext after statement execution  {"kind": "processor", "name": "transform", "pipeline": "logs", "statement": "replace_pattern(attributes[\"body\"], \"(.*)function(.*)\", \"--\")", "condition matched": true, "TransformContext": {"resource": {"attributes": {"com.splunk.sourcetype": "kube:container:********-federated-gateway", "com.splunk.source": "/var/log/pods/********_********-federated-gateway-route1-76f4d844d6-qcz5x_243efb14-eb0f-41ca-b903-e218692394c1/********-federated-gateway/0.log", "k8s.pod.uid": "243efb14-eb0f-41ca-b903-e218692394c1", "k8s.container.restart_count": "0", "k8s.container.name": "********-federated-gateway", "k8s.namespace.name": "********", "k8s.pod.name": "********-federated-gateway-route1-76f4d844d6-qcz5x", "cloud.provider": "azure", "cloud.platform": "azure_aks", "host.name": "aks-userpool-16002329-vmss_3", "cloud.region": "eastus", "host.id": "5827d12f-bcc2-49bb-955f-3b082c96037a", "cloud.account.id": "15d5381e-b094-40e4-8679-17ccfbb26d94", "azure.vm.name": "aks-userpool-16002329-vmss_3", "azure.vm.size": "Standard_D8s_v4", "azure.vm.scaleset.name": "aks-userpool-16002329-vmss", "azure.resourcegroup.name": "rg-nodepool-dev-usea", "os.type": "linux", "k8s.node.name": "aks-userpool-16002329-vmss000003", "k8s.cluster.name": "aks-********-middleware_primary-dev-usea", "deployment.environment": "dev"}, "dropped_attribute_count": 0}, "scope": {"attributes": {}, "dropped_attribute_count": 0, "name": "", "version": ""}, "log_record": {"attributes": {"log.iostream": "stdout", "logtag": "F"}, "body": "{\"timestamp\":\"2024-07-18T23:53:36.993027592Z\",\"level\":\"DEBUG\",\"trace_id\":\"1bed38e0eda6d848e23418eef7bfd9bc\",\"span_id\":\"3103a3108bdcaca0\",\"message\":\"subgraph_service function 
found\",\"target\":\"apollo_router::plugins::rhai\",\"spans\":[{\"http.method\":\"POST\",\"http.request.method\":\"POST\",\"http.route\":\"/gateway\",\"http.flavor\":\"HTTP/1.1\",\"name\":\"request\"},{\"http.method\":\"POST\",\"http.request.method\":\"POST\",\"http.route\":\"/gateway\",\"http.flavor\":\"HTTP/1.1\",\"trace_id\":\"1bed38e0eda6d848e23418eef7bfd9bc\",\"url.path\":\"/gateway\",\"client.name\":\"\",\"client.version\":\"\",\"name\":\"router\"},{\"graphql.document\":\"query Widgets($widgetParameters: [WidgetParameter!]!) {\\n  widgets(widgetParameters: $widgetParameters) {\\n    type\\n    url\\n    __typename\\n  }\\n}\",\"graphql.operation.name\":\"Widgets\",\"graphql.operation.name\":\"Widgets\",\"name\":\"supergraph\"},{\"graphql.operation.type\":\"query\",\"name\":\"execution\"},{\"apollo.subgraph.name\":\"evolved-microservice\",\"name\":\"fetch\"}],\"resource\":{\"service.version\":\"1.45.1\",\"deployment.environment\":\"dev\",\"service.name\":\"********-federated-gateway\",\"process.executable.name\":\"router\"}}", "dropped_attribute_count": 0, "flags": 0, "observed_time_unix_nano": 1721346817052828063, "severity_number": 0, "severity_text": "", "span_id": "0000000000000000", "time_unix_nano": 

kumachop2 commented 1 month ago

@evan-bradley Please review the findings above and share your input. Your timely response is much appreciated.

evan-bradley commented 1 month ago

The path you want is just body. I would do something like this:

    log_statements:
      - context: log
        conditions:
          - attributes["k8s.container.name"] == "*-federated-gateway"     
        statements:
          - replace_pattern(body, "(.*)function(.*)", "$1--$2")

This will only replace the word function in your body with --, though, so you may need to revise your regular expression a bit. I would consider using the ParseJSON function to parse the body and directly edit the message field on the body.
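The two points above can be sketched in Python. This is an illustration of the idea only, not OTTL or the Collector's implementation: `json.loads` stands in for ParseJSON, `re.sub` stands in for replace_pattern, and the shortened `body`, the `NAME_PATTERN` regex, and the `mask_full_name` helper are assumptions made for this example. Python writes group references as `\1` where the statement in this thread writes `$1`.

```python
import json
import re

# Hypothetical, shortened stand-in for the log body discussed in this thread:
# the "message" field embeds a second, escaped JSON document.
inner = '{"variables":{"input":{"fullName":"John Moore"}}}'
message = "our JSON body: " + json.dumps(inner)
body = json.dumps({"level": "DEBUG", "message": message})

# 1. The pattern from the thread keeps everything around "function" and swaps
#    only that word for "--".
demo = re.sub(r"(.*)function(.*)", r"\1--\2", "subgraph_service function found")
# demo == "subgraph_service -- found"

# 2. Parse the body first, then mask only inside the "message" field.
#    The pattern matches fullName\":\" and then the value, stopping at the next
#    backslash or quote.
NAME_PATTERN = re.compile(r'(fullName\\":\\")[^"\\]*')


def mask_full_name(raw_body: str) -> str:
    """Return the body with the fullName value in "message" replaced by xxx."""
    record = json.loads(raw_body)
    record["message"] = NAME_PATTERN.sub(r"\1xxx", record["message"])
    return json.dumps(record)


masked = mask_full_name(body)
```

Separately, the debug output earlier in the thread prints the replacement as "--" rather than "$1--$2". One possible explanation, offered as an assumption to verify rather than a confirmed diagnosis, is that $1 and $2 were consumed by the Collector's configuration variable expansion, in which case writing them as $$1 and $$2 in the YAML may be worth trying.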