microsoft / ApplicationInsights-Java

Application Insights for Java
http://aka.ms/application-insights
Other
296 stars 199 forks source link

The trace logs of app-insights in azure portal don't reflect the effective container logs (masked out sensitive data) #2466

Closed bvahdat closed 10 months ago

bvahdat commented 2 years ago

Expected behavior

The trace logs provided by app-insights should reflect the effective container logs of a given application.

Actual behavior

To have custom dimensions for our structured logging we make use of:

https://github.com/logfellow/logstash-logback-encoder

And we configure it to mask out sensitive data as per documentation here:

https://github.com/logfellow/logstash-logback-encoder#identifying-field-values-to-mask-by-value

And it actually works pretty well as because looking into the container logs we see that sensitive fields inside JSON are properly masked out, for example the container log entries look like:

{
  "@timestamp": "2022-07-21T12:15:41.181+02:00",
  "@version": "1",
  "message": "Processing of the event with cloudEventId: 6415d31r-b0o3-5cx8w-d934-d5320981278z and salesforceExternalId: 00525000008yPMSCC9 took 294 ms. which was successful.",
  "logger_name": "com.foo.bar.servicebus.salesforce.policy.MdcRoutePolicy",
  "thread_name": "main",
  "level": "INFO",
  "level_value": 20000,
  "forgerockEventBody": "{\n  \"data\": {\n    \"FrIndexedString1\": \"00525000008yPMSCC9\",\n    \"FrUnindexedString1\": \"George Michael\",\n    \"FrUnindexedString2\": \"en-GB\",\n    \"mail\": \"******\",\n    \"telephoneNumber\": \"******\"\n  },\n  \"datacontenttype\": \"application/json\",\n  \"dataschema\": \"foo:integration-hubs\",\n  \"id\": \"6415d31r-b0o3-5cx8w-d934-d5320981278z\",\n  \"source\": \"forgerock:idm\",\n  \"specversion\": \"1.0\",\n  \"time\": \"2018-04-05T17:31:00Z\",\n  \"type\": \"com.forgerock.user.migrated\"\n}\n",
  "trace_id": "00000000000000000000000000000000",
  "camel.contextId": "foo-ihub-connector-servicebus-salesforce",
  "salesforceMigrateAccountsSfResponseCode": "0",
  "span_id": "0000000000000000",
  "salesforceMigrateIndividualRequestBody": "{\"allOrNone\":true,\"compositeRequest\":[{\"body\":{\"FOO_Customer_Identity_External_ID__c\":\"00525000008yPMSCC9\",\"FirstName\":\"George Michael\",\"LastName\":\"Unknown\",\"REIDP_Email__c\": \"******\",\"REIDP_ForgeRock_Identity__c\":\"true\",\"REIDP_LanguageLocaleKey__c\":\"en-GB\",\"REIDP_Mobilephone__c\": \"******\"},\"method\":\"POST\",\"referenceId\":\"refIndividual\",\"url\":\"/services/data/v54.0/sobjects/Individual\"},{\"method\":\"GET\",\"referenceId\":\"refAccount\",\"url\":\"/services/data/v54.0/query/?q=SELECT+id+from+Account+where+FOO_Customer_Identity_External_ID__pc+=+%2700525000008yPMSCC9%27+AND+IsPersonAccount+=+TRUE\"}]}",
  "camel.routeId": "id-servicebus-to-salesforce",
  "cloudEventId": "6415d31r-b0o3-5cx8w-d934-d5320981278z",
  "salesforceMigrateIndividualResponseBody": "{\n  \"compositeResponse\": [\n    {\n      \"body\": {\n        \"id\": \"00525000008yPMSCC9\",\n        \"success\": true,\n        \"errors\": []\n      },\n      \"httpHeaders\": {\n        \"Location\": \"/services/data/v54.0/sobjects/Individual/0PK25000002zW7tGAE\"\n      },\n      \"httpStatusCode\": 201,\n      \"referenceId\": \"refIndividual\"\n    },\n    {\n      \"body\": {\n        \"totalSize\": 1,\n        \"done\": true,\n        \"records\": [\n          {\n            \"attributes\": {\n              \"type\": \"Account\",\n              \"url\": \"/services/data/v54.0/sobjects/Account/0012500001i9vb3AAA\"\n            },\n            \"Id\": \"0012500001i9vb3AAA\"\n          }\n        ]\n      },\n      \"httpHeaders\": {},\n      \"httpStatusCode\": 200,\n      \"referenceId\": \"refAccount\"\n    }\n  ]\n}",
  "camel.exchangeId": "F577AC290CF4B5D-0000000000000000",
  "salesforceExternalId": "00525000008yPMSCC9",
  "salesforceInvocationSucceeded": "true",
  "camel.messageId": "F577AC290CF4B5D-0000000000000000",
  "salesforceMigrateAccountsResponseBody": "{\"allOrNone\":true,\"compositeRequest\":[{\"body\":{\"FirstName\":\"George Michael\",\"LanguageLocaleKey__pc\":\"en-GB\",\"LastName\":\"Unknown\",\"PersonEmail\": \"******\",\"PersonIndividualId\":\"00525000008yPMSCC9\",\"PersonMobilePhone\": \"******\",\"REIDP_ForgeRock_Identity__pc\":\"true\"},\"method\":\"PATCH\",\"referenceId\":\"referenceId0012500001i9vb3AAA\",\"url\":\"/services/data/v54.0/sobjects/Account/0012500001i9vb3AAA\"}]}",
  "salesforceMigrateIndividualSfResponseCode": "201",
  "salesforceMigrateAccountsRequestBody": "{\"allOrNone\":true,\"compositeRequest\":[{\"body\":{\"FirstName\":\"George Michael\",\"LanguageLocaleKey__pc\":\"en-GB\",\"LastName\":\"Unknown\",\"PersonEmail\": \"******\",\"PersonIndividualId\":\"00525000008yPMSCC9\",\"PersonMobilePhone\": \"******\",\"REIDP_ForgeRock_Identity__pc\":\"true\"},\"method\":\"PATCH\",\"referenceId\":\"referenceId0012500001i9vb3AAA\",\"url\":\"/services/data/v54.0/sobjects/Account/0012500001i9vb3AAA\"}]}",
  "processingTimeInMillis": "294"
}

Notice the masked out fields which are:

\"mail\": \"******\"
\"REIDP_Email__c\": \"******\"
\"PersonEmail\": \"******\"
\"telephoneNumber\": \"******\"
\"PersonMobilePhone\": \"******\"
\"REIDP_Mobilephone__c\": \"******\"

But when we look at the corresponding trace logs of AppInsights in portal they are not masked out and show the sensitive data we are actually trying to mask out!

The corresponding logback-spring.xml of the spring-boot application is as the following (the used spring-boot profile at runtime is the default profile and not local):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <springProfile name="local">
        <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
        <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>
        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
        </root>
    </springProfile>

    <springProfile name="!local">
        <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="net.logstash.logback.encoder.LogstashEncoder">
                <jsonGeneratorDecorator class="net.logstash.logback.mask.MaskingJsonGeneratorDecorator">
                    <valueMask>
                        <value>"mail"\s*:\s*"[^"]+"</value>
                        <mask>"mail":"******"</mask>
                    </valueMask>
                    <valueMask>
                        <value>"REIDP_Email__c"\s*:\s*"[^"]+"</value>
                        <mask>"REIDP_Email__c":"******"</mask>
                    </valueMask>
                    <valueMask>
                        <value>"PersonEmail"\s*:\s*"[^"]+"</value>
                        <mask>"PersonEmail":"******"</mask>
                    </valueMask>
                    <valueMask>
                        <value>"telephoneNumber"\s*:\s*"[^"]+"</value>
                        <mask>"telephoneNumber":"******"</mask>
                    </valueMask>
                    <valueMask>
                        <value>"REIDP_Mobilephone__c"\s*:\s*"[^"]+"</value>
                        <mask>"REIDP_Mobilephone__c":"******"</mask>
                    </valueMask>
                    <valueMask>
                        <value>"PersonMobilePhone"\s*:\s*"[^"]+"</value>
                        <mask>"PersonMobilePhone":"******"</mask>
                    </valueMask>
                </jsonGeneratorDecorator>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="JSON"/>
        </root>
    </springProfile>
</configuration>

So to says from a timeline perspective, app-insights seems to infer the written logs after the encoder tag by the configuration above but before the jsonGeneratorDecorator tag.

To Reproduce

Steps to reproduce the behavior:

Sample Application

If applicable, provide a sample application which reproduces the issue.

System information

Please provide the following information:

Logs

Turn on SDK logs and attach/paste them to the issue. If using an application server, also attach any relevant server logs.

The Turn on SDK logs link above doesn't work for me.

Screenshots

If applicable, add screenshots to help explain your problem.

richorama commented 2 years ago

@bvahdat was asking me if this has been resolved. I understand that #6541 addresses sanitization, and wondered if it resolves this issue?

srferron commented 2 years ago

Same problem here, the app-insights doesn´t take the mask logs. I can see they are mask on pod logs but unmasked on app-insights

Thanks

sahil-goel commented 11 months ago

Hi, I am facing the same problem. Is this solved please?

heyams commented 11 months ago

@sahil-goel @srferron @richorama @bvahdat Sorry for taking so long to get to this issue.

Can you try our TelemetryProcessor preview feature to mask sensitive data in log message: Here is a sample of the json config.

The whole test is #3408. Please let me know if this works for you and ping me if you have further questions.

Essentially, sensitive data can be extracted out as a Log attribute and then delete it on the fly. It will have {redactedKeyName} instead of "*****".

If this still doesn't meet your need, I can try to apply mask feature to log body. That will require feature improvement and a new release.

sahil-goel commented 11 months ago

@heymas - what you explained don't really help

I have 100 microservices running in a AKS cluster and each connecting with the same application insights. Every microservice has its own configuration on what to mask in the logs. As of now, things are getting masked in AKS logs as per microservice configuration but when it comes to Application Insights logs, the fields are not masked.

Application insights.json is a centralised configuration and is not per microservice.

Ideally, I would want logs to be exactly same in AKS and App Insights

heyams commented 11 months ago

@sahil-goel how do you attach Java agent? at the microservice level or AKS cluster level?

heyams commented 11 months ago

this is related to https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/5969

heyams commented 11 months ago

https://github.com/open-telemetry/opentelemetry-java/issues/5187#issuecomment-1689923356

logstash-logback-encoder library offers some powerful configuration. can you give it a try?

microsoft-github-policy-service[bot] commented 10 months ago

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.