Decision logging currently disabled

maltesander commented 1 year ago

In the OPA server config.yaml we can activate logging decisions to the console (see https://www.openpolicyagent.org/docs/latest/configuration/#decision-logs), which in turn would be picked up by vector.

services:
  - name: stackable
    url: http://localhost:3030/opa/v1

bundles:
  stackable:
    service: stackable
    resource: opa/bundle.tar.gz
    persist: true
    polling:
      min_delay_seconds: 10
      max_delay_seconds: 20

decision_logs:
    console: true

We must make this configurable via the CRD.

Proposal

Handling decision logging configuration with the current logging solution described here.

There is already a decision logger implemented for decision logs, which can be utilized here to specify the log level for the decision logs. Log level NONE for no decision logs at all and any other level for enabling them.
Specifying the log level of the appenders (console or file) can control whether the decision logs would be visible on console, file, none or both.

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO # Default: INFO
        file:
          level: INFO # Default: INFO
        loggers:             
          ROOT:
            level: INFO # Default: INFO
          decision:
            level: INFO # Default: NONE
          server:
            level: WARN # Default: null

This would add the decision logs both to console and file. Additionally server logs are printed there as well.

The difference to the prior solution suggestion is that the different logging configurations (decisionLogging and logging), which might affect each other (for example setting decisionLogging.console to true but logging.main-container.console.level to NONE), are less apart and part of the same overall concept, which might reduce confusion when configuring logging. Furthermore, there would be no CRD change in the scope of this issue necessary.

Implementation Considerations

OPA is printing logs in a json format including a level field for log level. Filtering logs only with grep is not sufficient. It would be better to parse and filter with jq. For this we would add a script/tool to the OPA image (docker-images/opa/stackable/bin) -> jq needs to be added to the docker image
Default values suggestion: Everything set to INFO and decision logging to NONE

Workaround

Stop reconciling the OpaCluster using `spec.clusterOperation.reconciliationPaused: true
Manually edit the Configmap containing the OPA config and add
```
decision_logs:
console: true
```
Restart OPA Pods

[ ] The end-to-end-security demo is updated (in case it makes sense)
[ ] Adjust Trino integration-test
```
### Tasks
```
[ ] https://github.com/stackabletech/docker-images/pull/695
[ ] https://github.com/stackabletech/opa-operator/pull/555

sbernauer commented 1 year ago

Update: All the consumers (druid and trino opa authorizer) can now handle this. Question is if we want to log audit (decision) logs to stdout or if we want to consider this sensitive information. In this case we e.g. want to log to a file and collect that using vector. On the other hand when developing rego rules you need that log, stdout would be convenient in that case

fhennig commented 6 months ago

Maybe there could be a switch for file/stdout? I think both use cases are very valid.

xeniape commented 4 months ago

Refinement Notes

Suggested Steps:

Add configuration parameter to CRD
The customer needs to know, that sensible information might be printed to the console, when choosing console as output. Add a warning in the documentation.
Extend commons-operator to restart DaemonSets on ConfigMap changes. Set according label (restarter.stackable.tech/enabled) on DaemonSet of OpaCluster.

CRD-proposal:

spec:
  servers:
    config: # also in .spec.servers.roleGroups.*.config.decisionLogging
      decisionLogging: # additional field since logging is used for all products
        console: true/false
        url: ... # not part of the ticket, see 'insights into alternatives' for more details

Insights into alternatives:

Currently it seems not possible to route the decision logs to a file, see related Issue here: https://github.com/open-policy-agent/opa/issues/2605
It should be possible to forward the decision logs to a remote server (using the service field). It probably should also be possible to forward them to the vector agent, but then there would be no output to stderr using this method and a pipe to a syslog-Logger would not work. Syslog-Logger-Piping is out of scope for this issue.
One can set both console and service (remote server) output for decision logging. That's why the CRD fields were chosen like above for possible future extensions.

Security Considerations:

Username, group, and decision result are part of the response -> If the response returns "result: true", one could infer user permissions from the logs.

Other:

The request id req_id is continuously incremented across server and decision logs. Meaning, it wouldn't be enough to just send decision logs to SIEM to check for deleted logs.

Techassi commented 4 months ago

Regarding alternatives / future improvements: How about exporting the decision logs via OLTP to an OpenTelemetry collector? Maybe OPA already has native support for that?

Can this maybe be added to the refinement above?

NickLarsenNZ commented 4 months ago

I had a quick look at OPA options, and there isn't OTLP (without a custom service). So that might come later.

I would vote against syslog in the long-term (eg: when/if OTLP is supported + OpenTelemetry Semantic Conventions), but I guess as a short-term solution to get "something" to Vector (since it should support a syslog receiver), that should be ok.

[!NOTE] For clarity, OPA does support OTel, but it seems just for traces and not decision logs.

fhennig commented 4 months ago

Looks good! I'm happy about keeping the scope about console logging for now, as that would already be very useful when initially setting up rules, or during testing.

sbernauer commented 4 months ago

I'm also fine with only implementing console logging for now, but I would prefer to wait with closing this Issue when we implemented logging so that it works with our logging strategy. (the console logging part is trivial and we finally got the prio to work on this - hence we should do it properly as well).

But that's just my 2 cents, we could also create a new Issue, but for me this would just bloat things up.

xeniape commented 4 months ago

Moving to voting.

sbernauer commented 4 months ago

What are we voting on? Only the decisionLogging.console boolean? - That LGTM. Are there any opinions on https://github.com/stackabletech/opa-operator/issues/422#issuecomment-2063338552?

siegfriedweber commented 4 months ago

I'm also fine with only implementing console logging for now, but I would prefer to wait with closing this Issue when we implemented logging so that it works with our logging strategy.

The property decisionLogging.console is obviously misleading.

This is how logging in OPA works: In principal, OPA logs to the standard error stream. tee and multilog are used to also redirect this stream into a log file. This log file is read by the Vector agent. If decisionLogging.console is set to true then the decision logs are additionally logged to the standard error stream. So, if enabled, the decision logs are already processed by our logging solution.

The reason to name this property console comes from the OPA configuration. The decision logs can be logged to the console or to a remote server (OPA configuration: decision_logs.service). As the decision logs contain sensitive data, it could make sense to directly forward them to another server or to the Vector agent with the http_server source. So, there could be another property (e.g. url) alongside the console property in the future.

The current proposal with console works for the users who enabled Vector as well as for the ones who did not enable it.

Any proposals for a better wording?

maltesander commented 4 months ago

Agree with Siggi, console already works with our logging solution. Question is if we want that or not or should make it more explicit...

What about something more explicit like:

decisionLogging:
  enabled: true # default false, if enabled we log decisions. Setting only this would be the console = true part.
  httpServer/sink/url: ... # optional, if set we send the decison logs to a service which would leave out the vector part?

Just a proposal, not to keen about that myself.

Edit: From the looks of it its somehow supported by the vector transforms as well https://github.com/stackabletech/operator-rs/blob/0e4076b5b3d66a3cfe0d6024b5be85608d9d63ca/crates/stackable-operator/src/product_logging/framework.rs#L791

siegfriedweber commented 4 months ago

Removed from the decisions board to come up with a better solution

stefanigel commented 4 months ago

I do not want to extend the scope, but could we think of sth. to get the PII out of the logs, e.g. by pseudonymisation?

xeniape commented 4 months ago

Further Refinement and Solution Suggestion

Proposal

Handling decision logging configuration with the current logging solution described here.

There is already a decision logger implemented for decision logs, which can be utilized here to specify the log level for the decision logs. Log level NONE for no decision logs at all and any other level for enabling them.
Specifying the log level of the appenders (console or file) can control whether the decision logs would be visible on console, file, none or both.

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO # Default: INFO
        file:
          level: INFO # Default: INFO
        loggers:             
          ROOT:
            level: INFO # Default: INFO
          decision:
            level: INFO # Default: NONE
          server:
            level: WARN # Default: null

This would add the decision logs both to console and file. Additionally server logs are printed there as well.

The difference to the prior solution suggestion is that the different logging configurations (decisionLogging and logging), which might affect each other (for example setting decisionLogging.console to true but logging.main-container.console.level to NONE), are less apart and part of the same overall concept, which might reduce confusion when configuring logging. Furthermore, there would be no CRD change in the scope of this issue necessary.

Implementation Considerations

OPA is printing logs in a json format including a level field for log level. Filtering logs only with grep is not sufficient. It would be better to parse and filter with jq. For this we would add a script/tool to the OPA image (docker-images/opa/stackable/bin) -> jq needs to be added to the docker image
Default values suggestion: Everything set to INFO and decision logging to NONE

Not in scope of this issue but informative:

Complications with this approach

Since the log level configuration on the appenders includes both server and decision logs, currently, there would be no possibility to define more fine-grained logging routing for certain logger types. For example, printing server logs to both console and file while also printing decision logs only to file would be not possible. But this could be enabled with the following extension suggestion, which would not be part of this issue.
Extension suggestion for operator-rs: The implementation for loggers could be extended to be able to define a single or multiple appenderRefs to appenders defined in the container.

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO
        file:
          level: INFO
        loggers:             
          ROOT:
            level: INFO
          decision:
            level: INFO
            appender: [file]
          server:
            level: WARN

With this setting decision logs would only be send to the file appender and would not appear in the console.

siegfriedweber commented 4 months ago

I do not want to extend the scope, but could we think of sth. to get the PII out of the logs, e.g. by pseudonymisation?

There are server logs which do not contain personally identifiable information, and there are decision logs which contain the identity of the requester as well as the requested resource. Both, the identity and the resource, contain sensible information. If the log target should not contain sensible information, then the decision logs must be turned off entirely. But if the log target is a security information and event management (SIEM) system then it is crucial that the logs contain the information about who tried what. So the proper solution is, to turn off the console output, use the log agent to collect and forward the logs to the log aggregator, and configure the aggregator to filter, anonymize, and forward the logs to the logging systems. For instance, the log aggregator could forward the logs (as they are) to the SIEM system, and also filter or anonymize these logs and forward them to the regular logging system.

stackabletech / opa-operator