stackabletech / opa-operator

A kubernetes operator for the Open Policy Agent
Other
15 stars 3 forks source link

Decision logging currently disabled #422

Closed maltesander closed 2 months ago

maltesander commented 1 year ago

In the OPA server config.yaml we can activate logging decisions to the console (see https://www.openpolicyagent.org/docs/latest/configuration/#decision-logs), which in turn would be picked up by vector.

services:
  - name: stackable
    url: http://localhost:3030/opa/v1

bundles:
  stackable:
    service: stackable
    resource: opa/bundle.tar.gz
    persist: true
    polling:
      min_delay_seconds: 10
      max_delay_seconds: 20

decision_logs:
    console: true

We must make this configurable via the CRD.

Proposal

Handling decision logging configuration with the current logging solution described here.

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO # Default: INFO
        file:
          level: INFO # Default: INFO
        loggers:             
          ROOT:
            level: INFO # Default: INFO
          decision:
            level: INFO # Default: NONE
          server:
            level: WARN # Default: null

This would add the decision logs both to console and file. Additionally server logs are printed there as well.

The difference to the prior solution suggestion is that the different logging configurations (decisionLogging and logging), which might affect each other (for example setting decisionLogging.console to true but logging.main-container.console.level to NONE), are less apart and part of the same overall concept, which might reduce confusion when configuring logging. Furthermore, there would be no CRD change in the scope of this issue necessary.

Implementation Considerations

Workaround

  1. Stop reconciling the OpaCluster using `spec.clusterOperation.reconciliationPaused: true
  2. Manually edit the Configmap containing the OPA config and add
    decision_logs:
    console: true
  3. Restart OPA Pods
sbernauer commented 1 year ago

Update: All the consumers (druid and trino opa authorizer) can now handle this. Question is if we want to log audit (decision) logs to stdout or if we want to consider this sensitive information. In this case we e.g. want to log to a file and collect that using vector. On the other hand when developing rego rules you need that log, stdout would be convenient in that case

fhennig commented 6 months ago

Maybe there could be a switch for file/stdout? I think both use cases are very valid.

xeniape commented 4 months ago

Refinement Notes


Suggested Steps:

CRD-proposal:

spec:
  servers:
    config: # also in .spec.servers.roleGroups.*.config.decisionLogging
      decisionLogging: # additional field since logging is used for all products
        console: true/false
        url: ... # not part of the ticket, see 'insights into alternatives' for more details

Insights into alternatives:

Security Considerations:

Other:

Techassi commented 4 months ago

Regarding alternatives / future improvements: How about exporting the decision logs via OLTP to an OpenTelemetry collector? Maybe OPA already has native support for that?

Can this maybe be added to the refinement above?

NickLarsenNZ commented 4 months ago

I had a quick look at OPA options, and there isn't OTLP (without a custom service). So that might come later.

I would vote against syslog in the long-term (eg: when/if OTLP is supported + OpenTelemetry Semantic Conventions), but I guess as a short-term solution to get "something" to Vector (since it should support a syslog receiver), that should be ok.

[!NOTE] For clarity, OPA does support OTel, but it seems just for traces and not decision logs.

fhennig commented 4 months ago

Looks good! I'm happy about keeping the scope about console logging for now, as that would already be very useful when initially setting up rules, or during testing.

sbernauer commented 4 months ago

I'm also fine with only implementing console logging for now, but I would prefer to wait with closing this Issue when we implemented logging so that it works with our logging strategy. (the console logging part is trivial and we finally got the prio to work on this - hence we should do it properly as well).

But that's just my 2 cents, we could also create a new Issue, but for me this would just bloat things up.

xeniape commented 4 months ago

Moving to voting.

sbernauer commented 4 months ago

What are we voting on? Only the decisionLogging.console boolean? - That LGTM. Are there any opinions on https://github.com/stackabletech/opa-operator/issues/422#issuecomment-2063338552?

siegfriedweber commented 4 months ago

I'm also fine with only implementing console logging for now, but I would prefer to wait with closing this Issue when we implemented logging so that it works with our logging strategy.

The property decisionLogging.console is obviously misleading.

This is how logging in OPA works: In principal, OPA logs to the standard error stream. tee and multilog are used to also redirect this stream into a log file. This log file is read by the Vector agent. If decisionLogging.console is set to true then the decision logs are additionally logged to the standard error stream. So, if enabled, the decision logs are already processed by our logging solution.

The reason to name this property console comes from the OPA configuration. The decision logs can be logged to the console or to a remote server (OPA configuration: decision_logs.service). As the decision logs contain sensitive data, it could make sense to directly forward them to another server or to the Vector agent with the http_server source. So, there could be another property (e.g. url) alongside the console property in the future.

The current proposal with console works for the users who enabled Vector as well as for the ones who did not enable it.

Any proposals for a better wording?

maltesander commented 4 months ago

Agree with Siggi, console already works with our logging solution. Question is if we want that or not or should make it more explicit...

What about something more explicit like:

decisionLogging:
  enabled: true # default false, if enabled we log decisions. Setting only this would be the console = true part.
  httpServer/sink/url: ... # optional, if set we send the decison logs to a service which would leave out the vector part?

Just a proposal, not to keen about that myself.

Edit: From the looks of it its somehow supported by the vector transforms as well https://github.com/stackabletech/operator-rs/blob/0e4076b5b3d66a3cfe0d6024b5be85608d9d63ca/crates/stackable-operator/src/product_logging/framework.rs#L791

siegfriedweber commented 4 months ago

Removed from the decisions board to come up with a better solution

stefanigel commented 4 months ago

I do not want to extend the scope, but could we think of sth. to get the PII out of the logs, e.g. by pseudonymisation?

xeniape commented 4 months ago

Further Refinement and Solution Suggestion

Proposal

Handling decision logging configuration with the current logging solution described here.

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO # Default: INFO
        file:
          level: INFO # Default: INFO
        loggers:             
          ROOT:
            level: INFO # Default: INFO
          decision:
            level: INFO # Default: NONE
          server:
            level: WARN # Default: null

This would add the decision logs both to console and file. Additionally server logs are printed there as well.

The difference to the prior solution suggestion is that the different logging configurations (decisionLogging and logging), which might affect each other (for example setting decisionLogging.console to true but logging.main-container.console.level to NONE), are less apart and part of the same overall concept, which might reduce confusion when configuring logging. Furthermore, there would be no CRD change in the scope of this issue necessary.

Implementation Considerations

Not in scope of this issue but informative:

Complications with this approach

Example:

logging:                   
    enableVectorAgent: true  
    containers:              
      main-container:
        console:             
          level: INFO
        file:
          level: INFO
        loggers:             
          ROOT:
            level: INFO
          decision:
            level: INFO
            appender: [file]
          server:
            level: WARN

With this setting decision logs would only be send to the file appender and would not appear in the console.

siegfriedweber commented 4 months ago

I do not want to extend the scope, but could we think of sth. to get the PII out of the logs, e.g. by pseudonymisation?

There are server logs which do not contain personally identifiable information, and there are decision logs which contain the identity of the requester as well as the requested resource. Both, the identity and the resource, contain sensible information. If the log target should not contain sensible information, then the decision logs must be turned off entirely. But if the log target is a security information and event management (SIEM) system then it is crucial that the logs contain the information about who tried what. So the proper solution is, to turn off the console output, use the log agent to collect and forward the logs to the log aggregator, and configure the aggregator to filter, anonymize, and forward the logs to the logging systems. For instance, the log aggregator could forward the logs (as they are) to the SIEM system, and also filter or anonymize these logs and forward them to the regular logging system.