vmware-tanzu / pinniped

Pinniped is the easy, secure way to log in to your Kubernetes clusters.
https://pinniped.dev
Apache License 2.0

WIP: audit logging #2009

Open cfryanr opened 2 weeks ago

cfryanr commented 2 weeks ago

WIP for implementing audit logging. Early draft for proof of concept and discussion.

This draft differs from the proposal document, which is over two years old. In that doc, I proposed that we do a bunch of extra work to build a fancy sidecar container that would allow us to separate the audit logs from the regular error/warning/debug/info/trace logs, to make it easier for admin users to export their audit logs separately. However, much time has passed since then and I no longer believe that is necessary. Fluentbit and other log exporting tools have improved in the meantime. Fluentbit now includes tools which allow you to separate lines from a pod log and send those special lines to a different destination (example of that shown below).

Instead, this PR outputs the new audit log events into the regular pod log, and makes them easy to grep by always including "auditEvent":true on every audit event log line.
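
As an illustration of how a consumer could pick audit events out of a mixed pod log, here is a minimal Go sketch. It assumes each pod log line is a single JSON object, as in the example event in this description. The helper names `isAuditEvent` and `filterAuditLines` are hypothetical and are not part of this PR.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// isAuditEvent reports whether a JSON log line has a top-level
// "auditEvent" key whose value is the boolean true.
// Hypothetical helper for illustration only.
func isAuditEvent(line string) bool {
	var record struct {
		AuditEvent bool `json:"auditEvent"`
	}
	if err := json.Unmarshal([]byte(line), &record); err != nil {
		return false // not valid JSON (or wrong type), so not treated as an audit event
	}
	return record.AuditEvent
}

// filterAuditLines returns only the audit event lines from a mixed pod log.
func filterAuditLines(podLog string) []string {
	var out []string
	scanner := bufio.NewScanner(strings.NewReader(podLog))
	for scanner.Scan() {
		if line := scanner.Text(); isAuditEvent(line) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up sample log: one audit event, one regular line, one non-JSON line.
	podLog := `{"level":"info","message":"TokenCredentialRequest","auditEvent":true}
{"level":"info","message":"some unrelated log line"}
not json at all`

	for _, line := range filterAuditLines(podLog) {
		fmt.Println(line)
	}
}
```

Parsing the JSON rather than matching the raw substring avoids depending on key order or whitespace, at the cost of decoding every line.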

The key/values in audit event logs are:

Other changes compared to the original proposal doc:

Note that aggregated API endpoints, such as the Concierge's TokenCredentialRequest, already support Kubernetes audit logging. Additionally, they already have some trace logging using the k8s.io/utils/trace package, but those logs only appear in our pod logs if the user has configured the log level to be info or higher (not by default). If we add our own audit logging to these endpoints, then ideally the auditID would match that shown in the Kubernetes audit logs to allow for correlation between the pod logs and the audit logs. The approach taken in this PR works that way.

Here is an example Pinniped audit log event from a Concierge pod for a TokenCredentialRequest. Note that this has been piped through jq for pretty printing purposes.

{
  "level": "info",
  "timestamp": "2024-07-10T20:03:26.164470Z",
  "caller": "go.pinniped.dev/internal/registry/credentialrequest/rest.go:135$credentialrequest.(*REST).Create",
  "message": "TokenCredentialRequest",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "auditEvent": true,
  "username": "ldap:pinny.ldap@example.com",
  "groups": [
    "ldap:ball-admins",
    "ldap:ball-game-players"
  ],
  "authenticated": true,
  "expires": "2024-07-10T20:08:26Z"
}

And here are the two audit log events from the Kubernetes API audit log for the same TokenCredentialRequest request, captured with Kubernetes API audit logging configured at the Metadata level. They show that this was an anonymous request, but they cannot show how Pinniped resolved the identity of the user (which is shown by the Pinniped audit log above). Note that they have the same unique audit ID as above, even though they are not found in the same log file. Note that these have also been piped through jq for pretty printing purposes.

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "stage": "RequestReceived",
  "requestURI": "/apis/login.concierge.pinniped.dev/v1alpha1/tokencredentialrequests",
  "verb": "create",
  "user": {
    "username": "system:anonymous",
    "groups": [
      "system:unauthenticated"
    ]
  },
  "sourceIPs": [
    "172.18.0.1"
  ],
  "userAgent": "pinniped/v0.0.0 (darwin/arm64) kubernetes/$Format",
  "objectRef": {
    "resource": "tokencredentialrequests",
    "apiGroup": "login.concierge.pinniped.dev",
    "apiVersion": "v1alpha1"
  },
  "requestReceivedTimestamp": "2024-07-10T20:03:26.155574Z",
  "stageTimestamp": "2024-07-10T20:03:26.155574Z"
}

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "fdbd324a-0691-4425-9174-5279d799fb15",
  "stage": "ResponseComplete",
  "requestURI": "/apis/login.concierge.pinniped.dev/v1alpha1/tokencredentialrequests",
  "verb": "create",
  "user": {
    "username": "system:anonymous",
    "groups": [
      "system:unauthenticated"
    ]
  },
  "sourceIPs": [
    "172.18.0.1"
  ],
  "userAgent": "pinniped/v0.0.0 (darwin/arm64) kubernetes/$Format",
  "objectRef": {
    "resource": "tokencredentialrequests",
    "apiGroup": "login.concierge.pinniped.dev",
    "apiVersion": "v1alpha1"
  },
  "responseStatus": {
    "metadata": {},
    "code": 201
  },
  "requestReceivedTimestamp": "2024-07-10T20:03:26.155574Z",
  "stageTimestamp": "2024-07-10T20:03:26.165041Z",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"pinniped-concierge-pre-authn-apis\" of ClusterRole \"pinniped-concierge-pre-authn-apis\" to Group \"system:unauthenticated\""
  }
}
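
To illustrate the correlation that the shared audit ID makes possible, here is a minimal Go sketch that joins Pinniped audit log lines with Kubernetes audit events on `auditID`. The struct and function names (`pinnipedEvent`, `kubeEvent`, `resolvedUsers`) are hypothetical, and only the fields visible in the examples above are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pinnipedEvent holds the fields of a Pinniped audit log line used in this sketch.
type pinnipedEvent struct {
	AuditID  string `json:"auditID"`
	Username string `json:"username"`
}

// kubeEvent holds the fields of a Kubernetes audit.k8s.io/v1 Event used in this sketch.
type kubeEvent struct {
	AuditID string `json:"auditID"`
	Stage   string `json:"stage"`
}

// resolvedUsers returns, for every audit ID that appears in both logs,
// the username that Pinniped recorded for that request.
// Hypothetical helper for illustration only.
func resolvedUsers(pinnipedLines, kubeLines []string) map[string]string {
	byID := map[string]string{}
	for _, line := range pinnipedLines {
		var e pinnipedEvent
		if json.Unmarshal([]byte(line), &e) == nil && e.AuditID != "" {
			byID[e.AuditID] = e.Username
		}
	}

	out := map[string]string{}
	for _, line := range kubeLines {
		var e kubeEvent
		if json.Unmarshal([]byte(line), &e) != nil {
			continue // skip lines that are not valid JSON
		}
		if username, ok := byID[e.AuditID]; ok {
			out[e.AuditID] = username
		}
	}
	return out
}

func main() {
	// Compact versions of the example events shown above.
	pinniped := []string{
		`{"auditID":"fdbd324a-0691-4425-9174-5279d799fb15","auditEvent":true,"username":"ldap:pinny.ldap@example.com"}`,
	}
	kube := []string{
		`{"auditID":"fdbd324a-0691-4425-9174-5279d799fb15","stage":"RequestReceived"}`,
		`{"auditID":"fdbd324a-0691-4425-9174-5279d799fb15","stage":"ResponseComplete"}`,
	}

	for id, username := range resolvedUsers(pinniped, kube) {
		fmt.Printf("auditID %s was resolved by Pinniped to user %s\n", id, username)
	}
}
```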

Note that for manual testing, Kubernetes API audit logs can be enabled in Kind clusters: https://kind.sigs.k8s.io/docs/user/auditing.

Here is an example snippet from a custom values.yaml for the Fluentbit Helm Chart showing how to route Pinniped audit logs to a different destination. This example is not for production use because the output destination is stdout. See comments below for more info.

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    # This filter is part of the default configuration from the Helm chart.
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

    [FILTER]
        # Just to make this experiment's output easier to read, exclude any logs that came from the fluent-bit pods.
        # Otherwise we see each event multiple times because we are using stdout outputs below.
        # This would not be needed in a production setup where logs are sent elsewhere, not to stdout.
        Name grep
        Match kube.*
        Exclude $kubernetes['labels']['app.kubernetes.io/name'] ^fluent-bit$

    # A default fluent-bit configuration will capture logs from all pods, so it will capture Pinniped's
    # mixed pod logs that contain audit events and also other logs.
    # This example shows that fluent-bit could be configured to move the audit logs to a separate output
    # if desired. There are many ways to customize fluent-bit's logging pipeline, and it is quite flexible.
    # The example below removes Pinniped audit log lines from the exported Pinniped pods' logs
    # and captures those audit logs instead as one new output, mixing Supervisor and Concierge logs.
    # To determine which pods are Pinniped pods, this example looks at pod labels. Alternatively, it could
    # look at pod namespaces or other properties.
    # Although not shown here, it could alternatively be written to avoid removing audit log records
    # from the exported pod logs, and could be written to separate Concierge audit logs from Supervisor
    # audit logs.

    [FILTER]
        # For all logs from all pods labelled like a Pinniped Supervisor pod,
        # or labelled like a Pinniped Concierge pod,
        # make a copy of the record at a new tag called pinniped-audit.
        # Keep the record in its original tag also.
        # At this point, the pinniped-audit tag will have all pod logs from
        # matching pods, not just audit logs. We will filter the audit logs below.
        # Multiple Rule statements act like an "or".
        # See https://docs.fluentbit.io/manual/pipeline/filters/rewrite-tag
        Name rewrite_tag
        Match kube.*
        Rule $kubernetes['labels']['app'] ^pinniped-supervisor$ pinniped-audit true
        Rule $kubernetes['labels']['app'] ^pinniped-concierge$ pinniped-audit true

    [FILTER]
        # For all log records that were copied into this new pinniped-audit tag,
        # only keep the events in the new tag that appear to be an audit log.
        # https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match pinniped-audit
        Regex auditEvent ^true$

    [FILTER]
        # If desired, another grep filter can remove the Pinniped Supervisor audit logs from the original tag,
        # so they only appear in events of the new pinniped-audit tag, rather than appearing in both places.
        # See https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match kube.*
        Logical_Op and
        Exclude $kubernetes['labels']['app'] ^pinniped-supervisor$
        Exclude auditEvent ^true$

    [FILTER]
        # If desired, another grep filter can remove the Pinniped Concierge audit logs from the original tag,
        # so they only appear in events of the new pinniped-audit tag, rather than appearing in both places.
        # See https://docs.fluentbit.io/manual/pipeline/filters/grep
        Name grep
        Match kube.*
        Logical_Op and
        Exclude $kubernetes['labels']['app'] ^pinniped-concierge$
        Exclude auditEvent ^true$

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    #[OUTPUT]
    #    Name stdout
    #    Match kube.*

    #[OUTPUT]
    #    Name stdout
    #    Match host.*

    [OUTPUT]
        # By the time a log record gets to the output, it was already filtered.
        # Any event with this tag is only a Pinniped audit log event, and we can
        # output it to anywhere we prefer.
        Name stdout
        Match pinniped-audit
        Format json_lines
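
The intended net effect of the filter chain above can be summarized as a small routing decision. The Go sketch below is only a model of that outcome, based on the comments in the snippet. It is not how Fluent Bit itself is implemented, it ignores the filter that drops the fluent-bit pods' own logs, and the function name `route` and the tag constants are hypothetical.

```go
package main

import "fmt"

const (
	originalTag = "kube.*"         // stands in for the record's original kube.* tag
	auditTag    = "pinniped-audit" // the new tag created by the rewrite_tag filter
)

// route models which tags a log record ends up under after the filters above,
// given the pod's "app" label and whether the record has "auditEvent":true.
// Hypothetical model for illustration only.
func route(appLabel string, auditEvent bool) []string {
	isPinniped := appLabel == "pinniped-supervisor" || appLabel == "pinniped-concierge"

	var tags []string
	// The last two grep filters use Logical_Op and, so a record is dropped from
	// its original tag only when it is from a Pinniped pod AND is an audit event.
	if !(isPinniped && auditEvent) {
		tags = append(tags, originalTag)
	}
	// rewrite_tag copies every Pinniped pod record to the new tag, and the grep
	// filter on that tag then keeps only the audit events.
	if isPinniped && auditEvent {
		tags = append(tags, auditTag)
	}
	return tags
}

func main() {
	cases := []struct {
		app   string
		audit bool
	}{
		{"pinniped-supervisor", true},
		{"pinniped-concierge", false},
		{"some-other-app", true},
	}
	for _, c := range cases {
		fmt.Printf("app=%s auditEvent=%v -> %v\n", c.app, c.audit, route(c.app, c.audit))
	}
}
```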

Release note:

Audit logging will be a user-facing feature for admin users and needs release notes and documentation.

TBD

codecov[bot] commented 1 week ago

Codecov Report

Attention: Patch coverage is 80.71895% with 59 lines in your changes missing coverage. Please review.

Project coverage is 30.89%. Comparing base (dd80627) to head (b2873cf).

| Files | Patch % | Lines |
|---|---|---|
| internal/plog/plog.go | 0.00% | 19 Missing :warning: |
| ...tiondomain/downstreamsession/downstream_session.go | 0.00% | 18 Missing :warning: |
| .../controller/supervisorstorage/garbage_collector.go | 80.39% | 8 Missing and 2 partials :warning: |
| ...l/federationdomain/requestlogger/request_logger.go | 92.30% | 6 Missing :warning: |
| internal/supervisor/server/server.go | 0.00% | 4 Missing :warning: |
| internal/concierge/apiserver/apiserver.go | 0.00% | 1 Missing :warning: |
| .../federationdomain/endpoints/token/token_handler.go | 96.55% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2009      +/-   ##
==========================================
+ Coverage   30.69%   30.89%   +0.20%
==========================================
  Files         365      367       +2
  Lines       60616    60861     +245
==========================================
+ Hits        18609    18806     +197
- Misses      41470    41516      +46
- Partials      537      539       +2
```
