Recording profiles with the log enricher

saschagrunert commented 3 years ago

What would you like to be added:

It should be possible to record profiles by using the log enricher, while preserving a good usability.

Why is this needed:

Right now the operator supports recording profiles by using the ProfileRecording CRD as well as the OCI seccomp BPF hook. This requires CRI-O as well as the hook to be pre-installed and configured on every node running the recording.

However, the log enricher supports CRI-O and containerd as well as relies only on auditd. Adding recording support to the enricher may conflict with the existing recording feature and should seamless integrate into the operator. What we could do, is:

Add a new field to the ProfileRecordingSpec to indicate that we either want to record by using the hook or by using the logs
Based on the new field, a Recording CRD would have to ship with an audit log profile
Reuse the existing recording webhook to add the audit log profile to the target pod (error if there is a profile already added by the user)
Let the enricher take the recording profile into account to stream the syscalls to the GRPC server
Let the GRPC server collect the syscalls per container/pod until the recording gets deleted
Reconcile a SeccompProfile CRD on recording deletion

User story covered

As an infrastructure SRE, Kirsti wants to be able to easily correlate audit log entries related to security profiles to specific containers/pods. (link)

jhrozek commented 3 years ago

I think this would work well. I was thinking recently about the recording use-case for SELinux and I had a similar flow in mind. Some comments inline -- they mostly focus on the SELinux case, so hopefully I'm not hijacking the ticket too much.

What would you like to be added:

It should be possible to record profiles by using the log enricher, while preserving a good usability.

Why is this needed:

Right now the operator supports recording profiles by using the ProfileRecording CRD as well as the OCI seccomp BPF hook. This requires CRI-O as well as the hook to be pre-installed and configured on every node running the recording.

However, the log enricher supports CRI-O and containerd as well as relies only on auditd. Adding recording support to the enricher may conflict with the existing recording feature and should seamless integrate into the operator. What we could do, is:
* Add a new field to the [`ProfileRecordingSpec`](https://github.com/kubernetes-sigs/security-profiles-operator/blob/d4ee3f9a931763072f1a7a89bc3ad448b5911ccf/api/profilerecording/v1alpha1/profilerecording_types.go#L30) to indicate that we either want to record by using the hook or by using the logs

I would suggest two pieces actually: what kind of profile is to be recorded and using what mechanism. The type could specify SELinux, seccomp (or in future apparmor?) or a combination. I wonder if the cleanest API might be to add an array of "Recorders", something like:

type ProfileRecordingSpec struct {
    Kind ProfileRecordingKind `json:"kind"`
        Recorder []ProfileRecorder // could be just a NamedObjectReference?
    PodSelector metav1.LabelSelector `json:"podSelector"`
}

and:

type ProfileRecorder struct {
        ProfileKind string // seccomp, selinux, ...
        RecorderKind string // audit, BPF, ...
}

The recorders could be predefined by the operator (maybe based on what the platform supports) so that the user just links the profilerecording with the recorders and off they go.

* Based on the new field, a Recording CRD would have to ship with an [audit log profile](https://github.com/kubernetes-sigs/security-profiles-operator/blob/d4ee3f9a931763072f1a7a89bc3ad448b5911ccf/examples/seccompprofile.yaml#L11-L18)

Sorry, can you clarify? I understand that the workload under recording must be using a profile that has the defaultAction ACT_LOG, but I don't understand the point above. Isn't it enough to ensure that the audit profile exists before adding the profile as you say below?

* Reuse the existing recording webhook to add the audit log profile to the target pod (error if there is a profile already added by the user)

Ack

* Let the enricher take the recording profile into account to stream the syscalls to the GRPC server

* Let the GRPC server collect the syscalls per container/pod until the recording gets deleted

I admit I haven't looked at the GRPC server or its usage that much, but on the surface this looks good.

* Reconcile a SeccompProfile CRD on recording deletion

This is probably my ignorance of how the GRPC server works, but would the recorder then fetch the syscalls from the server when the recording is deleted?

User story covered

As an infrastructure SRE, Kirsti wants to be able to easily correlate audit log entries related to security profiles to specific containers/pods. (link)

jhrozek commented 3 years ago

For the SELinux recording I had this in mind:

create a temporary permissive label for the workload
similar to how the audit seccomp profile is taken into account above in the webhook, take the temporary label into account
gather the AVC denials in the enricher
convert them into policy once the recording goes away

saschagrunert commented 3 years ago

@jhrozek thank you for the input, I appreciate it!

I would suggest two pieces actually: what kind of profile is to be recorded and using what mechanism. The type could specify SELinux, seccomp (or in future apparmor?) or a combination. I wonder if the cleanest API might be to add an array of "Recorders", something like:
type ProfileRecordingSpec struct {
    Kind ProfileRecordingKind `json:"kind"`
    Recorder []ProfileRecorder // could be just a NamedObjectReference?
    PodSelector metav1.LabelSelector `json:"podSelector"`
}

What about the conflict between using the hook and the metrics recorder in parallel? Would it be safer to just allow one ProfileRecorder per object instance?

* Based on the new field, a Recording CRD would have to ship with an [audit log profile](https://github.com/kubernetes-sigs/security-profiles-operator/blob/d4ee3f9a931763072f1a7a89bc3ad448b5911ccf/examples/seccompprofile.yaml#L11-L18)
Sorry, can you clarify? I understand that the workload under recording must be using a profile that has the defaultAction ACT_LOG, but I don't understand the point above. Isn't it enough to ensure that the audit profile exists before adding the profile as you say below?

I think so, yes. My concern was about the "how to ship the profile"? So, do we want to:

add the logging profile to our default deployment?
create/update the profile right before the recording starts?

Maybe it's just fine to ship the profile as part of: https://github.com/kubernetes-sigs/security-profiles-operator/blob/8a9db9c64c475c0578d42d598396344dde751a04/internal/pkg/manager/spod/bindata/default_profiles.go#L33-L34

This is probably my ignorance of how the GRPC server works, but would the recorder then fetch the syscalls from the server when the recording is deleted?

I think we should trigger the profile creation when the actual workload gets deleted in the same way we do it with the hook recorder.

For the recording creation we may let the server cache the syscalls per pod and container, whereas another client (recorder reconciler) can fetch them by using a different RPC. So we would have two additional RPCs:

AddSyscall (for pod and/or container)
GetSyscalls (for pod and/or container)

If we timeout or actively remove entries from that cache is another question. :upside_down_face:

saschagrunert commented 3 years ago

Working on a PoC in https://github.com/kubernetes-sigs/security-profiles-operator/pull/513

jhrozek commented 3 years ago

@jhrozek thank you for the input, I appreciate it!
I would suggest two pieces actually: what kind of profile is to be recorded and using what mechanism. The type could specify SELinux, seccomp (or in future apparmor?) or a combination. I wonder if the cleanest API might be to add an array of "Recorders", something like:
type ProfileRecordingSpec struct {
    Kind ProfileRecordingKind `json:"kind"`
    Recorder []ProfileRecorder // could be just a NamedObjectReference?
    PodSelector metav1.LabelSelector `json:"podSelector"`
}
What about the conflict between using the hook and the metrics recorder in parallel? Would it be safer to just allow one ProfileRecorder per object instance?

Hmm, I was trying to allow recording both seccomp and selinux in the same go. I guess we could either disallow recording the same profile type with different recorders or just ask the user to use two recordings.

* Based on the new field, a Recording CRD would have to ship with an [audit log profile](https://github.com/kubernetes-sigs/security-profiles-operator/blob/d4ee3f9a931763072f1a7a89bc3ad448b5911ccf/examples/seccompprofile.yaml#L11-L18)
Sorry, can you clarify? I understand that the workload under recording must be using a profile that has the defaultAction ACT_LOG, but I don't understand the point above. Isn't it enough to ensure that the audit profile exists before adding the profile as you say below?
I think so, yes. My concern was about the "how to ship the profile"? So, do we want to:
* add the logging profile to our default deployment?

* create/update the profile right before the recording starts?
Maybe it's just fine to ship the profile as part of:

https://github.com/kubernetes-sigs/security-profiles-operator/blob/8a9db9c64c475c0578d42d598396344dde751a04/internal/pkg/manager/spod/bindata/default_profiles.go#L33-L34

Yes, that was my idea (in other words: I couldn't see any downside, seems easiest)

This is probably my ignorance of how the GRPC server works, but would the recorder then fetch the syscalls from the server when the recording is deleted?

I think we should trigger the profile creation when the actual workload gets deleted in the same way we do it with the hook recorder.

For the recording creation we may let the server cache the syscalls per pod and container, whereas another client (recorder reconciler) can fetch them by using a different RPC. So we would have two additional RPCs:
* AddSyscall (for pod and/or container)

* GetSyscalls (for pod and/or container)
If we timeout or actively remove entries from that cache is another question. upside_down_face

saschagrunert commented 3 years ago

This is done :upside_down_face:

kubernetes-sigs / security-profiles-operator