Feedback for "[RFC] Future audit changes"

This is an issue to track feedback for "[RFC] Future audit changes" as posted on the linux-audit mailing list at: https://listman.redhat.com/archives/linux-audit/2023-August/020036.html

1) Drop support for Python 2. Python 2 lost upstream support over 3 years ago. I also can't see the viability of someone saying they need the latest audit changes for the new kernel yet they are stuck on Python 2. It doesn't compute. This is also related to proposal 5 below.

Fully agree here. Let's drop Python 2 support.

2) Drop SysVinit support. I think everyone has changed to systemd at this point. This is to reduce potential maintenance.

auditd currently relies on the legacy service binary and its logic as well as initscripts to handle daemon stop (and restart) actions(1, 2). While looking at adding audit to Fedora CoreOS (https://github.com/coreos/fedora-coreos-tracker/issues/1362), Steve added another option (https://github.com/linux-audit/audit-userspace/commit/39802bffbfc62501461c916d9ccf748afdff7d94) to be able to manage the daemon without the scripts and service binary. The PR to apply that change in the Fedora package (#9) revealed that some logic from the scripts is still needed in some cases.

So I'm not sure how to move forward here.

Maybe we should add proper auditctl stop|restart commands that handle all those cases completely before we drop all initscripts?

3) This is probably the most controversial and would need careful testing: Split the audit service into 2 services: auditd and rules-load. These would be packaged in 2 different packages so that if all you want is rules-loading and are fine with events going to journald - have at it. If you want the traditional audit experience, then install the audit package which will depend on the rules package. The trick is making them automatically enabled at install. This will need testing and perhaps patches. Packagers will need to work with their distribution to update systemd presets.

If I understand https://www.freedesktop.org/software/systemd/man/journald.conf.html#Audit= and https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html#Files correctly then systemd journald is now capable of handling audit messages from the kernel on its own.

We thus only need auditd to:

1. Load the audit rules

2. Listen and forward audit logs when some specific plugins are enabled:

   * https://manpages.debian.org/testing/audispd-plugins/audisp-remote.8.en.html
   * https://man7.org/linux/man-pages/man8/audispd-zos-remote.8.html

According to https://manpages.debian.org/testing/audispd-plugins/audisp-remote.conf.5.en.html, the supported transports are clear text TCP connections and KRB5 encrypted ones.

I would expect that most users would want their audit logs centralized in the system journal and would then forward the entire journal to a remote system for security / compliance.

Splitting 1 and making it the default would make sense, while keeping 2 as an option for users with more complex cases.

4) Change the definition of which events are simple (one record events) and compound (multiple records per event). Over the years syscall records were added to the simple events haphazardly. That seems to have settled down and we can redefine which are in which group. This is important because this determines when an event is complete and ready to process in ausearch, aureport, and auparse. This should reduce future bug reports.
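As a rough illustration of why the simple/compound distinction matters: records that belong to one event share the same timestamp and serial number in their `msg=audit(...)` field, so a consumer has to decide when it has seen the last record for a given identifier. The sketch below only groups records by that identifier. It is not how ausearch or auparse decide that an event is complete, the sample records are made up, and the "simple"/"compound" label here is derived from the observed record count rather than from the per-type definition the proposal talks about.

```python
import re
from collections import defaultdict

# Audit records carry an identifier of the form msg=audit(<seconds>.<millis>:<serial>).
EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+):(\d+)\)")


def group_by_event(lines):
    """Group raw audit records by their (timestamp, serial) identifier."""
    events = defaultdict(list)
    for line in lines:
        match = EVENT_ID.search(line)
        if match is None:
            continue  # not an audit record, skip it
        events[(match.group(1), int(match.group(2)))].append(line)
    return dict(events)


# Made-up sample records, for illustration only.
sample = [
    "type=SYSCALL msg=audit(1692000000.123:456): arch=c000003e syscall=257 success=yes",
    'type=PATH msg=audit(1692000000.123:456): item=0 name="/etc/passwd"',
    "type=PROCTITLE msg=audit(1692000000.123:456): proctitle=636174",
    "type=USER_LOGIN msg=audit(1692000001.500:457): pid=1234 uid=0 res=success",
]

events = group_by_event(sample)
for event_id, records in events.items():
    kind = "compound" if len(records) > 1 else "simple"
    print(event_id, kind, len(records))
```

A parser that knows in advance which record types form one-record events can hand those off immediately instead of waiting to see whether more records with the same identifier arrive.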

I don't know enough about this to have an opinion.

5) Drop functions from libaudit python bindings that have anything to do with placing and removing rules in the kernel. I'd like the API to just contain what's needed to send audit events and query kernel status. This new binding would be hand written, thus possibly breaking compatibility with the swig generated bindings. Not 100% sure on that, but it might be a side effect. The main idea is to limit the scope to reduce maintenance and future-proof kernel/swig changes.

Do we have known users of this API? If there are no known users, then let's focus on what's better from a maintenance and coherence perspective.

If we have known existing users, then maybe that would help guide this discussion.

6) Moratorium on new arches being supported. If someone else comes along and really shows sustained support for the audit project for a while and they want a new arch to be supported, I might consider it. Since my work on this project is now a hobby, I am not inclined to make more work for my weekends.

I would suggest focusing on architectures supported by Fedora and leaving all the others to community contributions.

7) Drop the autrace & auvirt programs. Does anyone actually use these? Can ausearch take the place of auvirt? The aim here is to reduce maintenance.

I don't know enough about this to have an opinion.

Drop SysVinit support. I think everyone has changed to systemd at this point. This is to reduce potential maintenance.

auditd currently relies on the legacy service binary and its logic as well as initscripts to handle daemon stop (and restart) actions(1, 2). While looking at adding audit to Fedora CoreOS (coreos/fedora-coreos-tracker#1362), Steve added another option (39802bf) to be able to manage the daemon without the scripts and service binary. The PR to apply that change in the Fedora package (#9) revealed that some logic from the scripts is still needed in some cases.

So I'm not sure how to move forward here.

Maybe we should add proper auditctl stop|restart commands that handle all those cases completely before we drop all initscripts?

There is one last issue, the stop libexec script has the ability to wait until auditd exits. This is necessary or a restart can happen before the old one can exit. This was a bz filed by RH QE personnel. I am considering adding the logic to auditctl to do this. It already has the pid of auditd. I think watching the inode of /proc/<auditd pid> for deletion would probably do it.

This is probably the most controversial and would need careful testing: Split the audit service into 2 services: auditd and rules-load. These would be packaged in 2 different packages so that if all you want is rules-loading and are fine with events going to journald - have at it. If you want the traditional audit experience, then install the audit package which will depend on the rules package. The trick is making them automatically enabled at install. This will need testing and perhaps patches. Packagers will need to work with their distribution to update systemd presets.

If I understand https://www.freedesktop.org/software/systemd/man/journald.conf.html#Audit= and https://www.freedesktop.org/software/systemd/man/systemd-journald.service.html#Files correctly then systemd journald is now capable of handling audit messages from the kernel on its own.

It should be known that journald listens on a best effort, multicast socket that is lossy.

Auditd adds extra information. https://github.com/linux-audit/audit-documentation/wiki/SPEC-Audit-Event-Enrichment. This is necessary due to uids being transient and different on each system.
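To make the enrichment point concrete, here is a minimal sketch of resolving numeric ids on the machine that produced the event, before the record is shipped elsewhere. The function names `lookup_user` and `enrich` are hypothetical, and the output layout (uppercase fields appended after a `0x1D` separator) is my reading of the enriched log format and should be treated as an assumption; the linked spec is the authority. The sketch only handles `uid` and `auid`, while auditd resolves considerably more.

```python
import pwd
import re

# Match only the uid= and auid= fields (not euid=, suid=, fsuid=, ...).
UID_FIELD = re.compile(r"\b(a?uid)=(\d+)")


def lookup_user(uid):
    """Resolve a numeric uid using the local account database."""
    try:
        return pwd.getpwuid(uid).pw_name
    except (KeyError, OverflowError):
        return f"unknown({uid})"


def enrich(record, resolve=lookup_user):
    """Append resolved names to a record (hypothetical helper).

    Assumption: enriched fields are uppercase and follow a 0x1D separator.
    """
    extra = [
        f'{name.upper()}="{resolve(int(value))}"'
        for name, value in UID_FIELD.findall(record)
    ]
    if not extra:
        return record
    return record + "\x1d" + " ".join(extra)


# Illustration with a fake account table instead of the real system database.
fake_accounts = {0: "root", 1000: "alice"}
enriched = enrich(
    "type=USER_LOGIN pid=10 uid=0 auid=1000",
    resolve=lambda uid: fake_accounts.get(uid, f"unknown({uid})"),
)
print(repr(enriched))
```

The same uid can map to a different account, or to none, on the system that later receives the log, which is why the lookup has to happen locally at the time the event is recorded.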

We thus only need auditd to:

1. Load the audit rules

2. Listen and forward audit logs when some specific plugins are enabled:

   * https://manpages.debian.org/testing/audispd-plugins/audisp-remote.8.en.html
   * https://man7.org/linux/man-pages/man8/audispd-zos-remote.8.html

There are other plugins that process events in realtime. But, besides the plugin reason, some people may want to search and do reporting. Only the audit logs are supported. And as of a couple weeks ago, I no longer work in the RH Security Dept and have new daytime responsibilities. So, if handling journald logs is needed, someone will have to send patches. Or work to make the journald logs identical. This whole proposal is to reduce my need to ever touch audit code in the future.

According to https://manpages.debian.org/testing/audispd-plugins/audisp-remote.conf.5.en.html, the supported transports are clear text TCP connections and KRB5 encrypted ones.

I would expect that most users would want their audit logs centralized in the system journal and would then forward the entire journal to a remote system for security / compliance.

Splitting 1 and making it the default would make sense, while keeping 2 as an option for users with more complex cases.

Drop functions from libaudit python bindings that have anything to do with placing and removing rules in the kernel. I'd like the API to just contain what's needed to send audit events and query kernel status. This new binding would be hand written, thus possibly breaking compatibility with the swig generated bindings. Not 100% sure on that, but it might be a side effect. The main idea is to limit the scope to reduce maintenance and future-proof kernel/swig changes.

Do we have known users of this API? If there are no known users, then let's focus on what's better from a maintenance and coherence perspective.

Yes, semanage and setroubleshoot-server. Semanage is OK since it only sends events. The other app is problematic as it is using lookup tables directly rather than using auparse which naturally uses the lookup tables. I will be emailing its maintainer to see what we can do.

If we have known existing users, then maybe that would help guide this discussion.

There might be others not in Fedora. Hopefully they get found in alpha or beta releases.

Moratorium on new arches being supported. If someone else comes along and really shows sustained support for the audit project for a while and they want a new arch to be supported, I might consider it. Since my work on this project is now a hobby, I am not inclined to make more work for my weekends.

I would suggest focusing on architectures supported by Fedora and leaving all the others to community contributions.

Community contributions are usually toss it over the wall expecting me to maintain it. (The DEC Alpha was a good example of this.) I simply have no more time to work on audit since it's not my day job anymore. I don't want any more arches until someone reliable contributes to the project on a sustained basis.

There is one last issue, the stop libexec script has the ability to wait until auditd exits. This is necessary or a restart can happen before the old one can exit. This was a bz filed by RH QE personnel. I am considering adding the logic to auditctl to do this. It already has the pid of auditd. I think watching the inode of /proc/<auditd pid> for deletion would probably do it.

+1 to that. If we can also use the new PID file API (maybe with a fallback to the legacy one) that would be best.

It should be known that journald listens on a best effort, multicast socket that is lossy.

Hum, the journal listens on a netlink socket. I did not know that this was lossy (it's indeed written in https://man7.org/linux/man-pages/man7/netlink.7.html). How does auditd do it lossless then?

Auditd adds extra information. linux-audit/audit-documentation/wiki/SPEC-Audit-Event-Enrichment. This is necessary due to uids being transient and different on each system.

OK, then my mental model of the architecture is not correct and I'm not sure how all of this fits. How does audit "enhance" the logs if the journal already has them? ~~Is there an API in journald to edit/overwrite log entries?~~ Those enhanced logs are only written to the log files?

There are other plugins that process events in realtime. But, besides the plugin reason, some people may want to search and do reporting. Only the audit logs are supported. And as of a couple weeks ago, I no longer work in the RH Security Dept and have new daytime responsibilities. So, if handling journald logs is needed, someone will have to send patches. Or work to make the journald logs identical. This whole proposal is to reduce my need to ever touch audit code in the future.

If I understand correctly, you're suggesting that auditd could read the logs from the journal and then process / enhance them (needs the journal to be reliable and read all logs) and write them to the audit log files. Or we fix journald to process the logs like audit does. Both sound like a good amount of work.

Do we have known users of this API? If there are no known users, then let's focus on what's better from a maintenance and coherence perspective.

Yes, semanage and setroubleshoot-server. Semanage is OK since it only sends events. The other app is problematic as it is using lookup tables directly rather than using auparse which naturally uses the lookup tables. I will be emailing its maintainer to see what we can do.

OK, we can not really "freely" break those two as they are kind of important indeed.

I would suggest focusing on architectures supported by Fedora and leaving all the others to community contributions.

Community contributions are usually toss it over the wall expecting me to maintain it. (The DEC Alpha was a good example of this.) I simply have no more time to work on audit since it's not my day job anymore. I don't want any more arches until someone reliable contributes to the project on a sustained basis.

Agree. Can we make the support for a specific architecture not impact the others?

If we can also use the new PID file API (maybe with a fallback to the legacy one) that would be best.

What is this new PID file API?

I think everything else above boils down to how the audit netlink comms work. Auditd uses the primary audit netlink socket. It has a backlog where events queue until auditd can get it. If the queue fills, the kernel can take special admin defined actions. Auditd sends an ACK to the kernel that it received the event. Then the kernel dequeues it. Auditd then enhances the event (by resolving local lookups) and distributes it to plugins and writes it to disk. It can run as a logger or a distributor or both. It does not read events from journald.

Journald, on the other hand, reads from a multicast netlink socket. It does not use ACKS, the kernel just blasts them out and everyone has to keep up. At the time it was created, the thinking was that if journald wants events, there might be other apps that also want audit data. So, it was made into a multicast socket. Any app attaching causes an audit event so that we know who all is listening.
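The difference between the two paths can be shown with a toy model. This is a deliberately simplified simulation, not the kernel's implementation: the class names are hypothetical, the real behaviour on a full backlog depends on admin-configured settings, and here "backpressure" is modelled simply as the producer waiting while the daemon drains one event.

```python
from collections import deque


class AckedBacklog:
    """Toy unicast path: an event stays queued until the daemon ACKs it."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()
        self.full_hits = 0  # stands in for the admin-defined "queue full" action

    def emit(self, event):
        if len(self.queue) >= self.limit:
            self.full_hits += 1
            return False  # the producer is told the queue is full
        self.queue.append(event)
        return True

    def receive(self):
        return self.queue[0] if self.queue else None

    def ack(self):
        self.queue.popleft()  # only now is the event dequeued


class LossyMulticast:
    """Toy multicast path: no ACKs, a full listener buffer silently drops."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffers = {}
        self.dropped = {}

    def attach(self, name):
        self.buffers[name] = deque()
        self.dropped[name] = 0

    def emit(self, event):
        for name, buf in self.buffers.items():
            if len(buf) >= self.buffer_size:
                self.dropped[name] += 1  # the sender never finds out
            else:
                buf.append(event)

    def read(self, name):
        buf = self.buffers[name]
        return buf.popleft() if buf else None


def run_acked(n, limit):
    backlog, delivered = AckedBacklog(limit), []
    for event in range(n):
        while not backlog.emit(event):
            # Queue full: wait while the daemon receives and ACKs one event.
            delivered.append(backlog.receive())
            backlog.ack()
    while backlog.receive() is not None:
        delivered.append(backlog.receive())
        backlog.ack()
    return delivered, backlog.full_hits


def run_multicast(n, buffer_size):
    bus = LossyMulticast(buffer_size)
    bus.attach("listener")
    for event in range(n):
        bus.emit(event)  # the listener is stalled during the burst
    received = []
    while (event := bus.read("listener")) is not None:
        received.append(event)
    return received, bus.dropped["listener"]


print(run_acked(5, 3))
print(run_multicast(5, 3))
```

In the ACKed model every event of the burst is eventually delivered and the full-queue condition is visible to the sender; in the multicast model the stalled listener simply misses whatever did not fit in its buffer.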

Traditionally, other apps get audit events by being an auditd plugin. For example, setroubleshoot has something called sedispatch that is an auditd plugin and relays events to setroubleshootd. But for whatever reason, journald wanted to do its own thing without auditd so the multicast socket is what it uses.

As for the log format, what I was suggesting is that if someone wanted to use the ausearch, aureport, or the auparse library with journald logs when auditd is not installed, then work would need to be done to harmonize the format and parsing. (See https://github.com/linux-audit/audit-userspace/issues/130) Journald would also need to resolve local lookups if it were forwarding.

But what we looked at for Common Criteria was the opposite: block the journald audit socket so we don't get duplicate events and then have auditd enhancing events and wrapping them around to syslog (journald) for transport. We also have https://github.com/linux-audit/audit-userspace/issues/49 which is to revive the audit log facility called out in RFC 3164. Glibc just needs the define added to syslog.h.

What is this new PID file API?

See https://man7.org/linux/man-pages/man2/pidfd_open.2.html. This lets a process wait for another to terminate in a "race free" fashion:

1. Figure out the PID of the audit daemon, ideally via systemd
2. Get a pidfd for that PID
3. Verify that the PID is still valid and did not change between 1 and 2 (not fully sure how to do that)
4. Wait for the process to terminate using the pidfd

But what we looked at for Common Criteria was the opposite: block the journald audit socket so we don't get duplicate events and then have auditd enhancing events and wrapping them around to syslog (journald) for transport.

This sounds like the least amount of work from all the options.

Thanks for the info on pidfd_open. Looks like we can use it on current kernels. Older ones won't have it, but audit-4.0 is aimed at recent kernels. I think I can use this to make auditctl wait until auditd exits so that a restart can work without versions of auditd stepping on each other. (Auditd sometimes doesn't terminate immediately on SIGTERM. It stops receiving new events. But it has to clear its internal queues before exiting.)

I rewrote the signal sending for auditctl in commit de080ad. Looks like the PIDFD_GET_PROCFD ioctl didn't make it into the kernel. So, there is still a possibility of killing the wrong process. But this let me drop the dependency on procps-ng and simplify logic around stopping the daemon.

I think this finishes the work for Audit-4.0alpha. Looking to do a release soon. If anyone knows of any issues, now would be a good time to raise them.

To get this out the door, I enabled access to the 5 functions that setroubleshoot uses in commit 2722984. I think this finalizes audit-4.0.

linux-audit / audit-userspace

Feedback for "[RFC] Future audit changes" #323