Open aconite33 opened 6 years ago
Adding @stevegrubb as I suspect he might have a quick answer for this.
I hope that made sense. It's pretty intimidating talking to Linux gurus who attend Linux symposiums :)
@aconite33 I assure you we are probably more alike than we are different. Perhaps the only noteworthy difference is that I break the kernel a bit more often ;)
Yes, your proposal does make sense - and you get serious style points for the formatting! - I'm just not sure this is something we want to solve in the kernel. My understanding of the problem is that it is due to segmenting of the audit logs, yes? Rather than add some additional overhead to the kernel audit processing, I would rather see this resolved in userspace via log aggregation or some other fancy log analysis tools.
@stevegrubb currently spends the most time working with the userspace, I'm hopeful he will be able to provide some more concrete examples.
Roger. I wasn't sure whether to post in Kernel or Userspace. Basically, I just want a unique logging mechanism for process creation. It works very well on Windows/AIX, where you can trace each process "chain", if you will, without a doubt.
From a DFIR perspective, it makes it very easy to trace the root cause of something when you can work those artifacts in this manner.
I'm just glad my idea didn't get shot down after the first comment! :)
On 2018-02-27 14:37, Paul Moore wrote:
Yes, your proposal does make sense - and you get serious style points for the formatting! - I'm just not sure this is something we want to solve in the kernel. My understanding of the problem is that it is due to segmenting of the audit logs, yes? Rather than add some additional overhead to the kernel audit processing, I would rather see this resolved in userspace via log aggregation or some other fancy log analysis tools.
I also agree the proposal makes sense, but I think a userspace solution depends on guaranteed reporting of task creation and destruction and tracking pid_t lifetimes.
Perhaps I'm missing something, but I don't see how log segmentation is the issue here. The issue appears to be long-running processes coexisting with many short-lived ones that roll the counter over, so that pid_t values get reused. I assume userspace is already competent at aggregating logs.
My gut reaction is to lengthen the PID rollover time. The kernel pid_t is already a u32, which I would have thought would have been a good start (who will ever need more than 640kB of RAM? ;-). Lengthening it to a u64 would pretty much eliminate the problem, but it would impose a significant performance hit on comparisons across the entire kernel, and the longer the system is up, the more it would add to audit record overhead. The latter would be less of an issue than the kernel-wide comparisons. I wonder what the design target is for PID rollover time, or PID occupancy? Depending on system hw and config, 4k, 32k or 4M pid_t are available (0.5G threads).
We do already have another value associated with tasks that isn't exposed to userspace that could be useful for this. Each task has a pointer to its task_struct which on a 64-bit system is a long long. The full range of that long long isn't used for task_struct pointers, but I could see a combination of the pid_t and hash of the task_struct pointer to be more unique than the pid_t alone. Since the hash of the task_struct pointer is used rather than the pointer itself, it would present less of an information leak about kernel memory structure. Another possibility would be based on task start time.
There has been some movement in the kernel away from depending solely on pid_t to identify tasks for audit, so the problem has already been considered, at least internally.
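For anyone who wants to experiment from userspace, here is a minimal sketch of the "pid plus task start time" idea mentioned above. It is not what the kernel or auditd does. The names `process_key` and `read_start_ticks` and the choice of `uuid.uuid5` are assumptions made purely for illustration. The start time is field 22 of `/proc/<pid>/stat`, and the boot ID is supplied by the caller (on Linux it can be read from `/proc/sys/kernel/random/boot_id`).

```python
import uuid


def process_key(boot_id: str, pid: int, start_ticks: int) -> uuid.UUID:
    """Derive an identifier from values that together should not repeat.

    Hypothetical helper: a pid alone can be reused after rollover, but the
    combination (boot id, pid, start time) should differ between two tasks.
    """
    name = f"{boot_id}:{pid}:{start_ticks}"
    # uuid5 is a name-based UUID (SHA-1 underneath); any stable hash would do.
    return uuid.uuid5(uuid.NAMESPACE_OID, name)


def read_start_ticks(pid: int) -> int:
    """Read starttime (field 22, clock ticks since boot) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    # fields[0] is field 3 (state), so field 22 is at index 19.
    return int(fields[19])


# Same pid, different start times: the keys differ, so a reused pid is
# no longer confused with the task that held it earlier.
first = process_key("boot-1", 3380, 1000)
reused = process_key("boot-1", 3380, 250000)
print(first != reused)
```

Because the key is a pure function of its inputs, a log consumer could recompute it from recorded values instead of keeping state about every running process.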
We do already have another value associated with tasks that isn't exposed to userspace that could be useful for this. Each task has a pointer to its task_struct which on a 64-bit system is a long long. The full range of that long long isn't used for task_struct pointers, but I could see a combination of the pid_t and hash of the task_struct pointer to be more unique than the pid_t alone. Since the hash of the task_struct pointer is used rather than the pointer itself, it would present less of an information leak about kernel memory structure. Another possibility would be based on task start time.
I think this makes the most sense to me. I can provide some examples of where I've seen multiple PID rollovers in less than 72 hours. So when you're trying to programmatically search for unique process chain paths, you come up with multiple results, which leaves the user to do intelligent "guesswork" to determine the "true" path of a process chain.
Having a UUID/hash for the parent and child processes and including it in the record for syscall 59 (EXECVE) would solve this issue, as I could then track each process through its parent/child relationship. I don't think you would need auditd to keep track of the processes over time; a hash value that can be calculated every time a new process is spawned would be sufficient. I'm not sure, though, how much overhead it would add to calculate it every time a process is launched.
On 2018-02-28 03:23, aconite33 wrote:
Having a UUID/hash for the parent and child processes and including it in the record for syscall 59 (EXECVE) would solve this issue, as I could then track each process through its parent/child relationship. I don't think you would need auditd to keep track of the processes over time; a hash value that can be calculated every time a new process is spawned would be sufficient. I'm not sure, though, how much overhead it would add to calculate it every time a process is launched.
Since it doesn't have to be cryptographically strong, it need not be a computationally expensive hash function. Such values could be exposed in an auxiliary audit record that can be filtered or ignored by those who don't need it, or shut off entirely with a feature enable/disable switch.
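To give a feel for how cheap a non-cryptographic hash can be, here is a small sketch using 64-bit FNV-1a. The choice of FNV-1a is mine, not something proposed in this thread, and `task_tag` with its pid and start-time inputs is a hypothetical helper rather than anything the audit subsystem provides.

```python
import struct

FNV64_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # FNV-1a 64-bit prime
MASK64 = 0xFFFFFFFFFFFFFFFF


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: one XOR and one multiply per input byte."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def task_tag(pid: int, start_ns: int) -> str:
    """Hypothetical tag: hash of a 32-bit pid and a 64-bit start time."""
    payload = struct.pack("<IQ", pid, start_ns)
    return f"{fnv1a_64(payload):016x}"


# The same pid with a different start time hashes to a different tag.
print(task_tag(3380, 1_000_000) != task_tag(3380, 2_000_000))
```

A hash like this only needs a handful of integer operations per task, which is the kind of cost the comment above has in mind. Whether collisions are acceptable depends on how the value would be used.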
That would be perfect. Just have something like a SHA1. Also, having a switch to turn it on would be excellent, since it's not applicable to everyone if they don't need/want it.
Whatever happened to this? From a threat hunting perspective, having a unique process ID similar to Windows' ProcessGUID would be immensely helpful.
I'm not aware of any work in this area.
I don't believe any movement has been made. However, with the potential release of Sysmon for Linux, that may be a better alternative, as it will have UUID for each process. I don't know the full coverage of Sysmon for Linux, but it may be comparable to auditd. (Or at least, complementary)
https://github.com/sysflow-telemetry also appears to handle this, I think. See https://sysflow.readthedocs.io/en/latest/spec.html#object-id
Though I still think something native to audit would be nice.
Universally Unique Identifier for PIDs and child PIDs
Hello, hopefully I can describe the issue I'm running into with auditd. If there is a solution to this that I'm not aware of, I'd be very interested in it.
Problem Statement:
Inability to uniquely trace a process's parent/child relationship.
By comparison, the logging mechanisms on Windows/AIX let you walk the chain uniquely, even through a mountain of log data.
TL;DR
Add a parent UUID to audit log messages (potentially the audit ID) for process creation (e.g., EXECVE, 59) so that a user can uniquely track the parent/child relationship of process creation/execution.
Example Walkthrough:
Using a combination of auditd, syslog, rsyslog, and Elastic Stack, admins can log a variety of information. One example is tracing the "chain" of a process's parent/child relationships. If an admin wants to trace a process's life from the last child to the first parent, they can trace it through the PID/PPID relationship. This can be done using auditd with rules like this:
This will record every SYSCALL for a creation of a new process (EXECVE, 59). Sample output:
Tracing this can be done using the PID/PPID values to look for its parent. In this simple case, you would be looking for a process created with PID=3380 with syscall 59 (EXECVE).
Which will pull information regarding the creation of this particular process and its parent:
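As a rough illustration of that lookup, the sketch below parses key=value audit records and walks from a child back through its parents by PID/PPID. The sample lines are made up and heavily abbreviated (real SYSCALL records carry many more fields), and the function names are hypothetical. Syscall number 59 is EXECVE on x86_64 only.

```python
import re

# Made-up, abbreviated records for illustration only; not real log output.
SAMPLE_LOG = """\
type=SYSCALL msg=audit(1519750000.100:101): arch=c000003e syscall=59 success=yes exit=0 ppid=1 pid=3380 comm="sshd" exe="/usr/sbin/sshd"
type=SYSCALL msg=audit(1519750010.200:102): arch=c000003e syscall=59 success=yes exit=0 ppid=3380 pid=3381 comm="bash" exe="/usr/bin/bash"
type=SYSCALL msg=audit(1519750020.300:103): arch=c000003e syscall=59 success=yes exit=0 ppid=3381 pid=3390 comm="ls" exe="/usr/bin/ls"
"""

FIELD_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_record(line):
    """Turn one key=value record into a dict, stripping surrounding quotes."""
    return {key: value.strip('"') for key, value in FIELD_RE.findall(line)}


def execve_records(log_text):
    """Keep only SYSCALL records for syscall 59 (EXECVE on x86_64)."""
    records = []
    for line in log_text.splitlines():
        fields = parse_record(line)
        if fields.get("type") == "SYSCALL" and fields.get("syscall") == "59":
            records.append(fields)
    return records


def walk_chain(records, pid):
    """Follow pid -> ppid links until no record is found for the parent."""
    by_pid = {}
    for rec in records:
        # A later record with the same pid silently replaces an earlier one.
        by_pid[rec["pid"]] = rec
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:
        seen.add(pid)
        rec = by_pid[pid]
        chain.append((rec["pid"], rec["comm"]))
        pid = rec["ppid"]
    return chain


print(walk_chain(execve_records(SAMPLE_LOG), "3390"))
```

Note that `by_pid` holds only one record per PID, so the walk is trustworthy only while each PID appears once in the data being searched.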
Problem
When there are multiple logs to dig through, the PID/PPID values may have been recycled over the uptime of the server/system. E.g., on a system with an uptime of over a week, the PID of the parent you are trying to track down may overlap with the PIDs of other processes. The SSH daemon is a good example: it is a long-running process, and PIDs may have been recycled by the time you try to see everything a user did after logging in to the system.
If a user is nefarious and tries to compromise the system, tracking the user from the exploit process back to the originating process can become cumbersome, and at times guesswork.
Solution?
Creating a UUID that is associated with both the child and the parent would allow admins/DFIR to follow the chain of events uniquely through the different process chains and discover the originating "entry" point of the process tree (and, with enough log data, trace it back to the root process).
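A small sketch of the difference this would make, assuming each process-creation event carried its own ID and its parent's ID. The field names `proc_id` and `parent_id` and the events themselves are hypothetical. Audit records do not contain such fields today.

```python
# Hypothetical, already-parsed events in log order. PID 3381 is reused by an
# unrelated process after the original bash has exited and PIDs rolled over.
EVENTS = [
    {"pid": 3380, "ppid": 1, "comm": "sshd", "proc_id": "a1", "parent_id": "00"},
    {"pid": 3381, "ppid": 3380, "comm": "bash", "proc_id": "b2", "parent_id": "a1"},
    {"pid": 4000, "ppid": 3381, "comm": "curl", "proc_id": "d4", "parent_id": "b2"},
    {"pid": 3381, "ppid": 900, "comm": "backup", "proc_id": "c3", "parent_id": "99"},
]


def parents_by_pid(events, child):
    """Every event whose pid matches the child's ppid (may be ambiguous)."""
    return [e for e in events if e["pid"] == child["ppid"]]


def parent_by_id(events, child):
    """The single event whose proc_id matches the child's parent_id, if any."""
    index = {e["proc_id"]: e for e in events}
    return index.get(child["parent_id"])


curl = EVENTS[2]
print([e["comm"] for e in parents_by_pid(EVENTS, curl)])  # two candidates
print(parent_by_id(EVENTS, curl)["comm"])                 # one answer
```

With PID/PPID alone the search returns both `bash` and `backup` as possible parents of `curl`. With a per-process ID there is only one match, and no timestamp heuristics are needed.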
Comparison
Windows: Windows, when using Sysmon, can uniquely trace each process creation from the parent to the child. It does this by associating a ProcessGUID with the Process Create event.
AIX: AIX does this with a PROC_CREATE message that is generated when a new process is launched.
I think it would be useful to have this type of logging mechanism so Linux users can benefit from tracking the chain of processes uniquely, instead of what I've experienced as guesswork at times to follow a process chain to its originating source.