linux-application-whitelisting / fapolicyd

File Access Policy Daemon
GNU General Public License v3.0

Use the non-blocking event type in permissive mode and fix case of leaking file descriptors #192

Closed: stevenbrz closed this 2 years ago

stevenbrz commented 2 years ago

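fanotify supports the notification events FAN_OPEN and FAN_OPEN_EXEC, which, unlike the permission events FAN_OPEN_PERM and FAN_OPEN_EXEC_PERM, do not block the opening process while it waits for a decision. For our use case, where fapolicyd produces logs that we then analyze asynchronously, using them in permissive mode can boost performance significantly.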
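The difference between the two event families comes down to which class the fanotify group is created with and which mask bits are requested. A minimal sketch, assuming Linux 5.0+ headers (needed for the `*_EXEC` bits); `init_class()`, `event_mask()` and `needs_response()` are hypothetical helpers for illustration, not fapolicyd functions:

```c
#include <stdint.h>
#include <sys/fanotify.h>

/* Hypothetical helpers, not fapolicyd code. FAN_OPEN_EXEC and
 * FAN_OPEN_EXEC_PERM require Linux 5.0+ kernel headers. */

/* A notification-only group may use FAN_CLASS_NOTIF; permission events
 * require FAN_CLASS_CONTENT (or FAN_CLASS_PRE_CONTENT). */
static unsigned int init_class(int permissive)
{
    return permissive ? FAN_CLASS_NOTIF : FAN_CLASS_CONTENT;
}

static uint64_t event_mask(int permissive)
{
    if (permissive)
        /* Notification events: the open() proceeds at once and the event
         * is merely queued for the listener; no reply is expected. */
        return FAN_OPEN | FAN_OPEN_EXEC;
    /* Permission events: the opener sleeps until the listener writes a
     * struct fanotify_response with FAN_ALLOW or FAN_DENY. */
    return FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM;
}

/* Permission events must be answered; notification events must not be. */
static int needs_response(uint64_t mask)
{
    return (mask & (FAN_OPEN_PERM | FAN_OPEN_EXEC_PERM | FAN_ACCESS_PERM)) != 0;
}
```

In a real listener these values would feed `fanotify_init()` and `fanotify_mark()`, both of which need CAP_SYS_ADMIN; the helpers above only capture the choice itself.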

In addition, while testing we noticed that when the internal event queue is full, the event's file descriptor is never closed, so the process accumulates descriptors over time unless the queue size is configured properly.
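The shape of the fix can be sketched as an enqueue step that takes ownership of the descriptor on overflow. This is a hypothetical ring buffer (`struct event_queue`, `enqueue_or_close()` and `QUEUE_CAP` are illustrative names, not fapolicyd's), and it only covers the descriptor: with blocking permission events a listener would also have to answer the dropped event, which is left out here.

```c
#include <stdbool.h>
#include <unistd.h>

#define QUEUE_CAP 4  /* hypothetical capacity; fapolicyd's queue size is configurable */

struct event {
    int fd;   /* descriptor handed over by fanotify for the accessed object */
    int pid;  /* PID of the process that triggered the event */
};

struct event_queue {
    struct event items[QUEUE_CAP];
    unsigned head, count;
};

/* Enqueue the event, or, if the queue is full, close its descriptor so that
 * dropping the event does not also leak the fd. */
static bool enqueue_or_close(struct event_queue *q, const struct event *ev)
{
    if (q->count == QUEUE_CAP) {
        close(ev->fd);
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_CAP] = *ev;
    q->count++;
    return true;
}
```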

(co-authored by @kenbreeman)

kenbreeman commented 2 years ago

Part of the motivation for this change is that it can be difficult to deploy something like this in a large-scale environment when one of the failure modes can completely hang the system. Having a lower-risk permissive mode available makes it easier to adopt, debug, and deploy.

stevegrubb commented 2 years ago

A long time ago, it operated the way you are suggesting. The problem is that by the time it gets to evaluating the rules, the application may already be gone, which caused errors and spurious failures. So, to get accurate results, in permissive mode it now waits for the rules engine to run and then unconditionally approves the request.
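The behaviour described here, still using the blocking permission event but always replying "allow", can be sketched as follows. `enum decision`, `kernel_response()` and `make_response()` are hypothetical names, and the `fprintf` stands in for whatever logging fapolicyd actually does; only `struct fanotify_response`, `FAN_ALLOW` and `FAN_DENY` come from the real fanotify API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <sys/fanotify.h>

/* Hypothetical verdict produced by a rules engine. */
enum decision { DECISION_ALLOW, DECISION_DENY };

/* The rules engine has already run while the opener was blocked, so the
 * subject's /proc data was still available. In permissive mode the verdict
 * is only logged, and the kernel is always told to allow. */
static unsigned int kernel_response(enum decision d, bool permissive)
{
    if (permissive) {
        if (d == DECISION_DENY)
            fprintf(stderr, "permissive: access would have been denied\n");
        return FAN_ALLOW;
    }
    return d == DECISION_ALLOW ? FAN_ALLOW : FAN_DENY;
}

/* Build the reply that would be write()n back to the fanotify descriptor. */
static struct fanotify_response make_response(int event_fd, enum decision d,
                                              bool permissive)
{
    struct fanotify_response r = {
        .fd = event_fd,
        .response = kernel_response(d, permissive),
    };
    return r;
}
```

The cost of this design is that the opener stays blocked for the full rule evaluation even in permissive mode; the benefit is that the subject cannot disappear before its attributes are read.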

There are plans for more performance work to make it run faster (the latest release should already be a little better). Also, if you are using sha256 integrity, switch to size integrity while you are only collecting logs. And if you can use file-libs-5.42, it is significantly faster thanks to improvements in regex handling.
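Why size integrity is so much cheaper: a size comparison needs one `fstat()` on the descriptor fanotify already provides, while a sha256 check has to read and hash every byte of the file. A minimal sketch of the size variant, assuming the expected size comes from a trust database; `size_matches()` is a hypothetical helper, not fapolicyd's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical "size" integrity check: one fstat() call, no file reads.
 * Non-regular files and fstat() failures are treated as mismatches. */
static bool size_matches(int fd, off_t expected_size)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return false;
    if (st.st_size != expected_size) {
        fprintf(stderr, "size mismatch: %lld != %lld\n",
                (long long)st.st_size, (long long)expected_size);
        return false;
    }
    return true;
}
```

The trade-off is weaker assurance: a modified file of identical size passes this check, which is why it is suggested only for log collection.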

stevenbrz commented 2 years ago

Makes sense. I'm going to close this out in favor of a PR containing just the bugfix: https://github.com/linux-application-whitelisting/fapolicyd/pull/193.

kenbreeman commented 2 years ago

What kind of errors can happen in non-blocking mode? fanotify returns an open file descriptor, and this PR doesn't close it until after the rules have been assessed, so I'm not sure what you mean by 'can be gone'.
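The point about the open descriptor holds for the object side: a descriptor keeps the file readable even after its path is removed. A small sketch of that property, using an ordinary temporary file in place of a fanotify-provided descriptor; `readable_after_unlink()` and the `/tmp` path template are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a temporary file, remove its name, then read it through the
 * still-open descriptor: the object outlives its path. */
static bool readable_after_unlink(void)
{
    char path[] = "/tmp/fd-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return false;
    ssize_t n = write(fd, "abc", 3);
    unlink(path);  /* the name is gone from here on */
    char buf[3];
    bool ok = n == 3 && pread(fd, buf, sizeof buf, 0) == 3 && buf[0] == 'a';
    close(fd);
    return ok;
}
```

As the reply in the thread points out, the subject side does not get this guarantee, because its attributes are not read through the event's descriptor.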

Context: we run some large Mesos instances and have been struggling to get fapolicyd running without affecting system stability (even with integrity=none, a single allow-all rule, and tuned cache/queue sizes). This patch has been running stably for us, and we haven't encountered any errors yet.

stevegrubb commented 2 years ago

Everything on the subject side of a rule, except the PID, comes from opening a couple of files in /proc/<pid>. That is where the bulk of the problems come from: once the process has exited, those files are gone.
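The race can be reproduced without fanotify: fork a child that exits immediately, reap it, and then try to read its /proc entry, as a listener handling a non-blocking event after the fact might. A sketch under those assumptions; `read_proc_status()` and `lookup_after_exit()` are hypothetical helpers, and /proc is assumed to be mounted for the current PID namespace.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read the first line of /proc/<pid>/status, the kind of lookup a
 * subject-side attribute needs. Returns 0 on success, else an errno value. */
static int read_proc_status(pid_t pid, char *line, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/status", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return errno;
    int rc = fgets(line, (int)len, f) ? 0 : EIO;
    fclose(f);
    return rc;
}

/* Simulate handling an event after the subject has exited and been reaped. */
static int lookup_after_exit(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return errno;
    if (pid == 0)
        _exit(0);            /* child: exit immediately */
    waitpid(pid, NULL, 0);   /* parent: reap it, removing /proc/<pid> */
    char line[256];
    return read_proc_status(pid, line, sizeof line);
}
```

With permission events the subject is blocked inside its open() call, so its /proc entry is guaranteed to exist while the rules are evaluated.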