rfxn / advanced-policy-firewall

Advanced Policy Firewall (APF)
GNU General Public License v2.0
93 stars 46 forks source link

expirebans function inadvertently clearing mutex lock file #16

Closed AlexisEvo closed 7 years ago

AlexisEvo commented 7 years ago

The expirebans shell function is calling apf recursively whenever a block is removed:

https://github.com/rfxn/advanced-policy-firewall/blob/b99bfbfda22dc3f7f70c550d5da63491c19121ce/files/internals/functions.apf#L1045

This recursive call ends up hitting this line in its execution path:

https://github.com/rfxn/advanced-policy-firewall/blob/b99bfbfda22dc3f7f70c550d5da63491c19121ce/files/apf#L215

Which removes the lock file even though the parent process is still executing. This is problematic as if the random delay for anacron's daily apf reload ends up syncing with the 10 minute apf refresh cron, expirebans will clear the lock file in one process resulting in both crons running simultaneously.

This causes odd connectivity issues which can render the server inaccessible/offline.

AlexisEvo commented 7 years ago

The above commit is briefly tested, will open a pull request after more review.

Only the process that opens the mutex should be able to clear it -- this maintains that by setting APF_MUTEX_LOCKED to "1" when it successfully locks. On unlock, it only deletes the lock file if APF_MUTEX_LOCKED is "1". The code for removing stale locks remains unchanged.

AlexisEvo commented 7 years ago

I spent some more time thinking about this tonight, and I didn't like the original approach APF took towards locks. It would drop the commands entirely rather than queuing them if it encountered a non-stale lock file. This means if a high priority reload (daily reload to ensure any residual firewall issues are fixed & to update glob files) is blocked by a lower priority refresh, the higher priority task would not run. I think this should rather take the approach of yum, where if yum is already locked, any further yum tasks are stalled until the lock is free and then executed.

Mutex's typically stall until it can either enter critical section or it reaches a timeout. This commit brings APF's mutex more in line with a traditional mutex, to help eliminate the potential affects of unintended concurrency. APF now has two timeouts for lock files -- one for entering critical section, and one for removing stale lock files. These default to 60 and 300 seconds respectively.

I've tested this decently well and it is deployed on my CentOS 6 & 7 boxes. Feedback appreciated!