Problematic use of threads and fork() in main trust router process

At present, processes to handle TID requests and monitoring requests are created via fork(). TRP handling is done in the main process using pthreads.

According to standards (originally from POSIX),

A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called.

We are definitely violating this requirement.

A result of this is that our TID and monitoring requests do not exit properly. The root cause of that seems to be related to threading in the log4shib package, which is pulled in through our use of mech_eap. This consistently ends up waiting for a mutex during its cleanup code, leaving around sleeping processes. This causes issue #84 (although a workaround exists for that). Similar behavior has been seen in Apache.

Ways I can think of to fix this and bring our architecture into compliance:

Move multithreaded TRP handling into its own subprocess. Introduce pipe-based messaging to keep the TRP and main process states synchronized.
Handle TID and monitoring requests via separate binaries launched via exec().
Eliminate fork() entirely and refactor TID and monitoring handlers to operate in threads instead of subprocesses.

painless-security / trust-router

Problematic use of threads and fork() in main trust router process #85