The Bilara server has long been plagued with intermittent downtime.
Multiple approaches can be used together to improve reliability:
Use libgit2 (e.g. via pygit2) for as many git interactions as possible. The git command-line tool is designed to be used by humans in particular ways, and it can behave ungracefully when driven by machines. libgit2 is built for programmatic use, including at GitHub, and is far better stress-tested. It is also about two orders of magnitude faster, which will help reduce resource contention.
Split the code that handles GitHub synchronization (webhooks and general push/pull) into a microservice that runs in a separate process and can be restarted independently. There is no need for these tasks to block the general business of the Bilara server.
Implement health checks and automatic restarting, though preventing the code from failing in the first place is preferable to relying on health checks.
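The separate-process and automatic-restart ideas above can be sketched with nothing but the Python standard library. This is a minimal sketch, not Bilara's actual setup: `supervise` is a hypothetical helper, and the command it runs stands in for whatever would start the GitHub sync service. The supervisor relaunches the child whenever it exits with a non-zero status, backing off exponentially, up to a fixed number of restarts.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.5):
    """Run `cmd` as a child process, restarting it whenever it exits with a
    non-zero status, up to `max_restarts` times.

    `cmd` is a hypothetical stand-in for the command that starts the sync
    service. Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Each run is a separate OS process, so a crash in the child cannot
        # take down the supervisor (or the main server).
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # clean exit: no restart needed
        if attempt < max_restarts:
            # Wait longer after each consecutive failure so a crash loop
            # does not hammer GitHub or the disk.
            time.sleep(backoff_seconds * (2 ** attempt))
    return exit_codes
```

Because the sync work lives in its own OS process, a crash there costs only a restart of that process while the main server keeps serving. In production, a process manager such as systemd (with `Restart=on-failure`) or supervisord can play the same role without custom code.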
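A health check has two halves: an endpoint that reports whether the process is alive, and a watcher that polls it. The sketch below assumes the server can expose an HTTP `/health` route (a hypothetical path; no route is specified here) and uses only the standard library. `HealthHandler` and `is_healthy` are illustrative names. A watchdog would call `is_healthy` periodically and restart the service after repeated failures.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health (hypothetical route) with 200 and a small JSON body."""

    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


def is_healthy(url, timeout=2.0):
    """Return True if `url` answers HTTP 200 within `timeout` seconds."""
    # Bypass any http_proxy settings: the service is probed directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, timeouts and HTTP error statuses (HTTPError is
        # a subclass of URLError, which subclasses OSError) all count as
        # unhealthy.
        return False
```

Keeping the handler trivial is a deliberate choice in this sketch: it answers as long as the process can serve requests at all. A deeper check, such as confirming that a repository can be opened, could be layered on once the basic one is in place.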