This project runs a leader process and several worker processes that communicate with each other. Each worker process holds quite a bit of data in memory. Currently, when an admin lookup request comes in, the leader process sends a message to all relevant workers and waits for their responses.
If any worker process has died for any reason (running out of memory, being stopped manually, etc.), the leader process waits for that response forever and all progress stops.
There are lots of ways this could be solved. Some ideas:
- Detect when any worker has stopped and shut down all other workers and the leader process. This would rely on service management tools to restart the entire admin lookup setup.
- Attempt to restart just that worker and continue on.
- Return an error upstream so that it can be handled in another way.
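
To make the last idea a bit more concrete, here is a rough sketch of how the leader could stop waiting forever. It is not this project's actual code: `awaitWorkerResponse` is a hypothetical helper, and it assumes each worker is a Node.js child process (or any `EventEmitter`) that emits `message` when it replies and `exit` when it dies. The timeout value is also an assumption to be tuned.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper (not part of this project): resolves with the worker's
// next message, or rejects if the worker exits or the timeout elapses first.
function awaitWorkerResponse<T>(
  worker: EventEmitter,
  timeoutMs: number
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Remove the timer and both listeners so nothing leaks once we settle.
    function cleanup(): void {
      clearTimeout(timer);
      worker.off("message", onMessage);
      worker.off("exit", onExit);
    }

    function onMessage(msg: T): void {
      cleanup();
      resolve(msg);
    }

    function onExit(code: number | null): void {
      cleanup();
      reject(new Error(`worker exited with code ${code} before responding`));
    }

    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`worker did not respond within ${timeoutMs} ms`));
    }, timeoutMs);

    worker.on("message", onMessage);
    worker.on("exit", onExit);
  });
}
```

With something like this in place, the leader gets a rejected promise instead of hanging, and can then choose any of the options above: shut everything down, restart that one worker, or pass the error upstream.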
This ticket was originally one task in a list at https://github.com/pelias/wof-admin-lookup/issues/117