mobilecoinofficial / forest

Enables a Forest of MobileCoin enabled SignalBots. Beta software, APIs may change!
MIT License
19 stars 12 forks source link

Forest start_process fails to restart auxin on Auxin crash #89

Closed iamalwaysuncomfortable closed 2 years ago

iamalwaysuncomfortable commented 2 years ago

Sometimes auxin crashes crash forest bot. The following Auxin in 0.1.8 crash ended up crashing forest core. #Auxin 0.

Guess is that when auxin fails, the following assert causes start_process to fail https://github.com/mobilecoinofficial/forest/blob/4f8bc65d0c1f49634579ef77403f282a373609f0/forest/core.py#L138

And then because the task is stored within the class dict, no error is sent because the task is never garbage collected. https://github.com/mobilecoinofficial/forest/blob/4f8bc65d0c1f49634579ef77403f282a373609f0/forest/core.py#L534-L536

We should

  1. Not use assert for stdin monitoring (asserts are for values always expected to be true, shouldn't be used on values likely to change), we should explicitly check its status and if we detect stderr, restart auxin
  2. Monitor the start_process() task and if it is in error state, dereference and start a new task.
iamalwaysuncomfortable commented 2 years ago

https://github.com/mobilecoinofficial/forest/pull/99 <- This pull is the proposed fix

iamalwaysuncomfortable commented 2 years ago

This issue has been mostly solved by: