synadia-io / nex

The NATS execution engine
https://docs.nats.io/using-nats/nex
Apache License 2.0

SPIKE: explore a supervision tree with Ergo #217

Open autodidaddict opened 5 months ago

autodidaddict commented 5 months ago

Proposed change

This spike would evaluate replacing the many goroutines in the codebase that loop on a select with formalized process supervision trees and hierarchies. The goal is to gauge the effort involved in migrating our ad-hoc "select loop" goroutine code to the GenServer processes and supervision hierarchies provided by Ergo.

The spike would also need to evaluate the risks and tradeoffs. If we decide to use this approach, we would need to write an Architectural Decision Record (ADR) that documents our reasoning at decision time.

"Let's not do this" is a perfectly acceptable resolution, as is "let's do this now". Any resolution needs facts and assertions to back it up.

Use case

While it's fairly idiomatic Go to write go workPuller() and have workPuller() run a loop that begins with a select, the result is not easy to read and it's pretty error-prone. When everything works well, this is fine, but what happens if one of these goroutines panics, or hits a different kind of error? What happens if it doesn't panic, but the routine fails or is canceled and needs to restart with known good state?

If you try to build all of that into each and every one of the goroutine select loops, it becomes obvious that we need a framework or a library to declare our supervision trees of processes.

Contribution

No response

autodidaddict commented 5 months ago

Note that the creator of Ergo is actively working on a rewrite that cleans things up and makes it more Go-idiomatic if we don't need or want the Erlang-specific features. Looking forward to trying it out.