Open mastercactapus opened 6 months ago
This bug occurs due to an error handling issue during GoAlert's startup, where any early error causes the application to hang instead of cleanly exiting and providing a clear error message. This behavior not only fails to log the error but also leaves the health checks in a hung state, which complicates troubleshooting for administrators.
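Cause of Hanging: An error returned early in startup leaves the process running instead of exiting.
Health Check Issue: The address is already bound, so health checks can connect, but they hang because the HTTP handler is not fully registered.
Logging Gap: The actual error is never printed, so an admin has nothing to investigate.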
In app/runapp.go, simulate an early error by adding an immediate return after event listener initialization:
eventCtx, cancel := context.WithCancel(ctx)
defer cancel()
eventDoneCh, err := app.listenEvents(eventCtx)
return fmt.Errorf("test") // Simulate an early error
if err != nil {
	return err
}
To resolve this, we need to improve error handling during the startup phase so that an early error is logged and returned, letting the process exit cleanly instead of hanging. Use defer statements to handle context cancellation and any other necessary cleanup. This avoids leftover goroutines or services in an active state, which can cause hanging.
eventCtx, cancel := context.WithCancel(ctx)
defer cancel()
// Attempt to set up event listeners
eventDoneCh, err := app.listenEvents(eventCtx)
if err != nil {
	log.Errorf("Startup error: %v", err)
	return err // Clean exit with logged error
}
// Continue startup only after successful listener setup
// Bind network address here, after confirming no errors
Describe the Bug: GoAlert hangs if there's an error early during the startup process. This makes it difficult to troubleshoot, as the process hangs indefinitely and doesn't print the actual error for an admin to investigate. Moreover, even though the process is hung, the bound address allows health checks to connect. However, those health checks will hang too, as the HTTP handler isn't fully registered due to the incomplete startup.
Steps to Reproduce:
1. Open 'app/runapp.go'.
2. Add an early return with an error immediately after the app.listenEvents call, for example: return fmt.Errorf("test")
3. Start the GoAlert system.
4. Observe that the logs show the config being loaded, then the process hangs indefinitely without printing the Listening message.
Expected Behavior: If there's an error during the startup process, the system shouldn't hang and should report the error accurately to enable efficient troubleshooting. Furthermore, health checks should either fail or complete rather than hanging indefinitely.
Observed Behavior: When there's an error during the startup process, the system hangs without printing any error, and health checks hang indefinitely, which makes it difficult for an admin to troubleshoot the issue.
Application Version: This issue is observed in the current master version of GoAlert.
Additional Context: The issue occurs specifically when an error causes an early return during the startup process.