target / goalert

Open source on-call scheduling, automated escalations, and notifications so you never miss a critical alert
https://goalert.me
Apache License 2.0

Startup Hangs on Early Errors in GoAlert and Prevents Troubleshooting #3868

Open mastercactapus opened 1 month ago

mastercactapus commented 1 month ago

Describe the Bug: GoAlert hangs if an error occurs early in the startup process. This makes troubleshooting difficult: the process hangs indefinitely and never prints the actual error for an admin to investigate. Moreover, even though the process is hung, the listen address is already bound, so health checks can still connect. Those health checks then hang as well, because the HTTP handler is never fully registered when startup does not complete.

Steps to Reproduce:

  1. Go to 'app/runapp.go'.

  2. Add an early return with an error message as follows:

        eventCtx, cancel := context.WithCancel(ctx)
        defer cancel()
        eventDoneCh, err := app.listenEvents(eventCtx)
        // Injected for this repro: simulate an error early in startup.
        return fmt.Errorf("test")
        if err != nil {
            return err
        }

  3. Start the GoAlert system.

  4. Observe that the logs show the config being loaded, after which the process hangs indefinitely without printing the Listening message.

Expected Behavior: If there's an error during the startup process, the system shouldn't hang and should report the error accurately to enable efficient troubleshooting. Furthermore, health checks should either fail or complete rather than hanging indefinitely.

Observed Behavior: When an error occurs during startup, the process hangs without printing the error, and health checks hang indefinitely as well, which makes the problem difficult for an admin to troubleshoot.

Application Version: This issue is observed in the current master version of GoAlert.

Additional Context: The hang occurs specifically when the startup code returns an error early, before startup has completed.