jtgeibel opened this issue 3 years ago
While logging certainly makes sense, I'm wondering if it would be better to use Sentry more for these things 🤔
I think we should do both where possible. My original motivation for investigating was to make sure we capture Heroku platform-level error codes, where the request may not make it to the backend, or where the backend completes successfully but for some reason the user still sees an error. Then, by adopting the existing prefix, we can ensure that all levels of errors end up in at least one place together.
Here is a summary of `error=""` entries in our logs that we may want to monitor more closely in our metrics. We may want to do as Heroku does and assign code values to these error cases. We should ensure these all have an `at=error` prefix so that they can be easily ingested from the logs (see the sketch after this list).

- `error="canceling statement due to statement timeout"`
- `error="unhealthy database pool"`
- `error="there is no unique or exclusion constraint matching the ON CONFLICT specification"`
- `error="end of file reached"` (on the crate publish endpoint)
- `downloads_counter error: unhealthy database pool` (also `at=error mod=downloads_counter error="unhealthy database pool"`)
- `Error: error sending request for url (https://events.pagerduty.com/generic/2010-04-15/create_event.json): operation timed out`
- `error="failed to upload crate: error sending request for url (https://crates-io.s3-us-west-1.amazonaws.com/crates/xyz/xyz-0.2.0.crate): connection closed before message completed"`
- `error="missing user {private.inviter_id}"`, `error="missing crate with id {invitation.crate_id}"`
Additionally, we may want to add an `at=warn` prefix that could be used to flag slow requests and other operationally interesting events that aren't strictly errors.
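A minimal sketch of what an `at=warn` slow-request entry could look like, assuming a hypothetical timing wrapper and a 1-second threshold (neither is an existing crates.io convention):

```rust
// Hypothetical sketch of an `at=warn` entry for slow requests: time the
// handler and emit a warning line when it exceeds a threshold. The field
// names and the 1-second cutoff are assumptions for illustration.
use std::time::{Duration, Instant};

const SLOW_REQUEST_THRESHOLD: Duration = Duration::from_secs(1);

fn timed<T>(method: &str, path: &str, handler: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let response = handler();
    let elapsed = start.elapsed();
    if elapsed > SLOW_REQUEST_THRESHOLD {
        // e.g. at=warn event=slow_request method=GET path="/api/v1/crates" service_ms=1200
        println!(
            "at=warn event=slow_request method={} path={:?} service_ms={}",
            method,
            path,
            elapsed.as_millis()
        );
    }
    response
}

fn main() {
    timed("GET", "/api/v1/crates", || {
        std::thread::sleep(Duration::from_millis(1200)); // simulate a slow handler
        "ok"
    });
}
```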