kayiwa opened 2 months ago
The check is currently green, although it does regularly go into Critical/red mode overnight. It's checking lae.princeton.edu for the text "Digital Archive of Latin America and Caribbean Ephemera", which is definitely present on the site home page. Maybe the machine reboots? Or the site is unresponsive for some reason?
I checked uptime on lae-prod1 and lae-prod2; both have been up for 8 days, so the problem is not that the servers are rebooting overnight.
We saw multiple alerts and recoveries on this check over the weekend. Both VMs have plenty of disk space. In the Rails logs I see a couple of entries like this:
W, [2024-09-30T00:28:36.394569 #142462] WARN -- honeybadger: ** [Honeybadger] Error report failed: an unknown error occurred. code=error error="HTTP Error: Net::OpenTimeout" level=2 pid=142462
and a lot of entries like this:
E, [2024-09-30T00:28:38.832051 #142483] ERROR -- : [dd.env=production dd.service=dpul dd.trace_id=85276041913562946 dd.span_id=1297836789973117643 ddsource=ruby] Health check failed with: execution expired
We have an alert that is almost certainly misconfigured to check for content on an endpoint.
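Since the log entries above mention `Net::OpenTimeout` and `execution expired`, it might help to confirm whether the overnight failures are timeouts rather than missing text. Below is a rough sketch of a check that reports the two cases separately. It is not our actual monitoring config: `fetch_body`, `classify_check`, the status strings, and the 10-second timeout are all hypothetical names and values chosen for illustration.

```python
import socket
import urllib.error
import urllib.request

# Text the existing check looks for on the home page (taken from this issue).
EXPECTED_TEXT = "Digital Archive of Latin America and Caribbean Ephemera"


def fetch_body(url, timeout=10.0):
    """Fetch a URL and return its body as text (timeout value is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def classify_check(fetch, url, expected_text=EXPECTED_TEXT):
    """Run one check and say *why* it failed, not just that it failed.

    `fetch` is any callable taking a URL and returning the body, so the
    classification logic can be exercised without touching the network.
    """
    try:
        body = fetch(url)
    except (TimeoutError, socket.timeout):
        # The site did not answer in time: a responsiveness problem.
        return "TIMEOUT"
    except urllib.error.HTTPError:
        # The site answered, but with an error status (4xx/5xx).
        return "HTTP_ERROR"
    except urllib.error.URLError as exc:
        # urllib can wrap a connect timeout inside URLError.reason.
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            return "TIMEOUT"
        return "UNREACHABLE"
    # The site answered successfully: now the content check is meaningful.
    return "OK" if expected_text in body else "CONTENT_MISSING"


if __name__ == "__main__":
    print(classify_check(fetch_body, "https://lae.princeton.edu"))
```

If something like this ran overnight and mostly reported `TIMEOUT`, that would point at the site being slow or unresponsive rather than at the expected text being wrong.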