Closed hackartisan closed 1 year ago
I'm seeing similar behavior with CDH staging sites that are IP restricted, and suspect it may also be happening with errors being served to google and bing search bots (google search console is reporting a suspiciously large number of "soft 404s" for the CDH website, meaning content is not found but it was not served with a 404 or other status).
You can easily duplicate the problem by using curl on an IP restricted site while not on VPN, e.g.:
curl -I https://test-geniza.cdh.princeton.edu
HTTP/2 200 server: nginx/1.19.10 date: Mon, 07 Jun 2021 16:43:00 GMT content-type: text/html content-length: 2489 last-modified: Fri, 04 Jun 2021 20:39:29 GMT etag: "60ba8f81-9b9" accept-ranges: bytes
I see that both prod and staging maintenance.conf @maintenance
locations include a =502
directive.
Based on this stackoverflow answer I think maybe it is returning 200 because the try_files
succeeds on finding index.html
, although that's not clear to me from the nginx documentation for try_files
Possibly related to this change: I'd like to see a cache-control header that sets a shorter life for this page. Right now my browser caches it and I have to do a hard-refresh to get the real content instead of the maintenance page. (Currently it has last-modified and etag headers which are probably based on the html file, but that aren't really relevant for an error/maintenance page.)
Most often this problem happens when I try to access something that requires vpn before I'm actually on vpn: try to access the site, get the maintenance page, login to vpn, reload the page — still maintenance page; hard refresh — see the actual content.
Nginxplus should pass through the error status code provided by the upstream server instead of a generic 200.
We ran into this challenge again with the recent outage of the approvals site. If we can't pass the underlying error through, can we at least pass something that's not a 200? Even a 500 error would be an improvement.
Also related to #4073 because we'd like to return a custom page for anyone who gets rate-limited. And so also related to https://github.com/pulibrary/orangelight/issues/3638.
An expert from F5 showed me how to configure this correctly.
We just needed to delete the =500
, including the =
, and then any site config that uses errors.conf
(instead of <env>-maintenance.conf
) and has proxy_intercept_errors on;
in its location
definition will pass any error from the upstream server to the client.
I know @rlskoeser doesn't want to use this functionality on all sites, or possibly for all error codes. And I plan to create an option for this in the ansible template for nginx site configs. So there's more work to do to make this flexible. But now we understand how to do it!
Congratulations again @acozine for solving this! 🌮 🎉 👏
Documenting here what I mentioned on Slack:
I don't want the generic message to override anything we have customized at the application level; for us that's usually 404 and 500, but I think 401 and 403 may have django-specific behaviors we'd want to preserve, depending on the application. I'd be glad to coordinate with you at some point to test integrating this new setup with one of my test sites so we can figure out a good configuration recipe that will work across CDH sites.
Closed by #4121, #4122, and #4140.
Remaining related work detailed in https://github.com/Princeton-CDH/cdh-ansible/issues/60 and in #4142.
During an outage of many of our machines, nginx appears to have returned the stock error page with a 200 status. The underlying error appears to be:
This should return as a non-200. If it had returned a non-200 figgy would have appropriately interpreted it as an error (see https://github.com/pulibrary/figgy/blob/186fadca59e3dfea86068d53cfbb669334373d57/app/services/pul_metadata_services/client.rb#L26) However, it tried to parse it as JSON. Relevant honeybadger issue: https://app.honeybadger.io/projects/53391/faults/79044416
some datadog logs: https://app.datadoghq.com/logs?context_event=AQAAAXkOtKOwCb6NGQAAAABBWGtPdEt5TUFBQVBYR09RaVVYbWJnQVI&event=AQAAAXkOtJ_ICb6NDQAAAABBWGtPdEt5TUFBQVBYR09RaVVYbWJnQUY&from_ts=1618931463021&index=&live=false&messageDisplay=inline&query=host%3Alib-adc1%20service%3Anginxplus%20json&stream_sort=desc&to_ts=1619449393000&viz=