rioug closed this 5 days ago
The configuration change seems to be effective on uk_prod: the problem has not occurred since it was applied. I also observed this in the logs:
[43bd9a87-443d-46ff-82e3-b9e8bc749552] Rack::Timeout::RequestTimeoutException (Request ran for longer than 120000ms , 2/3 timeouts allowed before SIGTERM for process 5357)
The error showed up on fr_prod again on 17 August: https://app.bugsnag.com/open-food-france/coopcircuits/errors/66c04a487fce022d1c502938?event_id=66c04a4800f8769690320000&i=sk&m=nw
The error showed up on fr_prod on 1 September: https://app.bugsnag.com/open-food-france/coopcircuits/errors/66d4a99b0ab81c7871222ec5?event_id=66d4a99b00f88fc102370000&i=sk&m=nw
The fix for fr_prod had not been deployed; this is now done.
From the fr_prod log:
Rack::Timeout::RequestTimeoutException (Request waited 2ms, then ran for longer than 119998ms , 1/3 timeouts allowed before SIGTERM for process 3585187)
It looks like it's working as intended: we have not seen the problem since the configuration change.
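To keep an eye on how many timeouts each process accumulates before the SIGTERM limit, the Rack::Timeout log lines quoted above can be parsed. Below is a minimal Ruby sketch. It assumes the message format is exactly the one in those two log lines, including the optional "waited Xms, then" prefix. `parse_timeout`, `TimeoutEvent` and `TIMEOUT_LINE` are hypothetical names, not part of rack-timeout or the OFN codebase.

```ruby
# Hypothetical helper: parse the Rack::Timeout::RequestTimeoutException
# messages quoted in this issue. The pattern is based on those two log lines
# only; other rack-timeout versions may word the message differently.
TIMEOUT_LINE = /
  Request\s(?:waited\s(?<waited_ms>\d+)ms,\sthen\s)?
  ran\sfor\slonger\sthan\s(?<ran_ms>\d+)ms\s*,\s
  (?<count>\d+)\/(?<limit>\d+)\stimeouts\sallowed\sbefore\sSIGTERM\s
  for\sprocess\s(?<pid>\d+)
/x

TimeoutEvent = Struct.new(:pid, :waited_ms, :ran_ms, :count, :limit, keyword_init: true)

# Returns a TimeoutEvent, or nil when the line is not a timeout message.
def parse_timeout(line)
  m = TIMEOUT_LINE.match(line)
  return nil unless m

  TimeoutEvent.new(
    pid: m[:pid].to_i,
    waited_ms: m[:waited_ms]&.to_i, # nil when the "waited ..." part is absent
    ran_ms: m[:ran_ms].to_i,
    count: m[:count].to_i,
    limit: m[:limit].to_i
  )
end

log = <<~LOG
  [43bd9a87-443d-46ff-82e3-b9e8bc749552] Rack::Timeout::RequestTimeoutException (Request ran for longer than 120000ms , 2/3 timeouts allowed before SIGTERM for process 5357)
  Rack::Timeout::RequestTimeoutException (Request waited 2ms, then ran for longer than 119998ms , 1/3 timeouts allowed before SIGTERM for process 3585187)
LOG

# Print one summary line per timeout message found in the log.
log.each_line.map { |l| parse_timeout(l) }.compact.each do |e|
  puts "pid #{e.pid}: #{e.count}/#{e.limit} timeouts, ran > #{e.ran_ms}ms"
end
```

Lines that do not match, such as ordinary request logs, return nil and are skipped. This means the helper can be fed a whole production log file.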
Description
uk_prod sometimes gets database connection errors indicating that the connection pool is exhausted (see Bugsnag). This prevents the website from loading.
This is most likely due to a high number of Rack Timeout errors: https://app.bugsnag.com/yaycode/openfoodnetwork-uk/errors/664378ec33e1080008ba685a?filters[error.status]=open&filters[event.since]=30d. It's not known why these timeouts are happening. One hypothesis is that the server was busier than usual, possibly because more reports than usual were being run at the time. Reports shouldn't affect the database connection pool, since they are generated in the background and Sidekiq has its own connection pool. However, they can add significant load to the server and slow down other requests.
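The hypothesis that timeouts drain the pool can be illustrated with a toy model. This is not ActiveRecord: `ToyPool` and `ToyWorker` are hypothetical. The model assumes, without confirmation from this issue, that a request interrupted by a timeout never returns its connection to the pool. It also models a process restart after a fixed number of timeouts, which is what the "N/3 timeouts allowed before SIGTERM" log messages point to.

```ruby
# Toy model only (hypothetical, not ActiveRecord): it assumes a request
# interrupted by a timeout never checks its connection back in.
class ToyPool
  class Exhausted < StandardError; end

  attr_reader :size, :available

  def initialize(size)
    @size = size
    @available = size
  end

  # Take a connection, or fail the way an empty pool would.
  def checkout
    raise Exhausted, "no connection available" if @available.zero?

    @available -= 1
  end

  # Give a connection back (never beyond the pool size).
  def checkin
    @available += 1 if @available < @size
  end
end

# One hypothetical app process with its own pool. If term_on_timeout is set,
# the process "restarts" with a fresh pool after that many timeouts.
class ToyWorker
  attr_reader :pool, :restarts

  def initialize(pool_size:, term_on_timeout: nil)
    @pool_size = pool_size
    @term_on_timeout = term_on_timeout
    @pool = ToyPool.new(pool_size)
    @timeouts = 0
    @restarts = 0
  end

  # Returns :ok, :timeout, or :error (what the user sees as a 500).
  def handle(timed_out:)
    begin
      @pool.checkout
    rescue ToyPool::Exhausted
      return :error
    end

    unless timed_out
      @pool.checkin
      return :ok
    end

    # Assumed leak: the timed-out request keeps its connection.
    @timeouts += 1
    restart if @term_on_timeout && @timeouts >= @term_on_timeout
    :timeout
  end

  private

  # A restarted process starts again with a full pool.
  def restart
    @pool = ToyPool.new(@pool_size)
    @timeouts = 0
    @restarts += 1
  end
end

no_restart = ToyWorker.new(pool_size: 5)
5.times { no_restart.handle(timed_out: true) }
puts no_restart.handle(timed_out: false) # prints "error"

with_restart = ToyWorker.new(pool_size: 5, term_on_timeout: 3)
5.times { with_restart.handle(timed_out: true) }
puts with_restart.handle(timed_out: false) # prints "ok"
```

With a pool of 5 and no restart, five timed-out requests leave nothing for the sixth. With a restart after three timeouts, the same sequence still succeeds. The model only shows why a restart helps under the leak assumption. It says nothing about why the requests were slow in the first place.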
This article https://medium.com/@mendespedro77/solving-activerecord-connection-pool-errors-in-rails-applications-b7a5861573b9 provides various suggestions we can follow to get to the bottom of this.
For now we have applied a configuration change to rack-timeout that should mitigate the issue: https://github.com/openfoodfoundation/ofn-install/pull/932
We also made a small change to our logging, which should give us more information if the issue crops up again: https://github.com/openfoodfoundation/openfoodnetwork/pull/12715

This error was also seen on fr_prod on 5 August 2024: https://app.bugsnag.com/open-food-france/coopcircuits/errors/66afb3f75e9c074bc47a6cb2?event_id=66afb3f700f69a1692450000&i=sk&m=nw
Expected Behavior
The website loads without error
Actual Behaviour
The website doesn't load and returns a 500 error
Steps to Reproduce
Animated Gif/Screenshot
Workaround
Restart the server/Puma, or apply the configuration change from https://github.com/openfoodfoundation/ofn-install/pull/932
Severity
bug-s3: a feature is broken but there is a workaround
Your Environment
Possible Fix