Open kevinreiss opened 4 months ago
This also seems to be happening in approvals:
https://app.honeybadger.io/projects/67941/faults/105117183 https://app.honeybadger.io/projects/67941/faults/105650358
Also in the catalog email feedback options: https://app.honeybadger.io/projects/54399/faults/105717431
There are multiple permutations of this issue that can be seen in honeybadger in recent weeks. All of the errors starting with Net::ReadTimeout seem to be a variation of the same thing. The system reaches a point where it is trying to send mail and the action logs and error. Looking at the pattern they have occurred occasionally over a long period of time, but switched to happening multiple times per day around March 7th.
One approach: Check for the Net::ReadTimeout error and when it happens , retry sending the email.
John K. Provided this sample from the 3/18 mail log
[root@lib-ponyexpr systems]# tail -4000 /var/log/maillog | grep "did not issue MAIL/EXPN/VRFY/ETRN"
Mar 18 00:01:49 lib-ponyexpr sendmail[10262]: 42I41nDm010262: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 08:44:56 lib-ponyexpr sendmail[12613]: 42ICiuXG012613: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 09:37:07 lib-ponyexpr sendmail[12872]: 42IDb7uc012872: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 11:10:12 lib-ponyexpr sendmail[13451]: 42IFACWp013451: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 11:10:38 lib-ponyexpr sendmail[13455]: 42IFAc00013455: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 11:13:36 lib-ponyexpr sendmail[13472]: 42IFDaAq013472: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 11:21:13 lib-ponyexpr sendmail[13520]: 42IFLDEd013520: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 11:44:04 lib-ponyexpr sendmail[13710]: 42IFi4JL013710: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 12:40:09 lib-ponyexpr sendmail[14169]: 42IGe9qQ014169: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 13:41:03 lib-ponyexpr sendmail[14677]: 42IHf3Q9014677: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 15:12:14 lib-ponyexpr sendmail[15264]: 42IJCEXW015264: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 15:33:42 lib-ponyexpr sendmail[15424]: 42IJXg6e015424: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 15:48:30 lib-ponyexpr sendmail[15516]: 42IJmUEZ015516: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 16:38:31 lib-ponyexpr sendmail[15925]: 42IKcVZ1015925: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 16:45:06 lib-ponyexpr sendmail[15955]: 42IKj6Pl015955: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 16:52:55 lib-ponyexpr sendmail[16001]: 42IKqtCd016001: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 17:21:54 lib-ponyexpr sendmail[16142]: 42ILLsox016142: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 17:30:38 lib-ponyexpr sendmail[16178]: 42ILUcni016178: catalog3.princeton.edu [128.112.200.180] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 17:47:17 lib-ponyexpr sendmail[16306]: 42ILlH7I016306: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 18:16:08 lib-ponyexpr sendmail[16464]: 42IMG8ed016464: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 20:09:28 lib-ponyexpr sendmail[16911]: 42J09SkW016911: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 20:11:49 lib-ponyexpr sendmail[16931]: 42J0BnBN016931: catalog2.princeton.edu [128.112.201.229] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 20:28:50 lib-ponyexpr sendmail[17004]: 42J0SoOx017004: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA
Mar 18 21:21:39 lib-ponyexpr sendmail[17274]: 42J1LdEn017274: catalog1.princeton.edu [128.112.201.161] did not issue MAIL/EXPN/VRFY/ETRN during connection to MTA```
Two possible fixes:
Since this was deployed at approx 4:30 on 3/20 no errors related to mail timeouts recorded. Will see if this holds up today, I may reduce the open_timeout down to 3~5 seconds as it seems to be actually waiting 10 seconds to submit.
In the five days since this update was put in production, we've seen a timeout error once related to email. This has made an impact but it does not completely get rid of the issue.
possibly related to bibdata#2320, both network related issues started about the same time.
Need to catch up on changes DLS has made to their SMTP infrastructure that can potentially make our set-up more robust.
Expected behavior
When a user places a request the system sends out one or more emails notifying staff of the request and confirming with the user that the request was placed.
Actual behavior
User submits a request for an item. The system displays an error message to the user because the send_mail action for one of the notification emails has generated a Timeout error. However, the API calls to place the request actually succeed so the request is successfully accepted by the desired systems (i.e. Alma, ILLiad, SCSB). The displayed failure notice in the UI then leads the user to attempt to place the request again which results in another error displayed because the processing systems think it's a duplicate request.
Steps to replicate
This an intermittent issue that is difficult to reliably replicate. This issue has increased in frequency in the last ten days, it's happening roughly 3-5 times a day.
Impact of this bug
This provides a degraded user experience. When the timeout error happens the UI tells the user the request has encountered an error and they try to request the item again. Since the API calls are actually successfully, the second attempt fails and generates a notification to the recapproblems email queue saying the user attempted to request a barcode that is not available.
Honeybadger link and code snippet, if applicable
See most recent honeybadger errors that fit this pattern, there are three distinct errors that appear all in a send_mail method.
https://app.honeybadger.io/projects/54399/faults/99737318 - Net::ReadTimeout https://app.honeybadger.io/projects/54399/faults/101572128 - Net::OpenTimeout https://app.honeybadger.io/projects/54399/faults/105534722 - Net::ReadTimeout