Closed eecavanna closed 4 weeks ago
At the time the upload fails (i.e. approximately 5 minutes after starting the upload), I see these lines in the caddy
container's log:
exouser-caddy-1 | {"level":"error","ts":1720044809.4940464,"logger":"http.handlers.reverse_proxy","msg":"reading from backend","error":"read tcp 192.168.0.2:60702->192.168.0.4:5000: read: connection reset by peer"}
exouser-caddy-1 | {"level":"error","ts":1720044809.4941583,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"app:5000","duration":309.991915694,"request":{"remote_ip":"__REDACTED__","remote_port":"61441","client_ip":"__REDACTED__","proto":"HTTP/2.0","method":"POST","host":"edge-dev.microbiomedata.org","uri":"/auth-api/user/upload/add","headers":{"Priority":["u=1, i"],"Accept-Language":["en-US,en;q=0.9"],"X-Forwarded-Proto":["https"],"Cookie":[],"Accept":["application/json, text/plain, */*"],"Sec-Ch-Ua-Mobile":["?0"],"Content-Length":["1000000531"],"Cache-Control":["no-cache"],"Sec-Ch-Ua":["\"Brave\";v=\"125\", \"Chromium\";v=\"125\", \"Not.A/Brand\";v=\"24\""],"Referer":["https://edge-dev.microbiomedata.org/user/files"],"Sec-Fetch-Mode":["cors"],"Sec-Fetch-Dest":["empty"],"Sec-Gpc":["1"],"Sec-Fetch-Site":["same-origin"],"Pragma":["no-cache"],"X-Forwarded-Host":["edge-dev.microbiomedata.org"],"Authorization":[],"Accept-Encoding":["gzip, deflate, br, zstd"],"X-Forwarded-For":["__REDACTED__"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"],"Origin":["https://edge-dev.microbiomedata.org"],"Content-Type":["multipart/form-data; boundary=----WebKitFormBoundaryzCs3N3kdNMcOfvdy"],"Sec-Ch-Ua-Platform":["\"macOS\""]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"edge-dev.microbiomedata.org"}},"error":"reading: read tcp 192.168.0.2:60702->192.168.0.4:5000: read: connection reset by peer"}
exouser-caddy-1 | {"level":"error","ts":1720044809.4942892,"logger":"http.log.access","msg":"handled request","request":{"remote_ip":"__REDACTED__","remote_port":"61441","client_ip":"__REDACTED__","proto":"HTTP/2.0","method":"POST","host":"edge-dev.microbiomedata.org","uri":"/auth-api/user/upload/add","headers":{"Content-Length":["1000000531"],"Authorization":[],"Referer":["https://edge-dev.microbiomedata.org/user/files"],"Accept-Language":["en-US,en;q=0.9"],"Sec-Fetch-Mode":["cors"],"Accept-Encoding":["gzip, deflate, br, zstd"],"Accept":["application/json, text/plain, */*"],"User-Agent":["Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"],"Origin":["https://edge-dev.microbiomedata.org"],"Sec-Fetch-Dest":["empty"],"Cookie":[],"Sec-Fetch-Site":["same-origin"],"Pragma":["no-cache"],"Cache-Control":["no-cache"],"Sec-Ch-Ua":["\"Brave\";v=\"125\", \"Chromium\";v=\"125\", \"Not.A/Brand\";v=\"24\""],"Content-Type":["multipart/form-data; boundary=----WebKitFormBoundaryzCs3N3kdNMcOfvdy"],"Sec-Ch-Ua-Mobile":["?0"],"Sec-Ch-Ua-Platform":["\"macOS\""],"Sec-Gpc":["1"],"Priority":["u=1, i"]},"tls":{"resumed":true,"version":772,"cipher_suite":4865,"proto":"h2","server_name":"edge-dev.microbiomedata.org"}},"bytes_read":878797125,"user_id":"","duration":0,"size":0,"status":408,"resp_headers":{"Server":["Caddy"],"Alt-Svc":["h3=\":443\"; ma=2592000"]}}
Highlights (from those lines):
"msg": "aborting with incomplete response","upstream":"app:5000","duration":309.991915694
"Content-Length":["1000000531"]
"error":"reading: read tcp 192.168.0.2:60702->192.168.0.4:5000: read: connection reset by peer"
"bytes_read":878797125
The timestamp (1720044809.4940464
) translates to:
Wednesday, July 3, 2024 10:13:29.494 PM (GMT)
(UTC)Wednesday, July 3, 2024 3:13:29.494 PM GMT-07:00 DST
(PDT)I also see this in the app
container's log, but the timestamps seem to be 1 minute off from the ones above:
exouser-app-1 | 2024-07-03 22:14:29 debug: /auth-api/user/upload/add: {}
exouser-app-1 | 2024-07-03 22:14:29 error: [object Object]
I do see one timeout mentioned in the Caddy documentation, whose default duration is 5 minutes. It's the idle
timeout. I don't know whether it affects us, since we're using Caddy in reverse-proxy
mode; but here it is just in case.
idle
is a duration value that sets the maximum time to wait for the next request when keep-alives
are enabled. Defaults to 5 minutes to help avoid resource exhaustion.I don't see any 5-minute (by default) timeouts in the documentation of the express-fileupload
NPM package used by the React app, although this React app is using old versions of lots of packages, so the docs might not be accurate for the version of the package in use.
@eecavanna @mflynn-lanl is this slated for this sprint? Can this be slotted for the next sprint?
I think @mflynn-lanl and I will discuss this at the squad meeting on Tuesday (I missed last week's meeting) and then update this ticket based on that discussion (with one potential update being to move it to next sprint).
Closing in favor of #289, which mentions this symptom and links to this ticket. I think resolving #289 will eliminate this symptom (and the underlying issue) also.
History
In #218, @mflynn-lanl reported that users could not upload large files. The root cause was that Cloudflare — which, at the time, was configured in "Proxied" mode — was imposing its 500 MiB (Mebibyte) request size limit. Uploads of files smaller than 500 MiB would succeed, while uploads of files larger than 500 MiB would fail. To fix that, we reconfigured Cloudflare to be in "DNS Only" mode instead of "Proxied" mode. As a result, we were able to upload files larger than 500 MiB.
Problem
Although we can now upload files larger than 500 MiB; if the upload takes longer than 5 minutes, the upload fails and the client receives an
HTTP 408 Request Timeout
response.Task 📋
Appendix
You can convert a number of Mebibytes into Bytes at: https://www.dr-lex.be/info-stuff/bytecalc.html
You can generate "dummy" files (of a given size in Bytes) by running this command: