Open plague-doctor opened 7 months ago
That's strange I've never seen this error before.
A few questions to debug this further: Can you try it with another file? Does the error occur exactly at the same position (e.g at 2.3GB) ?
Ok. I've tested it a little bit more.
Thanks for your detailed research. It's difficult to debug this if I can't reproduce it and I can't really find any useful information on the internet. I've created a Docker image that uses the LTS
version of Nodejs, this might fix the issue. Could you try to use the image stonith404/pingvin-share:development
temporarily?
sure thing. Will do more tests with LTS
A bit more investigation: The LTS throws same errors:
[Nest] 41 - 12/14/2023, 12:53:24 AM ERROR [ExpressAdapter] Premature close
Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at ServerResponse.onclose (node:internal/streams/end-of-stream:159:30)
at ServerResponse.emit (node:events:526:35)
at emitCloseNT (node:_http_server:1023:10)
at Socket.onServerResponseClose (node:_http_server:278:5)
at Socket.emit (node:events:526:35)
at TCP.<anonymous> (node:net:337:12)
[Nest] 41 - 12/14/2023, 4:24:15 AM ERROR [ExpressAdapter] Premature close
Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at ServerResponse.onclose (node:internal/streams/end-of-stream:159:30)
at ServerResponse.emit (node:events:526:35)
at emitCloseNT (node:_http_server:1023:10)
at Socket.onServerResponseClose (node:_http_server:278:5)
at Socket.emit (node:events:526:35)
at TCP.<anonymous> (node:net:337:12)
However, I have tested different browsers this time.
Hm okay, so it only occurs with Firefox. I'll try to investigate the further next week.
Hello,
I'd like to add my 2c to this - I'm having the same problem, downloading an 8.8GB file from my own server. It goes well until it hits 1-1.2GB, then drops the download like a hot potato.
[Nest] 41 - 12/19/2023, 10:06:06 AM ERROR [ExpressAdapter] Premature close
Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at ServerResponse.onclose (node:internal/streams/end-of-stream:159:30)
at ServerResponse.emit (node:events:526:35)
at emitCloseNT (node:_http_server:1023:10)
at Socket.onServerResponseClose (node:_http_server:278:5)
at Socket.emit (node:events:526:35)
at TCP.
Firefox on Linux Mint 21.2 and on Windows 10 22H2 both exhibit the same behaviour. I've not tried with any Chromium based browsers.
Strange, I couldn't find anything useful on the internet.
I'm not a Firefox user but did you experience this bug with other downloads from other websites?
I doubt it would be related to Firefox. It is rather NestJS issue. Other projects reports similar problems... Could we log some more to be able to pinpoint the issue? https://github.com/nestjs/nest/issues/12283
Hey!
I may have found something while fiddling around. The site doesn't timeout or cancel the download on Firefox while on HTTP, although the issue remains via HTTPS... sadly it doesn't mean Pingvin container logs are fault-free, the same fault remains al be it less often.
The next thing I tested was accessing the site via HTTPS and the App URL is set to host ip and port. Unfortunately, it resulted in the same fault where the download was canceled.
Still kinda pointing back to NestJs as @plague-doctor said.
Anyway is there a workaround or known working version?
If I have some time, I'll create a bug report in the NestJS repo with a small reproduction. Besides that I'm not really sure what I can do.
Did the download of large files work before on Firefox?
Did the download of large files work before on Firefox?
yes, it did. was able to download a 2 GB file. now any file bigger than 50 MB it craps out. Side note, there were no Nginx config changes on my end that would have broken the side.
Which is unfortunate. Anyway, thank you for your time so far!
@sebasdt Can you remember approximately when downloading large files have stopped working?
The annoying part is I do not know at what version pingvin broke. (My backups don't go back that far and the last monthly december was just purged.)
The last known working state was back in 9 Dec, Although it could be later.
This probably doesn't help much, but I tested it out of interest. I uploaded a 5GB file, then tested the following scenarions:
I don't know if this helps in any way and if it's expected behaviour for this particular bug since I don't know how nestjs works, but thought I'd still post my findings.
I've just tested older versions of pingvin-share, and I can tell that the issue starts happening with v0.21.1, while v0.21.0 seems to be working as expected.
Thanks @axiomen for your observations!
Could you guys test version v0.21.0
too? There aren't any db migrations since then so you only have to change the Docker image tag to v0.21.0
.
I've tried all versions from and including 0.21.0 through latest and they all exhibit the same behaviour, i.e. dropping the download at around 1GB data transferred.
That is strange. I have tested multiple large files, with single and multiple files per share and there was no issue on v0.21.0, while on v0.21.1 it happened immediatelly. I can re-test later today on another machine.
I've tried all versions from and including 0.21.0 through latest and they all exhibit the same behaviour, i.e. dropping the download at around 1GB data transferred.
I've only tried 0.21.0 and am still only able to download on http.... I will be trying out different versions and update you all.
Man, what is annoying is that we all have a common problem but the symptoms are all kinda different.
Yeah.. I've poked at this for a bit to see if my environment was the culprit. I'm testing pingvin-share as a docker container on unRAID, and have it sitting behind an Nginx Proxy Manager reverse proxy which handles the TLS termination and passes the request through. NPM is also a docker container.
Thinking that was my issue, I spun up a Debian server and set up pingvin-share there, still routing it through NPM, and still the same problem. I then exposed the Debian host for pingvin-share directly on the internet - no dice.
I still saw the same errors in the logs and concluded that the environment is not the issue; something within the pingvin-share container must be wrong, but that's where my debugging skills end.
As I stated in my previous entry here, I've tried all versions from and including 0.21.0 through current, and they all do the same thing. I'll give 0.19.x a go and see if that has the same issues and report back.
Tested on my second machine on v0.21.0, and the error is back. I have no idea how on my first machine the error consistently happens on v0.21.1+, but on v0.21.0 doesn't even after testing multiple scenarios with 1GB+ files. Both are containers spun on docker. Honestly at a loss here.
Thank you guys for helping to debug this. It's frustrating as I can't really do much because it must be caused by the StreamableFile
class of NestJS. And on the internet there is nothing that is related to this issue. I just upgraded all packages to the newest version, would you mind to test the stonith404/pingvin-share:development
image?
If this doesn't work, I have to create an issue in the NestJS repository.
No Premature close error on my first machine, but I did get another error upon finishing the download.
On my second machine, Premature close error is still present even on the dev image.
On the development tag version, I was able to download one 3.5GB file and one 5.1GB file, but the zip file containing the two, with a total size of 8.8GB, failed after 1,138,822KB (1.13GB).
Retrying the first 3.5GB, it failed at 1.9GB:
[Nest] 41 - 01/11/2024, 2:35:23 PM ERROR [StreamableFile] Premature close
Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
at ServerResponse.onclose (node:internal/streams/end-of-stream:159:30)
at ServerResponse.emit (node:events:526:35)
at emitCloseNT (node:_http_server:1023:10)
at Socket.onServerResponseClose (node:_http_server:278:5)
at Socket.emit (node:events:526:35)
at TCP.
I've noticed a few other things as well, but I'll open separate tickets for those.
Okay thanks. I think the only option now is to report it to NestJS.
I did some more testing, and noticed that on Chrome the error happens if multiple simulatneous downloads from Pingvin Share are running. For instance, if I'm downloading three files from the same share (file1, file2, and zip of file1 and file2), I'll get Premature close error, but not all downloads are cut off (file1 and file2 stop downloading, while the zip continues to download).
So I debugged this issue a bit further and I found out that the backend logs Premature close error
if the the browser cancels the download. My guess is that the issue occurs when the network is unstable and the client devices gets disconnected for a short time.
For example I'm using a VPN and I was able to reproduce this issue with the VPN but if I disable the VPN the download works. My VPN isn't really reliable because sometimes if I visit a website I get the error "Network has changed".
Could this be the issue or are you guys sure that your connection on the server and client are stable?
I tested it with 2 friends, both with fiber connection and near 5km radius. I'm using pingvin-share behind SWAG reverse-proxy. Download All
button seems to throw Premature Close error more often on a share with multiple files, than downloading the files separately. Downloading single files have been giving variable success rate, sometimes it downloads it fully, sometimes it fails the file download with Premature close error.
@fosq Thanks for taking the time for helping to debug this. Would you mind to test if the issue also occurs when you download the file over your local network, without a proxy in front?
I don't think that's the issue. I've tested it both remotely (host in one location, client in another) and locally (host and client on the same network). Actually, most of my tests have been run locally.
So, I just rewrote the download endpoint completely and replaced the StreamableFile
with the download
method of Express. I've tested the download with a VPN, over the local network, over the internet, and concurrent downloads. Would you mind testing the stonith404/pingvin-share:development
image again? 🙏
Tested on two hosts. Unfortunately, still the same probelm.
Host #1: remote over HTTPS, three simultaneous downloads, all failed at a certain point.
Host #2: local network over HTTP, three (different) simultaneous downloads, all but one failed.
Tested on Firefox.
I did some digging that might or might not be relevant, but thought I'd share. I opened Network Monitor in Firefox, started downloading the files and waited for them to fail. Upon failing, I got and entry with Status: Blocked, NS_ERROR_NET_PARTIAL_TRANSFER. Through some googling, it seems that this is more of an Nginx issue, than something else, but I might be wrong.
@axiomen Thanks for your research. I made some researches too and it's extremely strange. You're right that the issue still occurs because I only tested it with a chromium based browser. Downloading files in a chromium based browser concurrently works with no problems for me. Downloading one file at the same time in Firefox works for me too, but not multiple files at the same time. If I start the download in Chrome first and then start the download in Firefox, the Firefox download fails, but if I start the download in Firefox first the files get downloaded successfully in both browsers.
it seems that this is more of an Nginx issue
You said that you tested the download over the local network, was there Nginx in front too?
Sorry, I was in a rush, so I didn't elaborate on my comment. From my understanding, Docker image comes with bundled Nginx, correct? If not, disregard that.
This is what I've noticed the process usually goes like: Firefox shows the download is still in progress, even though the transfer shows no activity anymore (i.e. 0 MB/s) -> Network Monitor reports "Blocked" -> Pingvin share throws a "Request aborted" / "Premature close" error.
Edit:
Found what the mismatch is. Linux host and the browsers seem to report in GiB, while Pingvin share web interface reports in GB.
Okay, further investigation has led me to something interesting (again, might not be relted at all). There is a mismatch between reported file size on Pingvin Share interface vs the one that the browsers report (valid for both Chrome and Firefox).
Which has led me to this issue. Relevant parts:
I've looked at the response headers when the file's downloading and all is in order, until I get this NS_ERROR_NET_PARTIAL_TRANSFER message and it fails.
I have an issue with FireFox where it can't download generated private file PDFs, yet Chrome has no such issue.
The real issue is the mismatch between the file's real size on the server and the size Drupal stores in the file_managed table, filesize column.
I did check the content-length in the HTTP Headers when the transfer fails, which reports the correct size, so I'm not really sure where the mismatch happens.
@stonith404 I got it working, but feel like it's more of a workaround?
Mounted nginx.conf to the host and added "proxy_buffering off", so it looks like this:
location /api {
proxy_buffering off;
proxy_pass http://localhost:8080;
proxy_set_header X-Forwarded-Host $host:$server_port;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
I'm not sure what exactly is going on, and from my understanding this is not the best practice, but it works now.
Docker image comes with bundled Nginx, correct?
It seems like it yes. Even the bare install without docker uses Nginx. Is it possible to temporarily replace Nginx with Caddy or Apache? just to see if nginx is apart of the problem.
I just replaced Nginx with Caddy in the latest development image. For me the error strangely still occurs if I disable proxy buffering or use Caddy. Would you mind to test the image again?
@axiomen Is the error completely gone if you disable proxy buffering?
I've only tested locally over HTTP and I'm consistently able to have multiple concurrent downloads all finish without an error - on both dev images.
Doesn't seem to be the case for remote HTTPS transfers, but I need to adjust my host's nginx configuration for further testing. I might be able to do it later in the day or sometime this week. However, on a quick test with the latest dev image, I'm getting an additional error before Error: Request aborted
:
{"level":"error","ts":1706517723.419075,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"localhost:8080","duration":0.019631783,"request":{"remote_ip":"172.18.0.1","remote_port":"42936","client_ip":"172.18.0.1","proto":"HTTP/1.1","method":"GET","host":"pingvin.mydomain.com","uri":"/api/shares/E55QMTR/files/edb72daf-e139-48cb-bde1-f795d896fea4","headers":{"Sec-Fetch-Dest":["document"],"X-Forwarded-Proto":["http"],"X-Forwarded-Ssl":["on"],"X-Forwarded-Port":["443"],"X-Forwarded-Method":["GET"],"X-Forwarded-For":["172.18.0.1"],"X-Forwarded-Host":["pingvin.mydomain.com"],"Sec-Fetch-User":["?1"],"X-Original-Method":["GET"],"Sec-Gpc":["1"],"X-Forwarded-Uri":["/api/shares/E55QMTR/files/edb72daf-e139-48cb-bde1-f795d896fea4"],"Accept-Encoding":["gzip, deflate, br"],"X-Real-Ip":["REDACTED"],"Dnt":["1"],"X-Forwarded-Server":["pingvin.mydomain.com"],"Cookie":[],"Accept-Language":["en,en-US;q=0.7,de;q=0.3"],"User-Agent":["Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0"],"Sec-Fetch-Site":["same-origin"],"Referer":["https://pingvin.mydomain.com/share/E55QMTR"],"Upgrade-Insecure-Requests":["1"],"X-Original-Url":["https://pingvin.mydomain.com/api/shares/E55QMTR/files/edb72daf-e139-48cb-bde1-f795d896fea4"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"],"Sec-Fetch-Mode":["navigate"]}},"error":"writing: write tcp 172.18.0.8:3000->172.18.0.1:42936: write: broken pipe"}
I omitted sensitive data.
I managed to do a little bit of testing over HTTPS (latest image, slightly adjusted nginx on my host), but on my phone with Firefox Android using mobile data. Ran one test with two concurrent downloads with no errors. I'm starting to think there's something up with the firewall on the network I've done all (or most of) my testings on so far. Once the downloads fail, I noticed I cannot access anything on my domain on this network for a while, however everything remains accessible through my phone data.
Edit:
I'm stumped. I did some more testing, and I'm not even sure where to look anymore. It's not the firewall as I suspected. Concurrent downloads work perfectly fine over HTTP (no nginx on the host) both locally and remotely (over VPN) and so do local transfers over HTTPS (with nginx on the host), but remotely they just fail at random points over HTTPS (on desktop Firefox, on Android Firefox the single test I ran was successful).
I was able to reproduce the issue with the following minimal reproduction snippet. NestJS uses Express, so I've written the reproduction snippet with Express.
import express from 'express';
const app = express();
app.get('/', (req, res) => {
// Send the file using res.download
res.download('large-file.txt', (err) => {
if (err) {
console.error(err);
}
});
});
app.listen(8080, () => console.log("Server is running on port 8080"));
I'm actively debugging this but I didn't find the problem yet. It seems like the issue doesn't occur if nothing is in front like Nginx, Caddy or a Cloudflare zero trust tunnel. But locally it works also if a reverse proxy is in front.
But I'm so confused where the bottleneck is because the issue only occurs on Firefox, if https is enabled, something is in front and over the internet (not locally)...
I did a quick test, and it's not only on Firefox under the aformentioned conditions. Chrome is also affected.
A rather strange thing I've noticed is my domain becomes unreachable for a minute or so after the downloads fail. Containers keep running, so does swag (nginx), and there are no error logs indicating something's wrong. For good measure, I tested file transfer over my Nextcloud instance (in case there's some domain hosting issue), and that worked fine.
Hi!
Well I just gave the latest dev build a look, It seems like I got a different error.. That might just be because im on a dev built.
Pingvin is currently behind my Rproxy and downloading at max network bandwidth on firefox. Sadly the download didn't finish. (Edit: the file download was able to finish after a retry, maybe its something else thats buggin'out?)
Here is also a screenshot of the download.
pingvin | {"level":"error","ts":1706993576.958401,"logger":"http.handlers.reverse_proxy","msg":"aborting with incomplete response","upstream":"localhost:8080","duration":0.019484589,"request":{"remote_ip":"192.168.20.10","remote_port":"44098","client_ip":"192.168.20.10","proto":"HTTP/1.1","method":"GET","host":"share.smallbrain.nl","uri":"/api/shares/myfiles/files/e2ce-439a-919a-5292fbc944de","headers":{"Dnt":["1"],"X-Forwarded-Proto":["http"],"Accept-Encoding":["gzip, deflate, br"],"X-Real-Ip":["10.59.59.2"],"X-Forwarded-For":["192.168.20.10"],"User-Agent":["Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0"],"Sec-Fetch-Mode":["same-origin"],"Sec-Fetch-Site":["same-origin"],"X-Forwarded-Host":["share.smallbrain.nl"],"Upgrade-Insecure-Requests":["1"],"Referer":["https://share.smallbrain.nl/share/mine"],"Cookie":[],"Accept-Language":["en-US,en;q=0.5"],"Sec-Fetch-Dest":["empty"],"Accept":["text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8"]}},"error":"writing: write tcp 172.18.0.2:3000->192.168.20.10:44098: write: connection reset by peer"}
pingvin | Error: Request aborted
pingvin | at onaborted (/opt/app/backend/node_modules/express/lib/response.js:1052:15)
pingvin | at onfinish (/opt/app/backend/node_modules/express/lib/response.js:1088:50)
pingvin | at AsyncResource.runInAsyncScope (node:async_hooks:206:9)
pingvin | at listener (/opt/app/backend/node_modules/on-finished/index.js:170:15)
pingvin | at onFinish (/opt/app/backend/node_modules/on-finished/index.js:101:5)
pingvin | at callback (/opt/app/backend/node_modules/ee-first/index.js:55:10)
pingvin | at Socket.onevent (/opt/app/backend/node_modules/ee-first/index.js:93:5)
pingvin | at Socket.emit (node:events:530:35)
pingvin | at emitErrorNT (node:internal/streams/destroy:169:8)
pingvin | at emitErrorCloseNT (node:internal/streams/destroy:128:3) {
pingvin | code: 'ECONNABORTED'
pingvin | }
Here are some lines of logs from my Rproxy:
- - [03/Feb/2024:23:24:47 +0100] "GET /api/shares/myfiles/files/Myfiles HTTP/2.0" 200 1217261123 "share.smallbrain.nl" "30.032""https://share.smallbrain.nl/share/Mine" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0"
- - [03/Feb/2024:23:24:49 +0100] "GET /sw.js HTTP/2.0" 200 2198 "delen.smallbrains.nl" "0.004""-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0"
- - [03/Feb/2024:23:24:49 +0100] "GET /workbox-b471fa0c.js HTTP/2.0" 200 5446 "share.smallbrain.nl" "0.000""-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:122.0) Gecko/20100101 Firefox/122.0"
Hope this could help.
Thanks @axiomen and @sebasdt. This issue is really difficult the fix, because I'm still not sure where the problem is. But I'll keep you up to date.
(Tested on 0.21.5 dev image.)
I looked a bit more through the swag
logs, including fail2ban
module, and also Network
stats in Firefox, and this popped out for me.
As you can see, I'm getting quite a few 401
errors with Slow server response time
warning while downloading, which ultimately ends in fail2ban
temporarily blocking the IP address, and failing the transfer.
Edit:
@stonith404 I disabled the nginx-unathorized
jail in fail2ban
module, and the transfers successfully complete without errors. This only happens if the transfer is ran as a guest user. Once logged in as a registered user, there are, expectedly, no 401
errors in the browser console. The question is, why does the server keep trying to authorize the guest?
All these errors happened during a transfer as an unregistered/guest/anonymous user.
@axiomen Thanks for participating in #401 let's discuss related things there, instead of here.
Did disabling the fail2ban jail really fix the issue? I'm not using fail2ban and the issue still occurs. I still have no clue where this issue occurs as the symptoms are different for every user. For example you said that the issue also occurs in Chrome but I made many tests and it never failed for me in Chrome.
@stonith404 I've run multiple tests, upload + download of two 4-5 GB files, and all tests passed. That's on the dev image with Caddy. As long as I disable nginx-unauthorized
fail2ban jail OR log in while downloading, the dev image works as expected.
I did stumble on another issue with this image though, where editing a share and attempting to add a file results in an upload error and the share disappearing completely on the web interface. The share is still present on the disk, however. This is the error:
[Nest] 49 - 02/12/2024, 10:08:38 AM ERROR [ExceptionsHandler] request entity too large
PayloadTooLargeError: request entity too large
at readStream (/opt/app/backend/node_modules/raw-body/index.js:163:17)
at getRawBody (/opt/app/backend/node_modules/raw-body/index.js:116:12)
at read (/opt/app/backend/node_modules/body-parser/lib/read.js:79:3)
at rawParser (/opt/app/backend/node_modules/body-parser/lib/types/raw.js:81:5)
at Layer.handle [as handle_request] (/opt/app/backend/node_modules/express/lib/router/layer.js:95:5)
at trim_prefix (/opt/app/backend/node_modules/express/lib/router/index.js:328:13)
at /opt/app/backend/node_modules/express/lib/router/index.js:286:9
at Function.process_params (/opt/app/backend/node_modules/express/lib/router/index.js:346:12)
at next (/opt/app/backend/node_modules/express/lib/router/index.js:280:10)
at expressInit (/opt/app/backend/node_modules/express/lib/middleware/init.js:40:5)
@axiomen Okay thanks for your feedback. I'll test it ASAP.
👟 Reproduction steps
When downloading large file (couple of GB) the download fails after few minutes. The docker throws:
Sometimes this happens several times before the file if finally downloaded.
👍 Expected behavior
Successful download regardless the file size.
👎 Actual Behavior
As described above.
🌐 Browser
Firefox, Chromium, Brave