tjanczuk / iisnode

Hosting node.js applications in IIS on Windows

Large file download issues #670

Open KanedaShusuke opened 5 years ago

KanedaShusuke commented 5 years ago

When I download a 7 GB (7344472786 bytes) file from a Node.js HTTP server running under iisnode, the download stops at about 3 GB (3049259008 bytes). There is no problem downloading the same 7 GB file with a browser when the Node.js HTTP server is not running under iisnode. Does iisnode support downloading files as large as 7 GB?

KanedaShusuke commented 5 years ago

After removing the 'Content-Length' header from the response, I was able to download the 7 GB file.

bluespidernav commented 4 years ago

Hmm, I'm having exactly the same issue (downloading a 10.9 GB file). The download stops somewhere close to 3 GB. I'm using iisnode, and my file downloads are implemented by a custom script using express and res.download. Files below 2 GB are no problem: they support range requests and are resumable. When serving a larger file, the download consistently fails somewhere in the region of 3 GB, but not always at exactly the same position.

Observing the server processes, the node.exe running interceptor and the download script looks fine during the download but records an EPIPE error as the failure cause. The w3wp.exe process, however, appears to go into a spin at the moment the failure occurs (it uses all the CPU on one core). If you repeat the process without restarting the web server, it spins on another core as well.

If you then attempt to resume the download (using a range request), you receive the HTTP response headers but no data. The server keeps its reply socket open until you give up on it, and the script's JS code does not even get invoked.

To me it could be a bug in iisnode, a bug in the express middleware, or something else. It does not appear to be a memory issue, as no process allocates an excessive amount. Does anyone have any clues as to the likely nature of the problem? Resumable downloads are clearly far more necessary for really large files and less-than-perfect internet connections, so removing Content-Length won't help with that at all!

bluespidernav commented 4 years ago

After a little further experimentation, the issue appears more likely to be some peculiarity of iisnode. Changing the implementation to avoid express res.download and instead pipe the file stream to res, along with suitable headers, still fails at the same byte offset.

Note also that if you don't supply a Content-Length header you are automatically given a Transfer-Encoding: chunked header instead, but you won't get any output unless you actually observe the chunked encoding formatting rules! Interestingly, it may be that using chunked encoding correctly circumvents the issue. That said, it's not clear whether browser downloads could ever work correctly this way.

Right now my website is vulnerable: anyone who knows the URL for this large file (and has suitable login credentials to download it) can bring my web server down simply by attempting the download and waiting for the failure a few times in not-so-rapid succession. Each attempt leaves w3wp.exe using full CPU on one additional core until it maxes out the machine. In fact, either a download without a range request, once it reaches around 3 GB, or a request to resume at an offset greater than or equal to this triggers the w3wp.exe spin.

It does not seem to matter whether express is used (res.download or sendFile): even with a simple streaming implementation we have a big problem, and it seems to be somewhere external to our code. What can it be?
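
For readers unfamiliar with the "chunked encoding formatting rules" mentioned above, here is a minimal sketch of the framing that HTTP/1.1 chunked transfer coding requires. The names `encodeChunk` and `LAST_CHUNK` are made up for illustration. In plain Node.js the http module normally applies this framing itself when a response has no Content-Length, so hand-written framing like this should only matter if the bytes are written to the socket directly; whether that was the case in the setup described here is not known.

```javascript
// Frame one piece of data as an HTTP/1.1 chunk:
//   <size in hex>\r\n<data>\r\n
// encodeChunk and LAST_CHUNK are hypothetical names used for this sketch.
function encodeChunk(data) {
  const body = Buffer.isBuffer(data) ? data : Buffer.from(data);
  // A zero-length chunk is reserved as the end-of-body marker, so skip it.
  if (body.length === 0) return Buffer.alloc(0);
  const header = Buffer.from(body.length.toString(16) + '\r\n', 'ascii');
  return Buffer.concat([header, body, Buffer.from('\r\n', 'ascii')]);
}

// Terminator: a zero-length chunk followed by an empty trailer section.
const LAST_CHUNK = Buffer.from('0\r\n\r\n', 'ascii');

const framed = encodeChunk('hello');
console.log(JSON.stringify(framed.toString('latin1'))); // "5\r\nhello\r\n"

// Framing overhead for a single 64 KiB chunk: "10000\r\n" plus the final "\r\n".
const overhead = encodeChunk(Buffer.alloc(65536)).length - 65536;
console.log(overhead); // 9
```

The overhead figure shows why the cost of chunking is negligible for large files: nine extra bytes for every 65536 bytes of payload.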

bluespidernav commented 4 years ago

Further update, after much experimentation and testing: it seems possible to get the download to work correctly by properly using Transfer-Encoding: chunked (i.e. no Content-Length header), reporting Accept-Ranges, and serving a 206 (Partial Content) when a valid Range header is given. The file's total size can still be given in the Content-Disposition header by adding a size= parameter after the filename parameter.

Obviously chunked encoding adds a tiny bit of extra overhead (a few extra bytes per 64K, 64K being the default highWaterMark for file read streams in Node.js). It is not clear yet whether Chrome will resume correctly, but the file it is creating looks OK so far. Further testing in progress!

Can anyone explain why it all goes completely wrong when Content-Length is correctly set to the size of the massive file? It does not appear to be related to IIS request filtering: the maxContentLength, which I incidentally had set to 3 GB, was later increased to 16 GB. Unless IIS request filtering has some sort of issue with values greater than 2 GB!? Either way, chunked encoding seems to be a solution of sorts. Now what is the problem, really?
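
The workaround described above could be sketched roughly as follows. This is not the poster's actual code: `parseRange`, `buildHeaders` and `serveLargeFile` are hypothetical names, only a single byte range is handled, and the file name is assumed to contain no quotes. No Content-Length is set anywhere, which in plain Node.js makes the http module fall back to Transfer-Encoding: chunked for HTTP/1.1 clients.

```javascript
const fs = require('node:fs');

// Parse a single-range "Range: bytes=..." header. Returns { start, end }
// (both inclusive) or null when the header is malformed or unsatisfiable.
function parseRange(header, size) {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header || '');
  if (!m || (m[1] === '' && m[2] === '')) return null;
  let start;
  let end;
  if (m[1] === '') {
    // Suffix form "bytes=-N": the last N bytes of the file.
    const n = Number(m[2]);
    if (n === 0) return null;
    start = Math.max(size - n, 0);
    end = size - 1;
  } else {
    start = Number(m[1]);
    end = m[2] === '' ? size - 1 : Math.min(Number(m[2]), size - 1);
  }
  if (start > end || start >= size) return null;
  return { start, end };
}

// Response headers without Content-Length. The total size travels in the
// size= parameter of Content-Disposition, as described in the comment above.
function buildHeaders(fileName, size, range) {
  const headers = {
    'Content-Type': 'application/octet-stream',
    'Accept-Ranges': 'bytes',
    'Content-Disposition': `attachment; filename="${fileName}"; size=${size}`,
  };
  if (range) {
    headers['Content-Range'] = `bytes ${range.start}-${range.end}/${size}`;
  }
  return headers;
}

// Hypothetical request handler: 200 for a full download, 206 for a valid
// Range request, 416 when the Range header cannot be satisfied.
function serveLargeFile(req, res, filePath, fileName) {
  fs.stat(filePath, (err, stat) => {
    if (err) {
      res.writeHead(404);
      res.end();
      return;
    }
    const rangeHeader = req.headers.range;
    const range = rangeHeader ? parseRange(rangeHeader, stat.size) : null;
    if (rangeHeader && !range) {
      res.writeHead(416, { 'Content-Range': `bytes */${stat.size}` });
      res.end();
      return;
    }
    res.writeHead(range ? 206 : 200, buildHeaders(fileName, stat.size, range));
    // createReadStream treats both start and end as inclusive offsets.
    const stream = fs.createReadStream(filePath, range || {});
    stream.on('error', () => res.destroy());
    stream.pipe(res);
  });
}
```

It would be wired up with something like `http.createServer((req, res) => serveLargeFile(req, res, 'D:\\files\\big.bin', 'big.bin'))`, where the path and file name are placeholders. Whether this avoids the w3wp.exe spin under iisnode is exactly what the comment above reports, not something this sketch verifies.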

bluespidernav commented 4 years ago

Another note: for a file larger than 4 GB served with Content-Length set, the failure offset appears to be precisely where the total content output so far exceeds the Content-Length modulo 4 GB. This probably means that something is policing our output to ensure it doesn't exceed the previously declared Content-Length, and when it does, the failure is not pleasant at all. It also suggests that this policing holds the expected Content-Length in a 32-bit unsigned integer. In my case, with a 10.9 GB file, the failure was close to 3 GB. Wherever this is, it needs to be fixed. There is no good reason to limit content length this way, and doing so violates the RFC specifications.
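
The 32-bit hypothesis can be checked against the exact byte counts reported in the first comment of this thread. The snippet below is only an arithmetic sketch, with `wrapAsUint32` as a made-up helper name; it does not prove where the truncation happens, only that the numbers are consistent with it.

```javascript
// Assumption (hypothesis from the thread, not confirmed): some component
// keeps the declared Content-Length in an unsigned 32-bit integer.
const UINT32_RANGE = 2 ** 32; // 4294967296 distinct values

// Value a 32-bit unsigned field would hold for a given Content-Length.
// Plain numbers are exact here because the sizes are far below 2^53.
function wrapAsUint32(contentLength) {
  if (!Number.isSafeInteger(contentLength) || contentLength < 0) {
    throw new RangeError('contentLength must be a non-negative safe integer');
  }
  return contentLength % UINT32_RANGE;
}

const declaredLength = 7344472786; // file size from the first comment
const observedStop = 3049259008;   // where that download stopped

const wrapped = wrapAsUint32(declaredLength);
console.log(wrapped);                // 3049505490
console.log(wrapped - observedStop); // 246482
```

The reported stopping point is within roughly 240 KB of the wrapped value, which fits the idea that output is cut off once it reaches the Content-Length modulo 2^32.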

bluespidernav commented 4 years ago

And for me the next question is whether I should switch to the azure branch of iisnode. Really, this depends on whether the bug is still present in the azure version, which I have not tried yet. I'll have to try it on a non-production server sometime.