kartikk221 / hyper-express

High performance Node.js webserver with a simple-to-use API powered by uWebsockets.js under the hood.
MIT License

Streaming big files causes random chunk duplication #288

Open · amir-s opened this issue 1 month ago

amir-s commented 1 month ago

👋

While building a hobby project with hyper-express, I noticed that when I stream a file to the response, the checksum of the received body doesn't match the original file. The chance of this happening increases with file size; I can reproduce it consistently with a file of around 256 MB. The response appears to contain some duplicated parts.

const fs = require("fs");

webserver.get("/", (request, response) => {
  const stream = fs.createReadStream("./sample.txt");
  stream.pipe(response);
});

I also tried reading the file chunk by chunk and writing each chunk into the response with res.write(chunk) while handling backpressure, but that did not change anything; the bug still appears. After switching to Node's built-in http server, the issue was gone, which is why I suspect something is going on inside hyper-express.

I made a little repo to help reproduce this: https://github.com/amir-s/debug-hyper-express. You can run npm install and npm run generate to generate the sample.txt file.

❯ shasum sample.txt 
4005870368b86b280241605852047f4f390f7b11  sample.txt

Then you can start the server with npm start and download the file via curl. I ran it a few times; each time the size of the downloaded file is different, and so is the checksum.

❯ curl localhost:3000 -o downloaded.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  271M    0  271M    0     0  1583M      0 --:--:-- --:--:-- --:--:-- 1623M

❯ shasum downloaded.txt sample.txt     
740c6f2555c822eb0c706608ab60a3c42f221624  downloaded.txt
4005870368b86b280241605852047f4f390f7b11  sample.txt

❯ curl localhost:3000 -o downloaded.txt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  256M    0  256M    0     0  1394M      0 --:--:-- --:--:-- --:--:-- 1443M

❯ shasum downloaded.txt sample.txt     
a6426e5f5ef80cf631d467250f127f7c7a5720b0  downloaded.txt
4005870368b86b280241605852047f4f390f7b11  sample.txt

I'm using node v20.14.0.

kartikk221 commented 1 month ago

I see, I'll look into this further. It seems to be a bug related to backpressure handling, but I can't pinpoint any logical bug in the backpressure-handling code. I'll do more gradual debugging to see if it's a bug within uWebsockets.js itself.

amir-s commented 1 month ago

@kartikk221 Thank you!

I did a little digging, and it appears the chunks that cause the backpressure get sent to the client twice. The backpressure-handling logic looks fine to me, as far as I understand it.

https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L619-L633

However, the offset is always 0 for whatever reason. But judging by what actually happens, the chunk has already been fully received by the client by the time the buffer drains. Essentially, if I manually force offset = chunk.length; right before const remaining = chunk.slice(offset - write_offset);, things work.

Where does the offset here come from? https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L398-L412

Is it from uWebsockets.js?

kartikk221 commented 1 month ago

Yes, the offset argument comes from uWebsockets.js; it is supposed to tell you how many bytes of the original chunk were written. At least that is how I understand it, which is why I don't see a problem in the backpressure / retry logic.

amir-s commented 1 month ago

Could this be related? https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083081077

kartikk221 commented 1 month ago

> Could this be related? uNetworking/uWebSockets.js#726 (comment)

That would actually explain it: we shouldn't use the offset from the onWritable callback, since the write call buffers and writes the whole chunk even if it returns false due to backpressure.

One thing that confuses me in https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083081077 is where ghost mentions "(but do not use offset, use your offset)".

What does he mean by using our own offset? Does he just mean serve the next chunk?

This also begs another question: what is the boolean returned from the onWritable callback supposed to mean? If we are not supposed to retry the failed chunk, even partially, should we simply continue execution and return true without writing anything inside the onWritable / drain handler itself?

amir-s commented 1 month ago

I have the same question! I guess at this point, when backpressure happens, the only thing we need to do is wait for the drain and then continue. Something like

if (sent) {
    // The chunk was fully sent, we can resolve the promise
    resolve();
} else {
    this.drain(() => {
       resolve();
       return true;
    });
}

The return true part feels a little like cheating!
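The snippet above can be exercised against a toy model. Everything here is hypothetical (ToySocket and send are made-up names; real uWS sockets behave differently in detail) -- it only models the one semantic under discussion: write() buffers the whole chunk even when it signals backpressure, so the drain handler just resolves and returns true.

```javascript
// Toy model of the proposed fix: on backpressure, wait for drain and
// continue without re-sending, because write() already queued the chunk.
class ToySocket {
  constructor() {
    this.sent = [];
    this.backpressured = false;
    this.onWritable = null;
  }
  write(chunk) {
    this.sent.push(chunk); // uWS-style: the chunk is buffered even on failure
    this.backpressured = !this.backpressured; // alternate to force both paths
    return !this.backpressured;
  }
  drain() {
    if (this.onWritable) this.onWritable();
  }
}

async function send(socket, chunks) {
  for (const chunk of chunks) {
    const ok = socket.write(chunk);
    if (!ok) {
      await new Promise((resolve) => {
        socket.onWritable = () => {
          resolve(); // chunk already buffered: just continue
          return true;
        };
        setImmediate(() => socket.drain());
      });
    }
  }
  return socket.sent.join("");
}

const result = send(new ToySocket(), ["a", "b", "c", "d"]);
result.then((out) => console.log(out)); // "abcd" -- each chunk sent exactly once
```

Re-sending any slice of the chunk from the drain handler in this model would duplicate bytes, which matches the corrupted-checksum symptom from the issue.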

amir-s commented 1 month ago

According to https://github.com/uNetworking/uWebSockets.js/issues/726#issuecomment-1083086046, tryWrite does not buffer if it fails; in contrast, write does buffer. Since hyper-express uses write:

https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L575

I guess we can be sure it buffers the bytes that are not sent.

Another solution would have been to use tryWrite, but according to ghost (and I also double-checked) it has not been implemented in uWebsockets.js!
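To make the write / tryWrite distinction concrete, here is a mock contrasting the two semantics as described in the linked comment. MockSocket is an illustrative stand-in, not uWebSockets.js itself, and tryWrite here models an API that uWS.js has not actually implemented.

```javascript
// Mock contrasting the two semantics: write() keeps the whole chunk even
// when it returns false, while tryWrite() would only keep what fits.
class MockSocket {
  constructor(capacity) {
    this.capacity = capacity;
    this.sent = Buffer.alloc(0);
  }
  // write(): always queues the full chunk; false only signals backpressure.
  write(chunk) {
    this.sent = Buffer.concat([this.sent, chunk]);
    return this.sent.length <= this.capacity;
  }
  // tryWrite(): queues only what fits and reports how many bytes were taken,
  // leaving the caller to re-send the remainder.
  tryWrite(chunk) {
    const room = Math.max(0, this.capacity - this.sent.length);
    const taken = chunk.subarray(0, room);
    this.sent = Buffer.concat([this.sent, taken]);
    return taken.length;
  }
}

const a = new MockSocket(4);
a.write(Buffer.from("abcdef"));        // returns false, but all 6 bytes queued
console.log(a.sent.toString());        // "abcdef" -- re-sending would duplicate

const b = new MockSocket(4);
const n = b.tryWrite(Buffer.from("abcdef")); // takes only what fits
console.log(n, b.sent.toString());     // 4 "abcd" -- caller re-sends the rest
```

Under write() semantics, an offset-based retry (which assumes tryWrite-style partial acceptance) re-sends bytes that were already queued, producing the duplication.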

kartikk221 commented 1 month ago

> I have the same question! I guess at this point when backpressure happens, the only thing we need to handle is to just wait for the drain and then continue. Something like
>
>     if (sent) {
>         // The chunk was fully sent, we can resolve the promise
>         resolve();
>     } else {
>         this.drain(() => {
>            resolve();
>            return true;
>         });
>     }
>
> The return true part feels like cheating a little!

Can you try modifying your local hyper-express source code in the node_modules folder to see if your issue is resolved with your fix?

amir-s commented 1 month ago

I did, and it fixed my issue. But I fear it's going to break other things. In my case, total_size is undefined since it is optional. But I see here: https://github.com/kartikk221/hyper-express/blob/7c2544b6ade33b537c06983da061c30a4ea6ab2b/src/components/http/Response.js#L555-L572

If total_size is provided, we'd use tryEnd, which then relies on us handling the offset to re-send chunks when backpressure happens.

I think we could also switch to end instead of tryEnd, so that all the backpressure-handling logic can assume chunks that cause backpressure are handled by the uWS buffer. What do you think?

amir-s commented 1 month ago

Actually, .end does not quite work; the tests fail in my PR.

/Users/amir/src/github.com/amir-s/hyper-express/src/components/http/Response.js:570
            const [ok, done] = this._raw_response.end(chunk, total_size);
                               ^

TypeError: object is not iterable (cannot read property Symbol(Symbol.iterator))

kartikk221 commented 1 month ago

No, you cannot use .end(), since .end() is meant to be the final call that ends the response. When a total_size is provided, we have to use tryEnd. Why don't you try this: only use the offset / partial chunk writes if we are doing tryEnd; otherwise, use the behavior we are experimenting with, where we assume .write() will buffer and write the chunk fully.

amir-s commented 1 month ago

If that's the case, I think the simplest solution is to resolve the promise and return early when total_size is not provided. We already have a condition we can reuse. I updated the draft PR: https://github.com/kartikk221/hyper-express/pull/289. What do you think?

kartikk221 commented 1 month ago

> If that's the case I think the simplest solution would be to resolve the promise and early return when total_size is not provided. We already have a condition we can re-use. I updated the draft PR #289. What do you think?

Could you test both your own use case and the test suite to see if your changes pass? I'll do the same on my end and make any refinements to your PR as needed.

amir-s commented 1 month ago

I tried the fix by streaming a large file, both with and without providing the total size. Without my PR, things only work properly when the total size is provided; with the fix, both cases work fine.

I tried providing the total size via res.stream(fileStream, size). I can see it goes through this._raw_response.tryEnd(chunk, total_size); and offset also gets the proper values.

I can also omit the total size in the same scenario, and things still work properly once the fix is applied.

I'd love to contribute a couple of tests too, but it's a little hard to navigate the test suite. Any pointers?

amir-s commented 2 weeks ago

bump! @kartikk221