uNetworking / uWebSockets.js

μWebSockets for Node.js back-ends :metal:
Apache License 2.0
7.76k stars 563 forks source link

the first onData((chunk, isLast) => {}) call always has 0 size chunk #910

Closed e3dio closed 1 year ago

e3dio commented 1 year ago

Is this normal for onData() to always have zero size chunk on first call? Seems like onData() should only call when it has chunk of data. Otherwise I need to add code to check and handle zero size chunk at start, to postpone creating buffer to hold data. Tested local and across network etc

uNetworkingAB commented 1 year ago

The lib will always give you an indication of "isLast". So if you attached onData and there is no data, you will get an empty chunk with the "isLast" true. Otherwise, attached behavior would leak. The "isLast" is essentially your destructor.

It makes no difference if you have 0 size chunks or not, they hold no data. There is nothing forcing you to allocate memory to hold a 0 size chunk.

e3dio commented 1 year ago

That's what I was thinking it could possibly by useful to know the isLast before data comes in, in case there is no more data. I just need to update my getBody function

uNetworkingAB commented 1 year ago

@NexaraDev you have reported similar issue regarding data:

But, whenever I sent a request with empty body or no content, the request get timeout.

I cannot trigger any such behavior with Upload.js example:

This works:

curl --verbose -X POST -d "hello" http://localhost:9001

And this works:

curl --verbose -X POST http://localhost:9001

uNetworkingAB commented 1 year ago

know the isLast before data comes in

But if isLast is false, then there is more data to come. We only know for sure that there is no more data when we get last chunk, for instance chunked encoding does not tell you when there will be no more data, it just tells you that NOW there will be no more data.

If you want some kind of shortcut (which I don't really see as a shortcut) you can check if content-length is explicitly set to 0 and apply some shortcut then, but that sounds to achieve nothing good - attaching a callback should not be a heavy operation.

uNetworkingAB commented 1 year ago

If you look at the JsonPost.js example, there is info for how to receive json without any allocations more than necessary: https://github.com/uNetworking/uWebSockets.js/blob/master/examples/JsonPost.js

uNetworkingAB commented 1 year ago

One major flaw of JS is that Promises execute from a queued up microtask queue, rather than as an immediate function call from where the completion happened. So any stack allocated data that wants to be immediately exposed to some completion handler, must be dynamically allocated and copied and then freed. So callbacks are massively superior to Promises for performance stuff.

The onData callback can deliver data zero-copy to a consumer without any allocations or copies at all (which is shown in JsonPost.js example). But when you "promisify" it, you can't really do zero-allocations, zero-copy anymore

e3dio commented 1 year ago

The issue with that getJson example is this line buffer = Buffer.concat([buffer, chunk]) concat creates new Buffer and copies existing data into it + new data on every onData event, instead of creating 1 buffer at start at the needed size. The more onData events it has the amount of copying grows exponentially. If you have 1 buffer to start you only copy once per chunk

uNetworkingAB commented 1 year ago

Yep but that's how Http is. You don't know the exact size unless content-length is written. We could add a function "getRemainingBodySize" or similar where it will be either set to some byte length or missing if not known.

Then you can allocate according to it, if available

uNetworkingAB commented 1 year ago

HttpRequest.getSmallestBodyContainer() can return 0 for no body, INF for unknown, potentially unlimited chunked encodinbg, or content-length for content-length.

Then you can check if getSmallestBodyContainer is some sane value, and pre-allocate a file or memory

e3dio commented 1 year ago

The Content-Length header is mandatory for messages with entity bodies, unless the message is transported using chunked encoding

Content-length header is required, it should be used to prepare by allocating buffer, or deny request if too large or over ratelimit etc, instead of exponentially copying data on every chunk into new buffers

uNetworkingAB commented 1 year ago

content-length is not required. chunked encoding has conceptually infinite size. So the function should return 0, INF, or size and you only need to check it once, at the very start. Then you can subtract all chunk sizes from it

e3dio commented 1 year ago

A user agent SHOULD send Content-Length https://www.rfc-editor.org/rfc/rfc9110.html#section-8.6

So it SHOULD send, well on my server content-length is required so I respond with 411 Length Required

"411 Length Required" client error response code indicates that the server refuses to accept the request without a defined Content-Length header

If you don't want to provide content-length you can find another server to send request to ;)

uNetworkingAB commented 1 year ago

You didn't even quote it fully:

A user agent SHOULD send Content-Length in a request when the method defines a meaning for enclosed content and it is not sending Transfer-Encoding.

In case of chunked encoding, you never know the size of the body up front. And looking at this, we don't even need getSmallestBodyContainer since you already can check getHeader("content-length") and pre-allocate if you want to. But you still need to handle a missing content-length since it is not even allowed when chunked transfer

e3dio commented 1 year ago

Ok but you are talking about chunked encoding, nobody sends JSON in chunked encoding. We are talking about standard POST with content-length known. Only very few cases use chunked encoding where size is unknown, in that case you will want specialized route for that anyway. You will probably want to stream that data somewhere and not gather it into a buffer etc

uNetworkingAB commented 1 year ago

https://github.com/uNetworking/uWebSockets/blob/46a3ef29ae8aa7ba96fd6c0e6b891221d41b8ef6/src/HttpParser.h#L449

Those are the rules for body

uNetworkingAB commented 1 year ago

if (transfer-encoding) then body size = INF if (content-length) then body size = some fixed number

if the above is less than \<ridiculous limit> then pre-buffer it

uNetworkingAB commented 1 year ago

nobody sends JSON in chunked encoding

Well, we implement standards compliant HTTP, so can't ignore a major part of how body size is conveyed. But you are definitely right that if you have content-length known, then use it to pre-allocate is a good idea