Closed uasan closed 3 years ago
This is not applicable to us; we already know how to do things correctly. And it still seems they don't really have any idea what they are doing.
I agree that there is almost no point in this for us.
But take a file upload as an example: if the file size is 10 MB and the transfer arrives over 10 onData iterations, then because of Buffer.concat we will allocate 55 MB in total (the 9 re-concatenations copy 45 MB of already-received data).
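A quick sanity check of those numbers, as a sketch: a 10 MB upload delivered as ten 1 MB chunks, with `acc = Buffer.concat([acc, chunk])` on every onData call.

```javascript
// The i-th concat allocates a fresh buffer holding all i chunks so far,
// so total allocation grows quadratically with the chunk count.
const chunkMb = 1;
const chunks = 10;

let allocatedMb = 0;
for (let i = 1; i <= chunks; i++) {
  allocatedMb += i * chunkMb;
}

console.log(allocatedMb);          // 55 -> MB allocated in total
console.log(allocatedMb - chunks); // 45 -> MB of redundant re-copied data
```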
@uasan But for your example you are suggesting allocating a 10 MB buffer up front for every request to your endpoint; a request could be cancelled before it completes, while onData only uses the RAM required at the time. Maybe for large sizes you should write to disk instead of buffering in RAM.
@e3dio No, I suggest defining, for each connection (accepted socket), a buffer function or a static object (the latter is less flexible); uWS would then always write the socket's contents into the buffer I prepared in advance.
If we know the value of the HTTP content-length header, we can prepare a buffer of the required size and write into it; we do not need to copy.
Resizable and growable ArrayBuffers will appear in V8 soon, and then it will become even more convenient to use a static ArrayBuffer per socket: https://github.com/tc39/proposal-resizablearraybuffer
The buffer function gives flexibility: we can slice a subarray from a static buffer, with a size that depends on the request; for a file upload it is enough to slice one chunk at a time and save it to the file.
My main point is that we can reuse memory. For example, 1k RPS of JSON (with a 1 KB request body) creates 1 MB of garbage per second; this is not critical, but it makes no sense )
A 1 KB JSON string only needs 1 chunk and is zero copy: JSON.parse(Buffer.from(ab).toString()) or JSON.parse(await req.body)
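A minimal sketch of that single-chunk case, with a plain ArrayBuffer standing in for the chunk uWS hands to onData (parseSingleChunk is a made-up helper name, not a uWS API):

```javascript
// Buffer.from(ab) wraps the ArrayBuffer without copying; toString()
// then decodes it as UTF-8 and JSON.parse does the rest.
function parseSingleChunk(ab) {
  return JSON.parse(Buffer.from(ab).toString());
}

// A plain ArrayBuffer standing in for a uWS chunk:
const chunk = new TextEncoder().encode('{"hello":"world"}').buffer;
console.log(parseSingleChunk(chunk).hello); // "world"
```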
Yes, this is the ideal case, when the whole body is transferred in one call, but a static buffer would work just as well for it.
HTTP is a stream; there is no guarantee you'll know the total size of the body up front. That's why the interface gives you chunks and tells you whether or not each chunk is the last.
What you do with these chunks is up to you, nobody forces you to use Buffer.concat. You can use whatever you think is best.
If you have a content-length header you can pre-allocate a buffer and fill it. That's already possible today.
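A sketch of that pre-allocation pattern, assuming the uWS.js res.onData/res.onAborted and req.getHeader APIs; readBody is a hypothetical helper, not part of uWS:

```javascript
// One Buffer sized from content-length, filled linearly in onData:
// one copy per chunk, no concat pyramid.
function readBody(res, req, done) {
  const size = Number(req.getHeader('content-length')) || 0;
  const body = Buffer.allocUnsafe(size);
  let offset = 0;

  res.onData((ab, isLast) => {
    // The chunk's ArrayBuffer is only valid inside this callback, so one
    // copy into the pre-allocated buffer is required (and sufficient).
    Buffer.from(ab).copy(body, offset);
    offset += ab.byteLength;
    if (isLast) done(body.subarray(0, offset));
  });

  res.onAborted(() => {
    // Request was cancelled; the buffer simply becomes garbage.
  });
}
```

A real implementation would also need to cap `size`, so that a hostile Content-Length cannot blow up the server.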
What you do with these chunks is up to you, nobody forces you to use Buffer.concat. You can use whatever you think is best.
Yes, you are right. That's what I do: I have a single static buffer for small slices in a pool, and I reuse them.
I closed the ticket, but I am writing my thoughts down here; maybe they will be useful to someone.
My main point is that we can reuse memory. For example, 1k RPS of JSON (with a 1 KB request body) creates 1 MB of garbage per second; this is not critical, but it makes no sense )
This is not true. You are given an ArrayBuffer that is zero copy, a reference. There is no garbage collection in this library, and your links to discussions in the Node.js camp are not applicable. Inspiration and advice on performance are worth less than the dog shit under your shoe if they come from the Node.js people. These people have no vision, no idea about anything.
These aspects of memory management are architectural and will not change. How it works now is how it will remain.
Copies of buffers are made at the Node.js level of the application, I'm sure everything is fine on your part )
In the documentation and examples, Buffer.concat is always used in onData.
Buffer.concat is always used in onData
No, Buffer.concat is only used when needed or wanted. Single-chunk posts like your 1 KB JSON example are zero copy: JSON.parse(Buffer.from(ab).toString()). If you are using concat for that, you are making an unnecessary copy of the data. If you want to pre-allocate a buffer you can use Buffer.allocUnsafe() and Buffer.from(arrayBuffer).copy(allocatedBuffer, ...), but for a single chunk this is also an unnecessary copy.
Buffer.allocUnsafe only half-solves this: it allocates a buffer slice from the shared pool, but there is no interface to free that slice; the slices just accumulate and wait for the GC to delete them, which is not efficient
Buffer.allocUnsafe only half-solves this: it allocates a buffer slice from the shared pool, but there is no interface to free that slice; the slices just accumulate and wait for the GC to delete them, which is not efficient
Well... yes, that's what it means to be a GC'd language: GC languages use garbage collection. And yes, that sucks if you know what you're doing. But most programmers don't, and that's why GC works for most programmers.
If you have any kind of vision or idea of what you're actually doing, don't program in a GC language.
If you have any kind of vision or idea of what you're actually doing, don't program in a GC language.
I admire solutions that take the best from both worlds (manual memory management and GC); the buffer function tries to do this, but the use case is small: endpoints that receive data in more than one onData call and should not dump the data to a file.
the buffer function tries to do this
You're not really listening to the explanation. uWS does not allocate a new Buffer for every chunk. So whatever they are doing, is not applicable here. We've been zero copy through the entire stack, all the way up to JavaScript, since day 1.
If you want to fill one single ArrayBuffer with all the data of all chunks, you can pre-allocate it, then pass it along and append to it until done. That's definitely going to outperform anything Node.js is doing.
I understand that very well. It is about the copying that Node user applications implement in onData handlers, usually with the Buffer.concat method. My suggestion is to give users another simple interface that accumulates chunks without copying.
I have no complaints about the C++ runtime.
So you want pretty much an onCompleteData where the user is given the full body in memory? That's something you can build on top, but it kind of encourages poor solutions, as you'll fill your RAM with bodies.
Yes. Not everyone needs this onCompleteData; in my application there are use cases (a large JSON body, and an endpoint that receives binary data to HEX-encode and send to a database), but such cases are infrequent, so this needs to be used wisely.
but it kind of stimulates poor solutions as you'll fill your RAM with bodies
If you ran a survey of who has lines like this in their code:
onData(chunk => { ... copies = Buffer.concat([copies, chunk]) ... })
I think most would say that they do. This means the current practice is even worse than if they used the new onCompleteData handler, because:
- You have zero copies
- You destroy buffer at the next tick
- Less call and boilerplate code in JS
.... Or.... people can learn how to write better code and do it themselves. Like I've said three times now: none of this requires any changes to uWS. You can achieve it by writing a simple NPM module.
Call it, hmm, uws-getfullbody or something, and make it return a Promise or something.
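A rough sketch of such a hypothetical uws-getfullbody module, assuming the uWS.js res.onData/res.onAborted API. With no usable content-length it takes one copy per chunk plus one final join, so every byte is copied at most twice, instead of the quadratic re-copying of Buffer.concat inside the loop:

```javascript
// Collect one copy of each chunk, then join them once at the end.
function getFullBody(res) {
  return new Promise((resolve, reject) => {
    const parts = [];
    res.onData((ab, isLast) => {
      // Buffer.from(ab) is a zero-copy view that dies with the callback,
      // so take one real copy of each chunk.
      parts.push(Buffer.from(Buffer.from(ab)));
      if (isLast) resolve(Buffer.concat(parts));
    });
    res.onAborted(() => reject(new Error('request aborted')));
  });
}
```

Usage inside an async route handler would then be simply `const body = await getFullBody(res);`.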
the new onCompleteData handler, because
- You have zero copies
You will have exactly the same number of copies doing this inside uWS or outside in JavaScript. Doing it inside uWS gives, as said four times now, zero gain.
You destroy buffer at the next tick
No. That's definitely not how it works. The buffer would still be under regular GC. Just like anything written in JavaScript atop.
Less call and boilerplate code in JS
Yes. But if you really have an issue writing 10 lines of JavaScript, doing a simple buffer append on your own (which in itself is horrible practise) then you're not really displaying enough interest in uWS. The customers I work for / at use uWS to create solutions made to last and to deliver. They don't really care about 10 lines of extra code.
If anything we can improve the readJSON example: https://github.com/uNetworking/uWebSockets.js/blob/81ab2a13fdd46dfb08edc89024989cc5729435e7/examples/JsonPost.js#L35
Currently it uses Node.js's insane Buffer.concat (which, as you say, is incredibly wasteful). We could change this function to make better use of content-length.
You will have exactly the same number of copies doing this inside uWS or outside in JavaScript. Doing it inside uWS gives, as said four times now, zero gain.
But why? If we always append to the end of one buffer, we make no copies of the previous chunks; the output is zero copies of the chunks, one whole buffer.
No. That's definitely not how it works. The buffer would still be under regular GC. Just like anything written in JavaScript atop.
It's a pity; I thought you were destroying it via some internal method in V8.
But if you really have an issue writing 10 lines
No, the third point is just a bonus, to sell the idea; for the JS community this is the profit )
Buffer.concat (which, as you say, is incredibly wasteful)
Yes, you can see it in the source code of Buffer.concat: https://github.com/nodejs/node/blob/master/lib/buffer.js#L562
we should not make copies of the previous chunks
It has already been stated that you can do Buffer.allocUnsafe(size) and Buffer.from(arrayBuffer).copy(allocatedBuffer, start). Have you tried that yet? Just include some abuse prevention so someone can't send a bunch of requests with Content-Length=9999999 and blow up the server.
we should not make copies of the previous chunks
It has already been stated that you can do Buffer.allocUnsafe(size) and Buffer.from(arrayBuffer).copy(allocatedBuffer, start). Have you tried that yet? Just include some abuse prevention so someone can't send a bunch of requests with Content-Length=9999999 and blow up the server.
Yes, where size = req.getHeader('content-length') and start = whatever you've copied before.
This makes a linear copy while Buffer.concat makes copies in a pyramid (which is just pure dog shit).
Buffer.from(arrayBuffer).copy(allocatedBuffer,start) have you tried that yet?
There is a similar solution in my code; I have solved this problem. I am discussing here the need for a new accessible interface in uWS, so that everyone has an easy way to get the entire request body without unnecessary copies (duplicates) of bytes.
Buffer.from(arrayBuffer).copy(allocatedBuffer,start)
This is not an unnecessary copy of bytes; this is what would happen internally, from what I understand: the data is copied into 1 large buffer. It's easy enough to code a JS function for it.
Yes, I said that I have no copies of bytes, because I have a solution similar to what you wrote.
The subject of our discussion is whether this should be done under the hood in uWS, in a new event handler, so that users have no chance of making extra copies.
A couple of years ago, the Node.js developers added a reasonable way to read a socket into a static buffer. This reduces memory allocations for temporary buffer copies and reduces GC work. https://github.com/nodejs/node/pull/25436
Does this make sense for reading the HTTP body? If yes, I suggest adding an onRead handler, making the interface the same as in Node.js: https://nodejs.org/api/net.html#net_socket_connect_options_connectlistener