Closed by venkatperi 7 years ago
I'm not sure what the question/request is here, but you can pass your own `highWaterMark` to `fs.createReadStream()`, which controls the size of allocated `Buffer` pools for `fs.ReadStream`.
The issue isn't the size of the buffer (yes, its size can be changed by setting the `highWaterMark`), but rather the cost of allocating and releasing buffers.
Unfortunately, while this test may yield promising results in an in vitro setting, actual adoption of pooling is much more difficult. It would require the consumer to inform the core when it is done with a specific buffer. This not only detracts from the JS paradigm of relying on GC, but can also reduce the overall security of the platform, as refcounting can be very difficult to get right in more complex projects. From a compatibility perspective, it would also require an opt-in, which diminishes the gains in actual deployment.
To demonstrate how easy it is to get refcounting wrong, the post has:

```js
Writable.prototype.write = function (chunk, encoding, cb) {
  // ...
  if (!(chunk && typeof chunk._unref === 'function')) {
    return this._unref();
  }
  // ...
};
```
This simply does not work, as many asynchronous streams (including HTTP streams) assume the chunk passed into a writable stream will never be changed. There have in fact been issues filed against Node.js (which I am unable to locate right now) about this behavior, but it was decided that the status quo provides the best performance and should be kept.
I think this issue was prematurely closed and should be reopened. Node would benefit from more discussion on the subject.
> This [...] detracts from the JS paradigm of relying on GC
The question of GC vs static allocation is not a binary decision, and there are ways to design APIs which avoid the problems @TimothyGu describes. JavaScript itself does not force anyone to "rely on GC", and Node already provides for static allocation when it comes to file system `read()` and `write()` operations. If anything, the long-term trend in Node's API design has been more and more towards providing APIs which accept not just buffers, but also offsets into buffers, to avoid pressuring the GC through needless allocations.
If some aspects of Node's APIs currently do not support pulling data into a user-provided buffer (which would facilitate static allocation), then perhaps this is more an indication of a problem in the design of the API in question than with the principle of static allocation.
One can definitely write Javascript to use static allocation for the heavy "data plane" work, and to use GC for the light "control plane" work, and I think Node needs to move in this direction to remain relevant. It doesn't have to be xor.
Encouraging users to rely on GC for data plane work makes no sense from a throughput point of view, and is going to lead to more issues relating to memory fragmentation and allocator choices (see https://github.com/nodejs/node/issues/6078, https://github.com/nodejs/node/issues/12805 and https://github.com/nodejs/node/issues/8871 for some recent examples).
See https://github.com/nodejs/node/pull/6923 for raw numbers that make the case.
@jorangreef My objections were based on the OP's approach to buffer pooling. I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of the Node.js-owned ones in this issue) – which I agree can be very helpful for throughput when done right – and if you can offer some implementable proposals, that would be even better. It would also offer a path to reconcile with the Web Streams API's BYOB (bring your own buffer) readers in the future, if that happens.
@jorangreef I just reopened #3403.
> I'd be happy if you could file a new issue about pulling into user-controlled buffers (instead of Node.js-owned ones in this issue) ... @jorangreef I just reopened #3403.
Thanks very much @TimothyGu , #3403 is the issue for pulling into user-controlled buffers, appreciate your reopening that.
> asynchronous streams (including HTTP streams) assume the chunk that's passed into a writable stream will never be changed
I agree with you on this.
REQUEST: Would you consider moving to a pool scheme since it'll boost performance without changing the API?
`fs.ReadStream` allocates 64 KiB (default, change via `highWaterMark`) sized `Buffer` objects as it reads through files, which are released downstream when no longer needed. With larger files, this can result in a lot of time spent allocating `Buffer` objects only to have them picked up in GC, which by itself can add up to a lot. Reducing allocations with a buffer pool should boost performance across the board with respect to file size, without foreseeable changes to the public API.
See post for more details.
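A minimal sketch of the pooling scheme being requested (class and method names are hypothetical, not an actual Node.js API): keep a free list of fixed-size buffers so that releasing and re-acquiring avoids a fresh allocation.

```javascript
// Hypothetical fixed-size buffer pool: acquire() reuses a previously
// released Buffer instead of allocating a new one each time.
class BufferPool {
  constructor(size) {
    this.size = size;
    this.free = []; // released buffers awaiting reuse
  }
  acquire() {
    return this.free.pop() || Buffer.allocUnsafe(this.size);
  }
  release(buf) {
    // Caller must not touch buf after releasing it (the refcounting hazard
    // discussed above applies here too).
    this.free.push(buf);
  }
}

const pool = new BufferPool(64 * 1024);
const a = pool.acquire();
pool.release(a);
const b = pool.acquire(); // same underlying buffer, no new allocation
```

The performance upside comes entirely from `acquire()` hitting the free list; the difficulty, as noted earlier in the thread, is knowing when it is actually safe to call `release()`.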
[Charts: Performance vs File Size (higher is better); Percentage Boost in Performance, averaging 67% improvement across all sizes]