AVVS closed this 11 months ago
benchSonicBuf*1000: 595.338ms
benchSonic4kBuf*1000: 590.645ms
benchCoreBuf*1000: 1.073s
There is at least a 2x speedup on my system with this vs core streams when writing buffers, plus proper error handling, so it might be worth it. I'll check the performance without merging small buffers and whether it's possible to keep the exact performance profile for utf8.
Added a contentMode setting. I'm not sure the approach I've selected is the best from a maintenance standpoint, but the benchmarks are OK.
benchSonic*1000: 428.597ms
benchSonicSync*1000: 6.054s
benchSonic4k*1000: 408.553ms
benchSonicSync4k*1000: 240.762ms
benchCore*1000: 2.785s
benchConsole*1000: 4.752s
benchSonicBuf*1000: 468.216ms
benchSonicSyncBuf*1000: 6.271s
benchSonic4kBuf*1000: 458.785ms
benchSonicSync4kBuf*1000: 343.907ms
benchCoreBuf*1000: 1.081s
benchSonic*1000: 407.736ms
benchSonicSync*1000: 6.123s
benchSonic4k*1000: 403.239ms
benchSonicSync4k*1000: 236.555ms
benchCore*1000: 2.768s
benchConsole*1000: 5.307s
benchSonicBuf*1000: 455.687ms
benchSonicSyncBuf*1000: 6.225s
benchSonic4kBuf*1000: 454.69ms
benchSonicSync4kBuf*1000: 334.973ms
benchCoreBuf*1000: 1.079s
Master branch perf for comparison; utf8 mode (the default) seems to be unaffected by the changes.
benchSonic*1000: 452.573ms
benchSonicSync*1000: 6.064s
benchSonic4k*1000: 429.246ms
benchSonicSync4k*1000: 258.853ms
benchCore*1000: 2.722s
benchConsole*1000: 4.725s
benchSonic*1000: 422.106ms
benchSonicSync*1000: 6.054s
benchSonic4k*1000: 426.696ms
benchSonicSync4k*1000: 253.348ms
benchCore*1000: 2.724s
benchConsole*1000: 5.305s
The only downside I see is that I've copied the non-buffer-related functions and changed the logic there, so any future logic change will need to be applied in multiple places.
@mmarchini could you take a look as well?
Sorry, I was drowned in GitHub notifications for months. I'll take a look at it this week.
Overall this looks fine; I haven't reviewed the tests yet. The main concern, as you mentioned, is code duplication, so it'll be up to the project to decide whether the burden of having some logic implemented in two places is worth it for this feature.
As for the performance aspects, I can see concat being faster than multiple writes, although that will be highly dependent on system specs. Can you share the specs of the machine where you ran your benchmarks?
Chip: Apple M1 Max, Memory: 64 GB
Benchmark updates incoming: Node 20 performs ~25% better, Node 18 ~10%.
node 20.3.1
benchSonic*1000: 411.93ms
benchSonicSync*1000: 6.231s
benchSonic4k*1000: 403.094ms
benchSonicSync4k*1000: 231.349ms
benchCore*1000: 1.900s
benchConsole*1000: 6.654s
benchSonicBuf*1000: 479.036ms
benchSonicSyncBuf*1000: 6.643s
benchSonic4kBuf*1000: 479.021ms
benchSonicSync4kBuf*1000: 338.352ms
benchCoreBuf*1000: 647.737ms
benchSonic*1000: 402.104ms
benchSonicSync*1000: 6.381s
benchSonic4k*1000: 411.276ms
benchSonicSync4k*1000: 230.133ms
benchCore*1000: 1.879s
benchConsole*1000: 7.073s
benchSonicBuf*1000: 488.614ms
benchSonicSyncBuf*1000: 6.739s
benchSonic4kBuf*1000: 468.49ms
benchSonicSync4kBuf*1000: 346.683ms
benchCoreBuf*1000: 657.002ms
node 18.16.0
benchSonic*1000: 435.906ms
benchSonicSync*1000: 6.272s
benchSonic4k*1000: 437.36ms
benchSonicSync4k*1000: 252.272ms
benchCore*1000: 2.252s
benchConsole*1000: 4.260s
benchSonicBuf*1000: 799.471ms
benchSonicSyncBuf*1000: 6.771s
benchSonic4kBuf*1000: 796.44ms
benchSonicSync4kBuf*1000: 645.533ms
benchCoreBuf*1000: 893.629ms
benchSonic*1000: 418.545ms
benchSonicSync*1000: 6.380s
benchSonic4k*1000: 417.882ms
benchSonicSync4k*1000: 245.55ms
benchCore*1000: 2.214s
benchConsole*1000: 5.374s
benchSonicBuf*1000: 787.246ms
benchSonicSyncBuf*1000: 6.849s
benchSonic4kBuf*1000: 822ms
benchSonicSync4kBuf*1000: 638.927ms
benchCoreBuf*1000: 888.464ms
@mmarchini please take a look and check whether the changes address your comments.
Wow, quite a speedup on writes compared to core as well.
It would be interesting to benchmark this with io_uring support, as fs operations should be much faster on Linux.
Some weirdness is going on with win/node-14 combo during dep install
GHA is broken on that platform, just exclude it in the workflow file.
I'll try to take a look over the weekend :)
> It would be interesting to benchmark this with io_uring support, as fs operations should be much faster on Linux.
I have a Linux machine I can try it on. Will report back with the results
anything I should do with this to move it forward? :)
@mmarchini could you take a quick look? I would prefer if this did not break you. Otherwise I'm ok to ship it in the coming weeks.
Mashed up quick support for .write(chunk: Buffer). It works rather well as long as the input is a Buffer; if it's a utf8 string, the performance degradation is quite bad. See #173 for why this might be needed.
If raw Buffer support is worth exploring further, I'd be happy to put more effort into fast paths for both content types, probably by preselecting the mode of operation at SonicBoom instantiation time.