I spent a few minutes trying to investigate this and here are some numbers.
[39.36ms] crypto
crypto: 85.105ms
crypto: 105.906ms
crypto: 91.476ms
crypto: 90.148ms
About bun, the update function is handled by CryptoDigest.cpp. Apparently, bun uses OpenSSL 1.x; Node.js 16.x also uses that same version, but after 16.x Node.js started using OpenSSL 3.x. Instead of EVP_* functions (code), bun uses the SHA1 functions (code).
I updated the benchmark to:
const crypto = require('node:crypto');
const value = '/Users/fabio/path/to/some/random/file.js';
const hash = crypto.createHash("sha1");
const start = performance.now();
hash.update(value);
console.log(`End: ${performance.now() - start}`);
Bun: 0.0250780000000006ms
Node.js 21.1.0: 0.03838299959897995ms
Node.js 21.1.0 (with %OptimizeFunctionOnNextCall(hash.update)): 0.03241100162267685ms
So, even calling only update is slower than bun, so maybe using SHA1_Update instead of EVP_DigestUpdate could help.
But is it even recommendable to use non-EVP calls?
Btw, it is crazy how slow it is to SHA-256 HMAC a string. It is something I can't optimize in probot, and it results in about 6k req/s performance of a probot server.
But is it even recommendable to use non-EVP calls?
I don't have any idea; on the documentation page they deprecated some overloads of the SHA1 functions, but they are still supported if you change the arguments to pass context, data, length. So I think it is safe to use.
Btw, it is crazy how slow it is to SHA-256 HMAC a string. It is something I can't optimize in probot, and it results in about 6k req/s performance of a probot server.
Once I cached all the strings and then performed only one update and digest, performance improved significantly.
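For illustration, a minimal sketch of that workaround (the key and chunk values are made up): because an HMAC over concatenated input equals an HMAC over the same bytes fed incrementally, the cached strings can be joined and hashed with a single update.
const { createHmac } = require('node:crypto');
const parts = ['chunk-1', 'chunk-2', 'chunk-3']; // hypothetical cached strings

// Many small updates: one JS-to-C++ call per chunk.
const h1 = createHmac('sha256', 'secret-key');
for (const part of parts) h1.update(part);

// One update over the pre-joined string: a single boundary crossing.
const h2 = createHmac('sha256', 'secret-key').update(parts.join(''));

console.log(h1.digest('hex') === h2.digest('hex')); // true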
Apparently, bun uses OpenSSL 1.x; Node.js 16.x also uses that same version, but after 16.x Node.js started using OpenSSL 3.x.
There are various known performance issues related to OpenSSL 3. It's possible that they negatively affect the performance of hashing through EVP_* as well, but someone would have to benchmark the C/C++ code itself to be sure.
Instead of EVP_* functions (code), bun uses the SHA1 functions (code).
The EVP_* functions are recommended over the low-level functions etc. The latter are gradually being deprecated, and using them might break compatibility with OpenSSL providers/engines.
FWIW, I proposed crypto.hash() at some point (see https://github.com/nodejs/node/pull/42233), which would be faster than crypto.Hash for one-shot hashing. It might still be slower than an equivalent implementation using OpenSSL 1.1.1 and low-level OpenSSL APIs.
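As a hedged usage sketch (the (algorithm, data, outputEncoding) signature is assumed from that PR, not a shipped API): callers could feature-detect the one-shot helper and fall back to the object-based API.
const crypto = require('node:crypto');
const value = '/Users/fabio/path/to/some/random/file.js';

// Prefer the proposed one-shot helper when available (signature assumed
// from the PR discussion); otherwise fall back to crypto.Hash.
const sha1hex = typeof crypto.hash === 'function'
  ? (input) => crypto.hash('sha1', input, 'hex')
  : (input) => crypto.createHash('sha1').update(input).digest('hex');

console.log(sha1hex(value));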
Are we using OpenSSL 3.0 or 3.1?
@ronag A fork of 3.0 with QUIC patches, not the original OpenSSL 3.0.
We are basically cut off from direct OpenSSL updates.
So basically we are stuck with this until we update OpenSSL.
Anyone considered switching to BoringSSL?
I guess a PR to switch to BoringSSL would result in a lot of negative feedback. Also, BoringSSL does not guarantee a stable ABI.
https://boringssl.googlesource.com/boringssl/+/HEAD/PORTING.md#porting-from-openssl-to-boringssl
Note: BoringSSL does not have a stable API or ABI. It must be updated with its consumers. It is not suitable for, say, a system library in a traditional Linux distribution. For instance, Chromium statically links the specific revision of BoringSSL it was built against. Likewise, Android's system-internal copy of BoringSSL is not exposed by the NDK and must not be used by third-party applications.
E.g. @mcollina mentioned that a stable ABI is important for him. So I assume it is also important for others.
E.g. @mcollina mentioned that a stable ABI is important for him. So I assume it is also important for others.
Is this a theoretical issue or a practical one? Do they actually break the ABI that often? I think QUIC support is better in BoringSSL, so at the end of the day it might be easier to use?
https://boringssl.googlesource.com/boringssl/+/HEAD/README.md
Although BoringSSL is an open source project, it is not intended for general use, as OpenSSL is. We don't recommend that third parties depend upon it. Doing so is likely to be frustrating because there are no guarantees of API or ABI stability.
BoringSSL also lacks various features that users of Node.js might rely on and that we'd have to polyfill somehow, e.g., certain algorithms.
@nodejs/performance @nodejs/crypto
At a glance, this does not look like a hashing performance issue.
Try the following... Start with this script... (call it hash.mjs)
import crypto from 'node:crypto';
const value = '/Users/fabio/path/to/some/random/file.js';
console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
crypto.createHash("sha1").update(value).digest("hex");
}
console.timeEnd ( 'crypto' );
(I picked the script as-is, but I extended the loop to 1000_000 repetitions.)
The conjecture is that the hashing is slow, and by that, presumably, we mean the update call. Ok. Let us trim the code... create a new script that does not update... (call it bhash.mjs)
import crypto from 'node:crypto';
console.time ( 'crypto' );
for ( let i = 0; i < 1000_000; i++ ) {
crypto.createHash("sha1");
}
console.timeEnd ( 'crypto' );
If hashing done by update is the bottleneck, this second script should be very fast compared to the previous one. Let us see... I use Node.js v20.10.0.
$ node hash.mjs
crypto: 1.442s
$ node bhash.mjs
crypto: 1.039s
Yep. It is not hashing. Only 0.4 s are spent on hashing. The bulk of the time is spent on crypto.createHash("sha1").
Let us try with Bun now...
$ bun hash.mjs
[750.63ms] crypto
$ bun bhash.mjs
[105.19ms] crypto
So bun spends 0.64 s on hashing... roughly as much as Node.js. The big difference between Node and Bun is that Bun is 10 times faster on the crypto.createHash("sha1") part.
So let us try something else... let us call crypto.createHash once:
import crypto from 'node:crypto';
const value = '/Users/fabio/path/to/some/random/file.js';
console.time ( 'crypto' );
const hasher = crypto.createHash("sha1");
for ( let i = 0; i < 1000_000; i++ ) {
hasher.update(value);
}
console.timeEnd ( 'crypto' );
I get...
$ node chash.mjs
crypto: 172.675ms
$ bun chash.mjs
[170.99ms] crypto
Yep. So we have narrowed it down to crypto.createHash.
So let us profile the bhash.mjs script with Node:
7.69% node libc.so.6 [.] pthread_rwlock_unlock@@GLIBC_2.34
7.35% node libc.so.6 [.] malloc
5.51% node libc.so.6 [.] pthread_rwlock_rdlock@GLIBC_2.2.5
4.77% node node [.] Builtins_GetProperty
4.65% node libc.so.6 [.] _int_free
4.10% node libc.so.6 [.] _int_malloc
So we are spending time allocating memory and doing some threading. These are expensive jobs...
And now Bun...
5.08% bun bun [.] 0x0000000002e52279
4.26% bun bun [.] 0x0000000002e52897
3.56% bun bun [.] 0x00000000030aabdf
3.55% bun libc.so.6 [.] __memset_evex_unaligned_erms
2.20% bun libc.so.6 [.] __strncasecmp_l_evex
My conclusion is that though the problem might come from OpenSSL and hashing... it does not appear to be so. As you can see, just Builtins_GetProperty seems to be a bottleneck.
Maybe we should get the advice of someone who understands deeply the C++ architecture: @joyeecheung
Note: I have not disproved that Bun could have an advantage when hashing. It probably does. I am just saying that in this particular benchmark, on the processor I used (Linux+x64), with a small string, hashing is not the bottleneck. Node.js has lost the race before it has even begun.
I did some profiling to see where the bottleneck is - it seems https://github.com/nodejs/performance/issues/136#issuecomment-1836891421 was based on only self-time, but it's still important to look at the call graph.
Among the overheads is the instanceof in argument validations in the JS-land Hash. With the Oilpan prototype (below), createHash() becomes 15-17% faster. Note that V8 is also working on conservative stack scanning; AFAIU this makes the handle management closer to what JSC has, so we might get a speedup of handle management for free without doing something like this, though it's unclear when they'll ship it: https://bugs.chromium.org/p/v8/issues/detail?id=13270
const { createHash } = require('node:crypto');
const array = [];
for ( let i = 0; i < 1000_000; i++ ) {
array.push(null);
}
const start_time = performance.now();
for ( let i = 0; i < 1000_000; i++ ) {
array[i] = createHash("sha1");
}
console.log(performance.now() - start_time);
console.log(array.length);
On main the profile of the snippet above looks like this (simplified with only the important bits, number is total time in the process being profiled, the % outside these are mostly setup/teardown/GC stuff, or just samples with a slightly different stack)
--48.93%--node::crypto::Hash::New
--22.40%--node::BaseObject::BaseObject
--16.25%--node::crypto::Hash::HashInit
--15.03%--evp_md_init_internal
--5.34%--EVP_get_digestbyname
--3.25%--node::Utf8Value::Utf8Value
--4.96%--v8::internal::ApiNatives::InstantiateObject
--3.55%--v8::internal::Heap::CollectGarbage
--7.65%--Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit
--5.46%--v8::internal::Runtime_AllocateInYoungGeneration
--3.82%--Builtins_InstanceOf
With the Oilpan prototype in https://github.com/nodejs/node/pull/51017 I get a somewhat different profile that looks like this
--40.03%--node::crypto::Hash::New
--21.29%--node::crypto::Hash::HashInit
--18.93%--evp_md_init_internal
--7.04%--EVP_get_digestbyname
--3.29%--node::SetCppgcReference
--3.25%--node::Utf8Value::Utf8Value
--8.54%--v8::internal::ApiNatives::InstantiateObject
--6.49%--v8::internal::Heap::CollectGarbage
--6.83%--Builtins_InstanceOf
--4.40%--Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit
--4.23%--v8::internal::Runtime_AllocateInYoungGeneration
With a different allocation scheme it sheds off a chunk of the overhead, though there is still a lot to be improved:
$./node_main run.js
1642.1952390000001
1000000
$ out/Release/node run.js
1276.719182
1000000
$ bun run.js
233.12367799999998
1000000
I think another thing that can be done is caching the EVP_MD of known algorithms (not 100% sure if that's safe to do; maybe people from @nodejs/crypto know for sure). And maybe some JS voodoo can be done to create a fast path to avoid the instanceof overhead in argument validation.
Actually regarding the OpenSSL overhead the solution (at least a partial solution) seems surprisingly simple - we are just under-initializing OpenSSL: https://github.com/nodejs/node/pull/51019
Actually https://github.com/nodejs/performance/issues/136#issuecomment-1837269580 was incorrect - I was just looking at the numbers from the Oilpan prototype. Initializing with OPENSSL_INIT_ADD_ALL_DIGESTS won't make a difference because EVP_get_digestbyname would also do that.
On another note, it may be a good idea to just move away from EVP_get_digestbyname if we are on OpenSSL 3. Locally, switching to EVP_MD_fetch (but without proper lifecycle management, so it leaks) can speed up createHash() by another 5-7% - though handling the life cycle of that ref-counted EVP_MD in the existing implementation can be somewhat tricky.
confidence improvement accuracy (*) (**) (***)
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=10 sync='createHash' *** 5.65 % ±1.21% ±1.62% ±2.11%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=100 sync='createHash' *** 6.37 % ±0.84% ±1.12% ±1.45%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=20 sync='createHash' *** 5.71 % ±1.10% ±1.47% ±1.91%
crypto/webcrypto-digest.js n=100000 method='SHA-1' data=50 sync='createHash' *** 7.57 % ±1.50% ±2.00% ±2.62%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=10 sync='createHash' *** 6.56 % ±1.15% ±1.52% ±1.98%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=100 sync='createHash' *** 5.65 % ±0.80% ±1.06% ±1.38%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=20 sync='createHash' *** 6.73 % ±0.98% ±1.31% ±1.70%
crypto/webcrypto-digest.js n=100000 method='SHA-256' data=50 sync='createHash' *** 6.05 % ±0.89% ±1.18% ±1.54%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=10 sync='createHash' *** 6.86 % ±1.15% ±1.53% ±2.00%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=100 sync='createHash' *** 7.02 % ±0.98% ±1.31% ±1.71%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=20 sync='createHash' *** 7.28 % ±0.92% ±1.22% ±1.58%
crypto/webcrypto-digest.js n=100000 method='SHA-384' data=50 sync='createHash' *** 6.52 % ±1.12% ±1.50% ±1.95%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=10 sync='createHash' *** 5.86 % ±1.11% ±1.48% ±1.92%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=100 sync='createHash' *** 5.40 % ±1.16% ±1.54% ±2.01%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=20 sync='createHash' *** 6.29 % ±1.07% ±1.43% ±1.87%
crypto/webcrypto-digest.js n=100000 method='SHA-512' data=50 sync='createHash' *** 5.59 % ±1.17% ±1.56% ±2.03%
Don't the OpenSSL docs say something about avoiding lookup functions like EVP_get_digestbyname? I think I read something about this - something along the lines that OpenSSL 3 is totally object oriented, so e.g. you register digests in an object at runtime. So calling something like EVP_get_digestbyname basically means that OpenSSL goes through each name and checks whether it finds the proper digest.
Yes, though I think we'll need a bit of internal refactoring to move to explicit fetching, because so far most of the crypto classes don't think about freeing the EVP_MD. (There is also a somewhat weird (undocumented?) signature that allows the algorithm to be another Hash object; I am not sure what should happen in terms of life cycle management in that case.)
About the InstanceOf - it seems to mostly come from the this instanceof Hash check, which is the pre-ES6 way of checking construct calls. The post-ES6 way is new.target, which can be much faster (in V8 that can be just checking a register) - with https://github.com/nodejs/node/pull/51026, Builtins_InstanceOf no longer shows up in the profile.
On a side note, I see that there are still a lot of this instanceof checks scattered around in the code base. They can probably be dusted away too.
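A minimal sketch of the two patterns (illustrative constructor, not Node's actual source):
// Pre-ES6 construct-call check: walks the prototype chain on every call.
function HashOld(algorithm) {
  if (!(this instanceof HashOld)) return new HashOld(algorithm);
  this.algorithm = algorithm;
}

// Post-ES6 check: new.target is undefined unless called with `new`,
// which V8 can implement as a simple register check.
function HashNew(algorithm) {
  if (new.target === undefined) return new HashNew(algorithm);
  this.algorithm = algorithm;
}

console.log(HashOld('sha1').algorithm, HashNew('sha1').algorithm);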
I had another "surprisingly simple" (hopefully not just being naive) idea of migrating to EVP_MD_fetch and caching the EVP_MD. With https://github.com/nodejs/node/pull/51034 https://github.com/nodejs/node/pull/51026 and https://github.com/nodejs/node/pull/51017 I am getting this from the createHash() microbenchmark pasted above:
$ out/Release/node run.js
790.332561
1000000
$ ./node_main run.js
1634.7148000000002
1000000
At this point the most significant bottleneck is garbage collection (specifically scavenge) from createHash(), and this has more to do with how the benchmark is written. If I update the snippet to generate a digest and put it into the array, the other bottleneck is the string decoding part that converts input from a string to a buffer.
With a microbenchmark like this:
'use strict';
const { createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');
const kHashCount = parseInt(process.env.HASH_COUNT) || 10000;
const kInputLength = parseInt(process.env.INPUT_LENGTH) || 100;
const kUpdateCount = parseInt(process.env.UPDATE_COUNT) || 1;
const array = [];
for ( let i = 0; i < kHashCount; i++ ) {
array.push(null);
}
const input = readFileSync('./test/fixtures/snapshot/typescript.js').slice(0, kInputLength);
const start_time = performance.now();
for ( let i = 0; i < kHashCount; i++ ) {
const hash = createHash('sha1');
for (let j = 0; j < kUpdateCount; j++) {
hash.update(input);
}
array[i] = hash.digest('hex');
}
console.log(performance.now() - start_time);
console.log(`${kHashCount} hashes, ${kUpdateCount} updates, input length ${input.length}, ${array.length} results`);
Using a buffer as input, there are only performance differences between my local branch, the main branch, and bun when we create a lot of hashes - the bottleneck is the performance of createHash() as well as GC. When hashing a big chunk of input in one go, or hashing fewer, bigger streamed inputs, they perform similarly.
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 out/Release/node hash.js
1268.3055590000001
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 ./node_main hash.js
2013.5803560000002
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 bun hash.js
713.9530219999999
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 out/Release/node hash.js
960.9329419999999
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 ./node_main hash.js
968.0744119999999
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 bun hash.js
970.135489
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 out/Release/node hash.js
9.995244000000003
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 ./node_main hash.js
9.976660999999996
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 bun hash.js
9.789406
1 hashes, 1 updates, input length 10000000, 1 results
And if we add a .toString() to the input before slicing it, the local branch performs 1.2-1.6x worse than bun; here, string decoding performance becomes another bottleneck:
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 out/Release/node hash.js
1306.387378
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 ./node_main hash.js
2217.123724
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 UPDATE_COUNT=1 bun hash.js
885.439071
1000000 hashes, 1 updates, input length 100, 1000000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 out/Release/node hash.js
1819.652489
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 ./node_main hash.js
1804.612928
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1000 INPUT_LENGTH=10000 UPDATE_COUNT=100 bun hash.js
1183.056887
1000 hashes, 100 updates, input length 10000, 1000 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 out/Release/node hash.js
24.548640999999996
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 ./node_main hash.js
24.163148999999997
1 hashes, 1 updates, input length 10000000, 1 results
$ HASH_COUNT=1 INPUT_LENGTH=10000000 UPDATE_COUNT=1 bun hash.js
20.361431000000003
1 hashes, 1 updates, input length 10000000, 1 results
Some ideas: adding a one-shot digest helper (e.g. using EVP_Q_digest) for generating digests in one go without creating intermediate objects - setting performance aside, that helper might be useful enough to be worth it on its own anyway - and improving string conversion performance (string->WriteUtf8() and others aren't quite performant).
the other bottleneck is the string decoding part that converts input from a string to a buffer.
As I understand it V8 keeps track of whether a string is encoded in latin1 or not, because if it's using latin1 it can just allocate 1 byte per character rather than 2. In that case, for latin1 strings, if the rope is flattened and whatever else may be needed, shouldn't decoding be essentially free because the slice of memory that the string points to can just be hashed directly? Maybe it's more complicated than that 🤔
In that case, for latin1 strings, if the rope is flattened and whatever else may be needed, shouldn't decoding be essentially free because the slice of memory that the string points to can just be hashed directly?
V8 does not hand over the raw bytes to embedders, only a copy. Also it's only a pure copy for ASCII-only strings. For strings that are Latin1 but not ASCII there's still a bit of conversion to split each code point into two bytes.
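One practical consequence, sketched under the assumption that the caller can hoist the encoding: passing a Buffer moves the UTF-8 conversion to a single up-front step instead of paying it inside every update().
const { createHash } = require('node:crypto');
const str = '/Users/fabio/path/to/some/random/file.js';

// String input: each update() copies/encodes the string internally.
const fromString = createHash('sha1').update(str).digest('hex');

// Buffer input: encode once up front, then hash the raw bytes directly.
const buf = Buffer.from(str, 'utf8');
const fromBuffer = createHash('sha1').update(buf).digest('hex');

console.log(fromString === fromBuffer); // true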
@joyeecheung If you effectively made Node 2x faster on this benchmark… it feels like it is quite a success… it is not enough to beat Bun, but it is enough so that it is no longer embarrassing for Node. Especially if you achieved that with just a few lines of code!
I think "making Node 2x" faster isn't a very accurate description of the overall situation...or that Node.js being 3x slower than Bun already is a limited description of the situation - the 3x slowness is conditioned to:
Once the input is not a string but a buffer, or the input is streamed, or there aren't as many as 100K inputs being hashed in the tight loop, or the input is longer, the order of slowness decreases. For example, if the workload is just hashing 1000 buffers in 100 updates of 10000-byte chunks, then Node.js and Bun can already finish the workload in about the same time - with or without my changes. If it is 1000 strings in 100 updates of 10000-byte chunks, then Node.js is about 1.8x slower - with or without my changes.
And then, after getting rid of what can be done on the Node.js side, the primary bottlenecks we found are GC (when hashing a lot of small inputs in a tight loop) and string -> buffer conversion (when hashing strings streamed in many chunks). These are mostly V8 vs. JSC differences, not Node.js vs. Bun per se.
@joyeecheung Point taken. My comment was in the scope of the microbenchmark that motivates the issue. I did not mean to imply that you made Node 2x faster in general.
The string-to-buffer conversion issue is annoying.
I tried out an implementation that computes a digest in one shot, which looks something like crypto.digest('sha1', input, 'hex'), at https://github.com/joyeecheung/node/tree/oneshot-digest to prove my theory about GC performance being the bottleneck - and I think that's correct. Without the intermediate objects and copies, the one-shot implementation is about 3x faster in the case of "hashing 100K short strings (100 bytes)" than the object-based implementation.
'use strict';
const { digest, createHash } = require('node:crypto');
const { readFileSync } = require('node:fs');
const kHashCount = parseInt(process.env.HASH_COUNT) || 100000;
const kInputLength = parseInt(process.env.INPUT_LENGTH) || 1000;
const array = [];
for ( let i = 0; i < kHashCount; i++ ) {
array.push(null);
}
const input = readFileSync('./test/fixtures/snapshot/typescript.js').toString().slice(0, kInputLength);
const sha1 = digest ?
(input) => digest('sha1', input, 'hex') :
(input) => createHash('sha1').update(input).digest('hex');
const start_time = performance.now();
for ( let i = 0; i < kHashCount; i++ ) {
array[i] = sha1(input);
}
console.log(performance.now() - start_time);
console.log(`${kHashCount} hashes, input length ${input.length}, ${array.length} results`);
$ HASH_COUNT=1000000 INPUT_LENGTH=100 ./node_main hash2.js
2307.756804
1000000 hashes, input length 100, 1000000 results
$ HASH_COUNT=1000000 INPUT_LENGTH=100 out/Release/node hash2.js
716.874251
1000000 hashes, input length 100, 1000000 results
And in this profile, string encoding/decoding performance is the primary bottleneck:
--21.28%--node::Utf8Value::Utf8Value
--14.45%--v8::String::WriteUtf8
--15.36%--EVP_Digest
--10.80%--node::StringBytes::Encode
--4.93%--v8::internal::Factory::AllocateRaw
--4.32%--v8::internal::Heap::CollectGarbage
It looks like somebody had a similar idea at some point.
It'd be cool if kinda common APIs like that could be made 3x faster. Having both a hash and a createHash function seems a little weird though 🤔
Oh yeah, I was thinking "there must be some feature request for this... or there must have been an open PR for this..." though I wasn't able to find https://github.com/nodejs/node/pull/42233 - that PR would still be quite a bit slower than the branch I have, though, because it's missing some pieces (e.g. caching the EVP_MD*, using explicit fetching, and overhead from the HashJob implementation, which is still object-based).
It'd be cool if kinda common APIs like that could be made 3x faster. Having both a hash and a createHash function seems a little weird though
I think you could maybe do it with a lazy/on-demand approach. Right now you compute crypto.createHash("sha1").update(value).digest("hex") by eagerly doing all the work as soon as you can. This means that you need to have a non-trivial Hash object and do JavaScript-to-C++ calls repeatedly. Instead, you could have a lightweight JavaScript object that simply collects the information without doing anything. Then, when the hash value is required, it computes it. In fact, if @joyeecheung implemented createHash, then an on-demand approach like what I describe could be just a few lines of code away, at least in prototypical form.
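A prototypical sketch of that shape (hypothetical class, not a proposed API): the constructor and update() only record information, and all native work is deferred to digest().
const crypto = require('node:crypto');

class LazyHash {
  #algorithm;
  #chunks = [];
  constructor(algorithm) {
    this.#algorithm = algorithm; // record only; no C++ call yet
  }
  update(data) {
    this.#chunks.push(data); // collect; still no native work
    return this;
  }
  digest(encoding) {
    // All the native work happens here, once the value is required.
    // A native one-shot digest could replace this loop entirely.
    const hash = crypto.createHash(this.#algorithm);
    for (const chunk of this.#chunks) hash.update(chunk);
    return hash.digest(encoding);
  }
}

console.log(new LazyHash('sha1').update('abc').digest('hex'));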
Instead, you could have a lightweight JavaScript object that simply collects the information without doing anything.
I assume this is talking about buffering in JS land (possibly with a JS array) - the overhead of iterating over that array might already be another bottleneck until we get the new Array iteration API in https://github.com/nodejs/node/pull/50115. It would also increase the memory overhead when there are more updates/larger inputs that wouldn't otherwise be kept alive. For a non-streaming use case, a one-shot digest implementation would still fare better; or users can do the buffering themselves and pass the concatenated buffers into a one-shot digest implementation, if for their use case the memory overhead of buffering the original data matters less than hashing performance. (Although, as can be seen in https://github.com/nodejs/performance/issues/136#issuecomment-1837588252, when there are more updates in a streaming use case, the performance of Node.js and Bun is already closer.)
Having both a hash and a createHash function seems a little weird though 🤔
The crypto interface is largely just a mapping onto OpenSSL APIs, so I think a one-shot digest helper + a streaming API would still make sense, as there are similar interface parities in OpenSSL (EVP_Digest vs. EVP_DigestInit/EVP_DigestUpdate/EVP_DigestFinal_*). They serve different purposes - a streaming API fares better in an asynchronous application and has lower memory overhead, whereas the one-shot digest is faster if the use case is just computing a digest of a piece of immediately available data synchronously.
I assume this is talking about buffering in JS land (possibly with a JS array)
What I am saying is that crypto.createHash("sha1") only needs an object that contains "sha1". It does not need to do anything, at least nothing in C++ land, right away. I am just alluding to the design space you have.
I do agree with you, btw, that it makes a lot of sense to do one-shot hashing... But what I am also saying is that crypto.createHash("sha1").update(buffer).digest(...) should not have wildly different performance characteristics.
Yeah, there could be some micro-optimization that only starts an object-based implementation upon the second update, or when the first update contains a big enough input (say >1MB). Otherwise, fall back to a one-shot digest implementation. Though the prerequisite would still be having a one-shot digest implementation...
Yeah, there could be some micro-optimization that only starts an object-based implementation upon the second update, or when the first update contains a big enough input (say >1MB). Otherwise, fall back to a one-shot digest implementation. Though the prerequisite would still be having a one-shot digest implementation...
Precisely. :heart: :heart: The idea is that if you propose a new function, not everyone will rush to adopt it. But it might be possible to provide similar benefits to people who use the conventional API. :dash:
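A sketch of that hybrid dispatch under the stated assumptions (the >1MB threshold is taken from the comment above; the one-shot path is simulated with the object API, since a native crypto.digest()/crypto.hash() is hypothetical here):
const crypto = require('node:crypto');
const kOneShotLimit = 1024 * 1024; // the >1MB threshold mentioned above

class HybridHash {
  #algorithm;
  #first = null;
  #stream = null;
  constructor(algorithm) { this.#algorithm = algorithm; }
  update(data) {
    if (this.#stream !== null) {
      this.#stream.update(data); // already on the streaming path
    } else if (this.#first === null && data.length < kOneShotLimit) {
      this.#first = data; // defer: a one-shot digest may be enough
    } else {
      // Second update, or a large first chunk: switch to the object path.
      this.#stream = crypto.createHash(this.#algorithm);
      if (this.#first !== null) this.#stream.update(this.#first);
      this.#stream.update(data);
      this.#first = null;
    }
    return this;
  }
  digest(encoding) {
    if (this.#stream !== null) return this.#stream.digest(encoding);
    // Fast path: a native one-shot digest would go here; simulated with
    // the object-based API so the sketch stays runnable.
    const hash = crypto.createHash(this.#algorithm);
    if (this.#first !== null) hash.update(this.#first);
    return hash.digest(encoding);
  }
}

console.log(new HybridHash('sha1').update('abc').digest('hex'));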
@joyeecheung If you want a real-world example where createHash is called a lot, then Yarn might provide that.
Setup:
git clone https://github.com/yarnpkg/berry
cd berry
git checkout db6210f48355d2986e965f90009b22f18d3b6342
yarn build:cli --no-minify
Running the following command will call createHash 75068 times:
rm -f .yarn/install-state.gz
YARN_IGNORE_PATH=1 CI=true node ./packages/yarnpkg-cli/bundles/yarn.js
and the following will, at the time of writing, call it 90340 times:
rm -f yarn.lock .yarn/install-state.gz
YARN_IGNORE_PATH=1 node ./packages/yarnpkg-cli/bundles/yarn.js
about 1853 of those should be hashing streamed files and the rest strings.
I looked into the "buffering input" idea for crypto.createHash and it turns out to be more complicated than I thought - most importantly, errors would be delayed until the second update() or digest() is called (e.g. if you pass an algorithm that the OpenSSL version loaded into the process doesn't support, you can't get an error immediately, because we don't eagerly cache all the supported algorithms to avoid overhead on the first access, so you'd have to wait until the actual hashing call happens and OpenSSL tells us that the algorithm is not supported). Add to that the fact that Hash objects are also streams, and it could be difficult to get the error handling correct. So I'll just go with https://github.com/nodejs/node/pull/51044 for now.
Since https://github.com/nodejs/node/pull/51044 was merged, I think we can close this issue.
Any other discussion about OpenSSL and its perf, I think we should keep at https://github.com/nodejs/performance/issues/72
What is the problem this feature will solve?
Making the hash functions significantly faster.
What is the feature you are proposing to solve the problem?
I'm not sure what the best option for this is, but running the following file:
I see the following output:
Basically Node's sha1 function seems at least 3x slower than Bun's.
Hashing is at the core of many important things, so I'd argue it's important to hash as fast as possible since that would speed up a cascade of use cases.
I'm not sure what the best solution is here, but if Bun is 3x faster than Node here presumably there's a lot of room for improvement.
What alternatives have you considered?
No response