paulmillr / noble-hashes

Audited & minimal JS implementation of hash functions, MACs and KDFs.
https://paulmillr.com/noble
MIT License

Wildcard imports for `_assert` and `_u64` modules #66

Closed jeetiss closed 1 year ago

jeetiss commented 1 year ago

Migrated the `_assert` and `_u64` modules to wildcard imports to enable tree-shaking.

This is part of #63.

paulmillr commented 1 year ago

This was done specifically for performance, as far as I remember.

Could you run before-after benchmarks on your machine?

jeetiss commented 1 year ago

Benchmarks:

wildcard imports

```
-------
Benchmarking
SHA256 32B x 989,119 ops/sec @ 1μs/op ± 3.30% (min: 791ns, max: 7ms)
SHA256 64B x 677,506 ops/sec @ 1μs/op ± 2.82% (min: 1μs, max: 1ms)
SHA256 1KB x 145,243 ops/sec @ 6μs/op ± 1.24% (min: 6μs, max: 1ms)
SHA384 32B x 458,295 ops/sec @ 2μs/op
SHA384 64B x 465,766 ops/sec @ 2μs/op ± 1.20% (min: 1μs, max: 882μs)
SHA384 1KB x 71,710 ops/sec @ 13μs/op
SHA512 32B x 466,417 ops/sec @ 2μs/op ± 1.74% (min: 1μs, max: 8ms)
SHA512 64B x 480,076 ops/sec @ 2μs/op ± 1.14% (min: 1μs, max: 845μs)
SHA512 1KB x 75,193 ops/sec @ 13μs/op
SHA3-256, keccak256, shake256 32B x 182,116 ops/sec @ 5μs/op
SHA3-256, keccak256, shake256 64B x 181,653 ops/sec @ 5μs/op
SHA3-256, keccak256, shake256 1KB x 23,827 ops/sec @ 41μs/op
Kangaroo12 32B x 301,568 ops/sec @ 3μs/op
Kangaroo12 64B x 299,401 ops/sec @ 3μs/op
Kangaroo12 1KB x 50,515 ops/sec @ 19μs/op
Marsupilami14 32B x 268,240 ops/sec @ 3μs/op
Marsupilami14 64B x 266,951 ops/sec @ 3μs/op
Marsupilami14 1KB x 38,930 ops/sec @ 25μs/op
BLAKE2b 32B x 298,685 ops/sec @ 3μs/op
BLAKE2b 64B x 299,850 ops/sec @ 3μs/op
BLAKE2b 1KB x 43,698 ops/sec @ 22μs/op
BLAKE2s 32B x 492,368 ops/sec @ 2μs/op
BLAKE2s 64B x 496,031 ops/sec @ 2μs/op
BLAKE2s 1KB x 41,181 ops/sec @ 24μs/op
BLAKE3 32B x 567,859 ops/sec @ 1μs/op
BLAKE3 64B x 573,065 ops/sec @ 1μs/op ± 1.12% (min: 1μs, max: 395μs)
BLAKE3 1KB x 58,213 ops/sec @ 17μs/op
RIPEMD160 32B x 1,066,098 ops/sec @ 938ns/op ± 1.99% (min: 750ns, max: 1ms)
RIPEMD160 64B x 745,712 ops/sec @ 1μs/op ± 2.72% (min: 1μs, max: 1ms)
RIPEMD160 1KB x 161,160 ops/sec @ 6μs/op ± 1.01% (min: 6μs, max: 1ms)
HMAC-SHA256 32B x 240,096 ops/sec @ 4μs/op
HMAC-SHA256 64B x 218,150 ops/sec @ 4μs/op
HMAC-SHA256 1KB x 59,347 ops/sec @ 16μs/op
RAM: rss=172.6mb heap=105.7mb used=78.0mb
-------
Benchmarking
HKDF-SHA256 32 x 108,920 ops/sec @ 9μs/op
HKDF-SHA256 64 x 94,723 ops/sec @ 10μs/op
HKDF-SHA256 256 x 50,403 ops/sec @ 19μs/op
PBKDF2-HMAC-SHA256 16384 x 41 ops/sec @ 24ms/op ± 10.75% (min: 21ms, max: 31ms)
PBKDF2-HMAC-SHA256 65536 x 11 ops/sec @ 86ms/op
PBKDF2-HMAC-SHA256 262144 x 2 ops/sec @ 347ms/op
PBKDF2-HMAC-SHA512 16384 x 16 ops/sec @ 59ms/op ± 3.56% (min: 58ms, max: 67ms)
PBKDF2-HMAC-SHA512 65536 x 4 ops/sec @ 233ms/op
PBKDF2-HMAC-SHA512 262144 x 1 ops/sec @ 934ms/op
Scrypt r: 8, p: 1, n: 16384 x 25 ops/sec @ 38ms/op ± 8.90% (min: 34ms, max: 48ms)
Scrypt r: 8, p: 1, n: 65536 x 6 ops/sec @ 150ms/op ± 2.68% (min: 147ms, max: 156ms)
Scrypt r: 8, p: 1, n: 262144 x 1 ops/sec @ 616ms/op ± 1.35% (min: 605ms, max: 625ms)
Scrypt Async r: 8, p: 1, n: 16384 x 26 ops/sec @ 37ms/op ± 3.74% (min: 36ms, max: 41ms)
Scrypt Async r: 8, p: 1, n: 65536 x 6 ops/sec @ 158ms/op ± 1.23% (min: 157ms, max: 161ms)
Scrypt Async r: 8, p: 1, n: 262144 x 1 ops/sec @ 655ms/op ± 1.04% (min: 639ms, max: 663ms)
RAM: rss=529.7mb heap=11.8mb used=8.5mb arr=268.5mb
```

main branch

```
-------
Benchmarking
SHA256 32B x 1,008,064 ops/sec @ 992ns/op ± 3.14% (min: 791ns, max: 7ms)
SHA256 64B x 726,744 ops/sec @ 1μs/op ± 1.88% (min: 1μs, max: 1ms)
SHA256 1KB x 146,477 ops/sec @ 6μs/op
SHA384 32B x 450,856 ops/sec @ 2μs/op
SHA384 64B x 452,898 ops/sec @ 2μs/op ± 1.38% (min: 1μs, max: 917μs)
SHA384 1KB x 70,368 ops/sec @ 14μs/op
SHA512 32B x 445,434 ops/sec @ 2μs/op ± 1.80% (min: 1μs, max: 9ms)
SHA512 64B x 461,467 ops/sec @ 2μs/op ± 1.06% (min: 1μs, max: 708μs)
SHA512 1KB x 70,977 ops/sec @ 14μs/op
SHA3-256, keccak256, shake256 32B x 190,222 ops/sec @ 5μs/op
SHA3-256, keccak256, shake256 64B x 189,501 ops/sec @ 5μs/op
SHA3-256, keccak256, shake256 1KB x 24,854 ops/sec @ 40μs/op
Kangaroo12 32B x 313,087 ops/sec @ 3μs/op
Kangaroo12 64B x 307,314 ops/sec @ 3μs/op ± 1.48% (min: 2μs, max: 3ms)
Kangaroo12 1KB x 52,345 ops/sec @ 19μs/op
Marsupilami14 32B x 278,862 ops/sec @ 3μs/op
Marsupilami14 64B x 276,701 ops/sec @ 3μs/op
Marsupilami14 1KB x 40,525 ops/sec @ 24μs/op
BLAKE2b 32B x 354,609 ops/sec @ 2μs/op
BLAKE2b 64B x 356,760 ops/sec @ 2μs/op
BLAKE2b 1KB x 53,760 ops/sec @ 18μs/op
BLAKE2s 32B x 511,247 ops/sec @ 1μs/op
BLAKE2s 64B x 516,795 ops/sec @ 1μs/op ± 1.12% (min: 1μs, max: 326μs)
BLAKE2s 1KB x 42,822 ops/sec @ 23μs/op
BLAKE3 32B x 555,864 ops/sec @ 1μs/op ± 2.06% (min: 1μs, max: 8ms)
BLAKE3 64B x 571,755 ops/sec @ 1μs/op ± 1.00% (min: 1μs, max: 341μs)
BLAKE3 1KB x 54,424 ops/sec @ 18μs/op
RIPEMD160 32B x 1,058,201 ops/sec @ 945ns/op ± 1.97% (min: 750ns, max: 1ms)
RIPEMD160 64B x 773,395 ops/sec @ 1μs/op ± 2.10% (min: 1μs, max: 1ms)
RIPEMD160 1KB x 159,897 ops/sec @ 6μs/op
HMAC-SHA256 32B x 240,963 ops/sec @ 4μs/op
HMAC-SHA256 64B x 218,914 ops/sec @ 4μs/op
HMAC-SHA256 1KB x 59,248 ops/sec @ 16μs/op
RAM: rss=159.3mb heap=89.6mb used=64.9mb
-------
Benchmarking
HKDF-SHA256 32 x 110,253 ops/sec @ 9μs/op
HKDF-SHA256 64 x 95,365 ops/sec @ 10μs/op
HKDF-SHA256 256 x 50,676 ops/sec @ 19μs/op
PBKDF2-HMAC-SHA256 16384 x 41 ops/sec @ 23ms/op ± 10.26% (min: 21ms, max: 30ms)
PBKDF2-HMAC-SHA256 65536 x 11 ops/sec @ 86ms/op
PBKDF2-HMAC-SHA256 262144 x 2 ops/sec @ 346ms/op
PBKDF2-HMAC-SHA512 16384 x 15 ops/sec @ 62ms/op ± 3.24% (min: 61ms, max: 70ms)
PBKDF2-HMAC-SHA512 65536 x 4 ops/sec @ 246ms/op
PBKDF2-HMAC-SHA512 262144 x 1 ops/sec @ 990ms/op
Scrypt r: 8, p: 1, n: 16384 x 26 ops/sec @ 38ms/op ± 9.48% (min: 34ms, max: 50ms)
Scrypt r: 8, p: 1, n: 65536 x 6 ops/sec @ 151ms/op ± 1.86% (min: 148ms, max: 154ms)
Scrypt r: 8, p: 1, n: 262144 x 1 ops/sec @ 632ms/op
Scrypt Async r: 8, p: 1, n: 16384 x 25 ops/sec @ 39ms/op ± 3.98% (min: 37ms, max: 44ms)
Scrypt Async r: 8, p: 1, n: 65536 x 6 ops/sec @ 161ms/op ± 2.04% (min: 158ms, max: 165ms)
Scrypt Async r: 8, p: 1, n: 262144 x 1 ops/sec @ 666ms/op
RAM: rss=550.9mb heap=11.8mb used=8.6mb arr=268.5mb
```

paulmillr commented 1 year ago

seems like a big slowdown for some hashes, so -1 from me
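
To put a number on the slowdown, the relative change can be computed directly from the ops/sec figures in the two runs above. The sketch below is only illustrative: the helper names `parseOps` and `relativeChangePercent` are made up for this example, and single runs like these carry a few percent of noise.

```typescript
// Parse an ops/sec figure such as "354,609" into a number.
function parseOps(text: string): number {
  return Number(text.replace(/,/g, ""));
}

// Relative change of `after` versus `before`, in percent (negative means slower).
function relativeChangePercent(before: number, after: number): number {
  return ((after - before) / before) * 100;
}

// Figures copied from the benchmark output in this thread:
// [label, main branch ops/sec, wildcard imports ops/sec]
const benchSamples: [string, string, string][] = [
  ["SHA256 64B", "726,744", "677,506"],
  ["BLAKE2b 32B", "354,609", "298,685"],
  ["BLAKE2b 1KB", "53,760", "43,698"],
];

for (const [label, main, wildcard] of benchSamples) {
  const change = relativeChangePercent(parseOps(main), parseOps(wildcard));
  console.log(`${label}: ${change.toFixed(1)}%`);
}
```

With these inputs BLAKE2b comes out roughly 16 to 19 percent slower under wildcard imports, while SHA256 64B is closer to 7 percent.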

jeetiss commented 1 year ago

I dug into the micro-optimizations and found the reason why direct imports are faster than wildcard:

```ts
// Wildcard (namespace) import
import * as operations from './utils.js';
operations.add(1, 2);

// Named (direct) import
import { add } from './utils.js';
add(1, 2);
```

These compile to:

```js
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
// Output for the wildcard import
const operations = require("./utils.js");
operations.add(1, 2);

// Output for the named import
const utils_js_1 = require("./utils.js");
(0, utils_js_1.add)(1, 2);
```

The second property access is ~20-30% faster than the wildcard-import one because of the `(0, utils_js_1.add)` hack. I created an issue with TypeScript, but I think it is better to migrate the `_assert` and `_u64` modules to direct imports to get the benefits of both tree-shaking and performance.

Do you agree with my research and conclusion?
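
For readers unfamiliar with the `(0, obj.fn)(...)` pattern in the compiled output: the comma expression evaluates to the function value itself, so the call happens without `obj` as its receiver. The sketch below only demonstrates that semantic difference. `getReceiver` and `utils` are hypothetical names, and nothing here measures or explains the performance gap reported above.

```typescript
// Returns whatever `this` the function was called with.
function getReceiver(this: unknown): unknown {
  return this;
}

// Hypothetical stand-in for a required module object such as `utils_js_1`.
const utils = { getReceiver };

// Ordinary member call: `this` is the object the method was read from.
const memberCallKeepsReceiver = utils.getReceiver() === utils;

// Indirect call: the comma operator yields the bare function, so the
// receiver is dropped (`this` is undefined in strict mode, the global
// object otherwise), matching how a named ES import is called.
const indirectCallKeepsReceiver = (0, utils.getReceiver)() === utils;

console.log(memberCallKeepsReceiver); // true
console.log(indirectCallKeepsReceiver); // false
```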

paulmillr commented 1 year ago

> I think it is better to migrate _assert and _u64 modules to direct imports

I don't understand. Do you suggest we merge the pull request even though it's 30% slower? To gain a 100-byte size optimization across the modules?

jeetiss commented 1 year ago

I suggest migrating from `import u64 from './_u64.js'` to `import { add, add3H, add3L, fromBig, rotr32H, rotr32L, rotrBH, rotrBL, rotrSH, rotrSL } from './_u64.js'`, because it will gain a 100-byte size optimization and should have the same performance.
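
A rough way to sanity-check the "same performance" expectation locally is to time a call made through an object property against the same function held in a local binding. This is a single-file approximation, not a real module import: `u64Like`, its `add`, and `opsPerSec` are hypothetical stand-ins written for this example, and the timings depend heavily on the engine and on warm-up.

```typescript
// Hypothetical stand-in for a helper object; not the library's actual code.
// Adds two 64-bit values given as (high, low) 32-bit halves.
const u64Like = {
  add(ah: number, al: number, bh: number, bl: number): { h: number; l: number } {
    const l = (al >>> 0) + (bl >>> 0); // unsigned low halves, may exceed 2^32
    return { h: (ah + bh + ((l / 2 ** 32) | 0)) | 0, l: l | 0 };
  },
};

// Local binding, analogous to `import { add } from ...`.
const { add: addDirect } = u64Like;

// Very small timing loop; returns the measured operations per second.
function opsPerSec(label: string, fn: () => number, iterations = 1e6): number {
  const start = Date.now();
  let sink = 0; // accumulate results so the loop is not optimized away
  for (let i = 0; i < iterations; i++) sink ^= fn();
  const elapsedMs = Math.max(Date.now() - start, 1);
  const rate = Math.round((iterations / elapsedMs) * 1000);
  console.log(`${label} x ${rate} ops/sec (sink=${sink})`);
  return rate;
}

opsPerSec("object-property call", () => u64Like.add(1, 0xffffffff, 2, 1).h);
opsPerSec("local-binding call", () => addDirect(1, 0xffffffff, 2, 1).h);
```

A dedicated benchmark run on the built package, as in the outputs posted in this thread, remains the more trustworthy measurement.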

paulmillr commented 1 year ago

ok.

I've checked this: leave blake2b on the current default import. Other files seem to be OK with the switch to `{ ... }` specific imports.

jeetiss commented 1 year ago

Updated benchmarks:

wildcard imports

```
-------
Benchmarking
SHA256 32B x 1,008,064 ops/sec @ 992ns/op ± 3.19% (min: 750ns, max: 7ms)
SHA384 32B x 456,621 ops/sec @ 2μs/op
SHA512 32B x 453,309 ops/sec @ 2μs/op
SHA3-256, keccak256, shake256 32B x 181,620 ops/sec @ 5μs/op
Kangaroo12 32B x 300,120 ops/sec @ 3μs/op
Marsupilami14 32B x 267,165 ops/sec @ 3μs/op
BLAKE2b 32B x 351,988 ops/sec @ 2μs/op
BLAKE2s 32B x 496,524 ops/sec @ 2μs/op ± 1.60% (min: 1μs, max: 7ms)
BLAKE3 32B x 567,536 ops/sec @ 1μs/op
RIPEMD160 32B x 1,066,098 ops/sec @ 938ns/op ± 1.95% (min: 750ns, max: 1ms)
HMAC-SHA256 32B x 237,925 ops/sec @ 4μs/op
RAM: rss=159.2mb heap=90.3mb used=56.9mb
-------
Benchmarking
HKDF-SHA256 32 x 107,712 ops/sec @ 9μs/op
HKDF-SHA256 64 x 92,558 ops/sec @ 10μs/op
HKDF-SHA256 256 x 49,446 ops/sec @ 20μs/op
PBKDF2-HMAC-SHA256 16384 x 46 ops/sec @ 21ms/op
PBKDF2-HMAC-SHA256 65536 x 11 ops/sec @ 87ms/op
PBKDF2-HMAC-SHA256 262144 x 2 ops/sec @ 349ms/op
PBKDF2-HMAC-SHA512 16384 x 16 ops/sec @ 62ms/op
PBKDF2-HMAC-SHA512 65536 x 4 ops/sec @ 248ms/op
PBKDF2-HMAC-SHA512 262144 x 1 ops/sec @ 994ms/op
Scrypt r: 8, p: 1, n: 16384 x 27 ops/sec @ 35ms/op ± 2.47% (min: 35ms, max: 38ms)
Scrypt r: 8, p: 1, n: 65536 x 6 ops/sec @ 148ms/op
Scrypt r: 8, p: 1, n: 262144 x 1 ops/sec @ 629ms/op ± 1.02% (min: 620ms, max: 635ms)
Scrypt Async r: 8, p: 1, n: 16384 x 26 ops/sec @ 37ms/op
Scrypt Async r: 8, p: 1, n: 65536 x 6 ops/sec @ 162ms/op
Scrypt Async r: 8, p: 1, n: 262144 x 1 ops/sec @ 673ms/op ± 1.13% (min: 658ms, max: 686ms)
RAM: rss=553.0mb heap=11.8mb used=8.5mb arr=268.5mb
```

main branch

```
-------
Benchmarking
SHA256 32B x 1,015,228 ops/sec @ 985ns/op ± 3.16% (min: 750ns, max: 7ms)
SHA384 32B x 458,926 ops/sec @ 2μs/op
SHA512 32B x 456,829 ops/sec @ 2μs/op
SHA3-256, keccak256, shake256 32B x 190,548 ops/sec @ 5μs/op
Kangaroo12 32B x 312,304 ops/sec @ 3μs/op
Marsupilami14 32B x 277,777 ops/sec @ 3μs/op ± 1.15% (min: 3μs, max: 9ms)
BLAKE2b 32B x 356,125 ops/sec @ 2μs/op
BLAKE2s 32B x 512,295 ops/sec @ 1μs/op
BLAKE3 32B x 564,652 ops/sec @ 1μs/op
RIPEMD160 32B x 1,079,913 ops/sec @ 926ns/op ± 1.93% (min: 750ns, max: 1ms)
HMAC-SHA256 32B x 240,269 ops/sec @ 4μs/op
RAM: rss=133.6mb heap=64.4mb used=37.1mb
-------
Benchmarking
HKDF-SHA256 32 x 109,445 ops/sec @ 9μs/op
HKDF-SHA256 64 x 94,580 ops/sec @ 10μs/op
HKDF-SHA256 256 x 49,414 ops/sec @ 20μs/op
PBKDF2-HMAC-SHA256 16384 x 45 ops/sec @ 21ms/op
PBKDF2-HMAC-SHA256 65536 x 11 ops/sec @ 87ms/op ± 1.01% (min: 86ms, max: 88ms)
PBKDF2-HMAC-SHA256 262144 x 2 ops/sec @ 346ms/op
PBKDF2-HMAC-SHA512 16384 x 16 ops/sec @ 61ms/op
PBKDF2-HMAC-SHA512 65536 x 4 ops/sec @ 244ms/op
PBKDF2-HMAC-SHA512 262144 x 1 ops/sec @ 985ms/op
Scrypt r: 8, p: 1, n: 16384 x 26 ops/sec @ 38ms/op ± 2.12% (min: 35ms, max: 41ms)
Scrypt r: 8, p: 1, n: 65536 x 6 ops/sec @ 147ms/op
Scrypt r: 8, p: 1, n: 262144 x 1 ops/sec @ 625ms/op
Scrypt Async r: 8, p: 1, n: 16384 x 26 ops/sec @ 37ms/op
Scrypt Async r: 8, p: 1, n: 65536 x 6 ops/sec @ 161ms/op ± 2.36% (min: 157ms, max: 167ms)
Scrypt Async r: 8, p: 1, n: 262144 x 1 ops/sec @ 658ms/op ± 2.04% (min: 644ms, max: 674ms)
RAM: rss=533.6mb heap=11.8mb used=8.6mb arr=268.5mb
```