uuidjs / uuid

Generate RFC-compliant UUIDs in JavaScript
MIT License

Should be possible to use `crypto.randomBytes`'s async API #150

Closed brendanlong closed 7 years ago

brendanlong commented 8 years ago

Running my program with --trace-sync-io, I see:

WARNING: Detected use of sync API
at _rng (/home/blong/trello-idea-powerup/node_modules/node-uuid/uuid.js:60:53)
at v4 (/home/blong/trello-idea-powerup/node_modules/node-uuid/uuid.js:224:55)
// etc.

The problem is that you're using crypto.randomBytes synchronously, and don't seem to provide a way to pass a callback and use the async API.

broofa commented 7 years ago

The call to crypto.randomBytes(16) takes ~0.0025 msecs (on my Mac Pro). Not worth the effort or code complexity, IMHO.

broofa commented 7 years ago

@brendanlong Coming back to this... can you elaborate on why this is an issue in practical terms (i.e. does fixing it add benefit beyond getting rid of the --trace-sync-io warning)?

I assume the randomBytes API offers an async option for a reason, but the only one that makes sense (to me) is the case where you're generating very large quantities of numbers and don't want to block the current thread. But since RNG is an almost purely computational problem (i.e. there's little/no underlying async I/O that mandates a non-blocking API) and node-uuid generates small quantities of data, I'm struggling to see why this is an issue.

I would love to be educated on why I'm wrong. I don't see a whole lot on the web about this.

broofa commented 7 years ago

FWIW, sync vs. async perf of randomBytes: https://github.com/mozilla/fxa-auth-server/issues/173

randunel commented 7 years ago

@broofa same test on my PC, which is considerably more powerful than the virtual machines production software runs on:

mihai /tmp/random-bytes
$ cat asyncRandomBytes.js
crypto = require("crypto");
for (var i = 0; i < 200000; i++) {
  crypto.randomBytes(256, function(err) {});
}
mihai /tmp/random-bytes
$ time node asyncRandomBytes.js

real    0m0.690s
user    0m2.796s
sys     0m0.260s
mihai /tmp/random-bytes
$ cat syncRandomBytes.js
crypto = require("crypto");
for (var i = 0; i < 200000; i++) {
  crypto.randomBytes(256);
}
mihai /tmp/random-bytes
$ time node syncRandomBytes.js

real    0m1.852s
user    0m1.836s
sys     0m0.012s

I would be surprised to find any companies running their servers on MacBook Pros. The same argument, "meh, it's fast enough on my PC", can be used for fs.readFileSync because many MacBook Pros have SSDs, right?

broofa commented 7 years ago

The same argument "meh, it's fast enough on my PC" can be used for fs.readFileSync because many MacBook Pros have SSDs, right?

No. You're talking about the difference between providing a method designed to do arbitrarily large (and time-consuming) operations vs. a library that uses a sync method in a well-constrained fashion. readFileSync can be used to read files of any size. In contrast, node-uuid can't be used to generate arbitrarily large UUIDs. I defy you to paint a scenario where a call to uuid.v4() takes more than a millisecond. Hell, it's hard to imagine it taking even 0.1 msecs, frankly.

defunctzombie commented 7 years ago

There is no reason to make this library async. We could support passing in a random-byte generator, but if your system is blocked by generating 16 bytes, there are likely higher-value things to address.

janaz commented 6 years ago

It's possible to use the library in async mode by providing the `random` option with bytes obtained in an async fashion:

const crypto = require('crypto');
const uuid = require('uuid/v4');

const uuidPromise = (u) => new Promise((resolve, reject) => {
  crypto.randomBytes(16, (err, random) => {
    if (err) {
      return reject(err);
    }
    resolve(u({random}));
  });
});

// Usage
uuidPromise(uuid)
 .then(u => console.log("UUID", u))
 .catch(e => console.error(e));

gustavomassa commented 5 years ago

I've run some tests comparing sync and async, and the conclusion is that async had better performance.

Use case: Node.js Express running in cluster mode with 4 workers (Windows, Core i7-7500U). Load test using k6 with 100 concurrent virtual users fetching uuid.v4() from a route for 30 seconds. Rules: must use HTTP/2, return success (200), and keep transaction time < 1000 ms.

Async code:

  public async getId(req: Request, res: Response, next: NextFunction) {
    const uuidPromise = (u) => new Promise((resolve, reject) => {
      crypto.randomBytes(16, (err, random) => {
        if (err) {
          return reject(err);
        }
        resolve(u({ random }));
      });
    });

    // async
    const id = await uuidPromise(uuid);
    res.send(id);
  }

Sync code:

  public async getId(req: Request, res: Response, next: NextFunction) {
    const id = uuid.v4();
    res.send(id);
  }

Sync results: uuid_sync (screenshot)

Async results: uuid_async (screenshot)

Conclusions:

I know network traffic matters, but I'm executing the tests on the same machine (localhost); I've run the tests several times and the results have the same mean. The sync API yields fewer iterations, meaning fewer requests per second and less network throughput. The async API yields many more iterations and requests per second, about 37% more requests than the sync API. I know this test is simple and we need more complex tests, but if a route has to generate a UUID for every request, it does matter.

Thanks

broofa commented 5 years ago

@gustavomassa I just ran my own async tests using the benchmark module. The results (see below) make it difficult to accept your findings.

Here are my issues:

  1. My test shows both sync and async forms of uuid run 100-1000x faster (on 2017 MBP) than your test. Even ignoring that sync performance is faster, how do you explain a single incantation of either of these producing a 37% difference in HTTP performance?
  2. Have you verified that sync vs. async HTTP request processing isn't triggering substantially different code paths w/in Express and Cluster? E.g. How do your test cases perform if you just return sync/async 'hello world' string literals instead of doing uuid()?

Express and Cluster are adding a lot of overhead that I'm forced to assume is obscuring any real performance difference uuid might be introducing. Without an SSCCE with which to reproduce your findings, and given my test below, I'm afraid you haven't done much to persuade me that there is a real issue here.

(Note: I'm not 100% sure the code below is correct. benchmark's documentation is a little sparse. Please do feel free to verify that I haven't done something wrong.)


uuid_bench.js:

const uuid = require('uuid/v4');
const Benchmark = require('benchmark');
const crypto = require('crypto');

const suite = new Benchmark.Suite();

suite
  .add('Sync uuidv4', uuid)
  .add('Async uuidv4',
    function(deferred) {
      crypto.randomBytes(16, (err, random) => {
        if (err) throw err;
        uuid({random});
        deferred.resolve();
      });
    },
    {defer: true}
  )
  .on('cycle', event => console.log(String(event.target)))
  .on('complete', () =>
    console.log('Fastest is ' + suite.filter('fastest').map('name'))
  )
  .run();

Running the test:

$ npm i uuid benchmark
$ node uuid_bench.js
Sync uuidv4 x 327,839 ops/sec ±1.70% (84 runs sampled)
Async uuidv4 x 50,822 ops/sec ±4.09% (75 runs sampled)
Fastest is Sync uuidv4

gustavomassa commented 5 years ago

@broofa hey Robert, thank you for your fast reply,

You are right about Express and Cluster adding a lot of overhead, but we need to test real-world scenarios; no one is going to create a Node.js program just to generate UUIDs. I was thinking about your test results and mine, and I think I understand what is happening:

  1. How do you explain a single incantation of either of these producing a 37% difference in HTTP performance? Node.js excels at handling events; in a Node.js program we'll always have multiple I/O events occurring concurrently. The more time Node.js has to process concurrent events, the better the program's performance, and thus the more requests per second it can handle. In your tests, the results of sync and async are the same because Node.js does not have any concurrent I/O to handle during the execution of the test, meaning the UUID generation has the whole processor available to itself.

This nodejs article talks about it: https://github.com/nodejs/nodejs.org/blob/master/locale/en/docs/guides/dont-block-the-event-loop.md

So when we're using the async version of uuid, we're partitioning the code, giving other events more time to run.

Partitioning You could partition your calculations so that each runs on the Event Loop but regularly yields (gives turns to) other pending events.

broofa commented 5 years ago

@gustavomassa:

we need to test real world scenarios

Of course. The problem is that your real-world scenario is not accessible to me, so I have no way of verifying/debugging the problem. And you're complaining about code that is empirically (according to my tests) only responsible for < 0.1% of the response time.

Something just isn't adding up here, and using HTTP test data that measures 100's of requests/second as evidence of a problem with code that runs 100K's ops/second is like handing me a subway map of New York and expecting me to use it to locate the beer aisle at your local corner market. 😕

I think I understand what is happening

Hand waving about how node is optimized to be an I/O event loop doesn't help. I'll simply counter with my own hand-waving about how PRNGs like randomBytes() are not supposed to be I/O bound - they're CPU bound - so releasing control back to the event loop doesn't actually accomplish anything other than to add event-loop + Promise overhead.

BTW, I'm not saying this can't be an issue with uuid(). The internals of OpenSSL's ssleay_rand_bytes() code, upon which randomBytes() is based, are convoluted enough that there may be I/O-related blocking in there somewhere, but I have yet to see evidence of such. Unless/until we can verify that, I'm not inclined to muck around with things.

Speaking of which, I revamped my test case, above, to run with Cluster (code here). On my 4-core CPU, the per-core performance drops by 50%. I'm not sure why - I assume there's some common resource contention of some sort - but the overall perf is 2X so take that for whatever it's worth. Still... definitely not something I'm concerned about, and orders of magnitude away from whatever your issue is. Here's the output:

Master 54281 is running
Worker 54282 started
Worker 54284 started
Worker 54285 started
Worker 54287 started
Sync uuidv4 x 179,131 ops/sec ±2.05% (82 runs sampled)
Sync uuidv4 x 179,845 ops/sec ±2.62% (79 runs sampled)
Sync uuidv4 x 178,607 ops/sec ±3.41% (80 runs sampled)
Sync uuidv4 x 178,554 ops/sec ±2.31% (85 runs sampled)
Async uuidv4 x 31,443 ops/sec ±4.72% (70 runs sampled)
Fastest is Sync uuidv4
Async uuidv4 x 29,978 ops/sec ±5.47% (67 runs sampled)
Fastest is Sync uuidv4
Async uuidv4 x 29,621 ops/sec ±4.89% (73 runs sampled)
Fastest is Sync uuidv4
Async uuidv4 x 29,151 ops/sec ±5.94% (64 runs sampled)
Fastest is Sync uuidv4

If you want this taken seriously, you need to provide a test-case that isolates the problem. I'd suggest the following:

  1. Start by instrumenting your getId() methods with process.hrtime() so you get an accurate measure of the per-invocation time spent in uuid creation code. If uuid() is the issue, based on your test data there should be a ~25 msec difference between the two forms (favoring async). If it's not the issue, it'll be more like ~0.01 msecs... i.e. this should give us a pretty definitive answer one way or another as to whether uuid() is really at fault here.
  2. If it really is a uuid problem, use the uuid_bench.js code I posted here as a starting point and build up to an SSCCE that reproduces the behavior, then post that code in a new issue and we can go from there.

I know I'm not making this easy for you, but I hope you appreciate my reasoning. Cheers.

forgotPassword commented 4 years ago

Hi, I would like to reopen this issue. Official Node guidelines say not to use randomBytes() in its synchronous form, as it will block the event loop.

https://nodejs.org/en/docs/guides/dont-block-the-event-loop/

For the sake of completeness, we note that when you call one of these APIs from a callback on the Event Loop, the Event Loop pays some minor setup costs as it enters the Node.js C++ bindings for that API and submits a task to the Worker Pool. These costs are negligible compared to the overall cost of the task, which is why the Event Loop is offloading it.

If you execute them on the Event Loop, they will take far longer to complete than a typical JavaScript instruction, blocking the Event Loop.

broofa commented 4 years ago

@forgotPassword The reason event loop blocking is a concern with randomBytes() is because the size parameter is unbounded. For example (on my laptop):

randomBytes(16) x 449,511 ops/sec ±1.03% (85 runs sampled)
randomBytes(1e5) x 4,164 ops/sec ±0.76% (94 runs sampled)
randomBytes(1e8) x 4.78 ops/sec ±4.84% (17 runs sampled)

If you're generating 1,000,000 random values at a time then, yes, blocking is a real concern. uuid doesn't do that, though. It only ever requests 16 values at a time, where performance is on par with or better than many other [synchronous] operations in Node. E.g...

randomBytes(16) x 449,472 ops/sec ±1.06% (87 runs sampled)
new URL('https://nodejs.org/api/url.html?foo=bar') x 503,342 ops/sec ±0.44% (96 runs sampled)
path.resolve('foo/../bar') x 90,384 ops/sec ±0.61% (89 runs sampled)

forgotPassword commented 4 years ago

@broofa Thanks for the reply. I ran the following bench (Windows):

sync.js

var crypto = require('crypto');

var start = new Date();

var array = ['a'],
    crypto_ops = 0,
    js_ops = 0;

var js_loop = function(){
    if(new Date() - start >= 1000){
        console.log('js_ops', js_ops, 'crypto_ops', crypto_ops);
        process.exit(0);
    }

    crypto.randomBytes(16);
    crypto_ops++;

    array.push('b');
    array.shift();
    js_ops++;

    setImmediate(js_loop);
};

js_loop();

// js_ops 227190 crypto_ops 227190
// js_ops 213712 crypto_ops 213712
// js_ops 229695 crypto_ops 229695
// js_ops 229949 crypto_ops 229949
// js_ops 233426 crypto_ops 233426
// js_ops 225040 crypto_ops 225040
// js_ops 221169 crypto_ops 221169
// js_ops 218889 crypto_ops 218889
// js_ops 224063 crypto_ops 224063
// js_ops 223751 crypto_ops 223751

async.js

var crypto = require('crypto');

var start = new Date();

var array = ['a'],
    crypto_ops = 0,
    js_ops = 0;

var crypto_loop = function(err, buf){
    crypto_ops++;
    crypto.randomBytes(16, crypto_loop);
};

var js_loop = function(){
    if(new Date() - start >= 1000){
        console.log('js_ops', js_ops, 'crypto_ops', crypto_ops);
        process.exit(0);
    }

    array.push('b');
    array.shift();
    js_ops++;

    setImmediate(js_loop);
};

crypto_loop();
js_loop();

// js_ops 290741 crypto_ops 93123
// js_ops 284035 crypto_ops 94771
// js_ops 294128 crypto_ops 96012
// js_ops 283591 crypto_ops 93884
// js_ops 299953 crypto_ops 95861
// js_ops 298372 crypto_ops 99015
// js_ops 288589 crypto_ops 96583
// js_ops 286039 crypto_ops 94128
// js_ops 299735 crypto_ops 95843
// js_ops 279200 crypto_ops 90195

Well, it seems with async we gain 20% js_ops but lose 60% crypto_ops, so overall using the sync version is better. In extreme cases where you don't need high throughput from randomBytes() and want to maximize event-loop performance, you can consider the async version.

Edit: Well, not really, since if we generate fewer randomBytes, js_ops will also rise. I don't see a use for async randomBytes() unless you're creating huge random chunks.

Also, I have no idea what V8 does behind the scenes with those loops. I tried running several crypto_loops; while that increased crypto_ops, it reduced js_ops.

jstewmon commented 4 years ago

👋 Hi, stumbled across this issue today, read through the comments and wanted to mention that I think the case for using the async version of randomBytes is to avoid blocking the event loop because the operation may block waiting for entropy, which will almost never happen outside of a virtualized or containerized environment.

But, it is not a hypothetical problem for workloads running in virtualized or containerized environments. A quick search for random entropy block virtualized containerized will turn up plenty of material on the subject.

Personally, I've seen this problem manifest a few times in automated test suites that run in a containerized environment running on a virtualized environment - tests would time out waiting on randomBytes.

Anyway, I haven't noticed this issue with this library specifically (great library, by the way!), and as was mentioned, it is possible to BYO random bytes, which can be procured asynchronously, so there doesn't seem to be any change required. I just wanted to add some context on why the async version would be preferable for some use cases. ✌️

malyusha commented 3 years ago

@janaz man, appreciate your comment here and the topic discussed! I recently set up a server generating a UUID on each request and it was awfully slow. After using the async version you published, I reduced response time from 1.5s at the 99th percentile to 600ms. Thank you!!!