typelevel / cats-effect

The pure asynchronous runtime for Scala
https://typelevel.org/cats-effect/
Apache License 2.0
2.02k stars 517 forks source link

Make `UUIDGen` more performant #2882

Open armanbilge opened 2 years ago

armanbilge commented 2 years ago

As of https://github.com/typelevel/cats-effect/pull/2880 it delegates to Sync#blocking.

Some ideas in https://github.com/typelevel/cats-effect/issues/2865#issuecomment-1062567227 to improve the status quo:

While UUID.randomUUID is blocking, we can do better:

  • Try to create a SecureRandom.getInstance("NativePRNGNonBlocking"). "NativePRNG" appears to be equivalent on nextBytes on Unix, but that's platform specific.
  • Until Java 9, there's still a mutex, but a pool fixes that.
rossabaker commented 2 years ago

http4s/http4s#6138 is another hidden instance of this problem. Maybe what we really need is a way of safely and efficiently generating random bytes. UUIDGen, and this http4s use case, could both build on that.

armanbilge commented 2 years ago

Yes, I've wondered if we need a SecureRandom[F] type class that extends Random[F]. But btw, at what point does this start scope-creeping outside of CE3? If we get a SecureRandom it will need a JS implementation, which gets into bobcats territory with different implementations for Node.js and browsers and a CI matrix spanning all of them.

armanbilge commented 2 years ago

Ross has demoed his proposal in https://github.com/http4s/http4s/pull/6165.

djspiewak commented 2 years ago

I rather like Ross' proposal.

rossabaker commented 2 years ago

I reformed it a bit to add the pool. This is a win when we get caught in a mutex, which we always will in Java 8. In Java 9, if we have a non-blocking but threadsafe instance, we probably don't want that pool. But detecting thread safety is super gross.

rossabaker commented 2 years ago

Would SecureRandom have any additional methods, or just a marker trait that it's, like, secure and stuff?

I can imagine a marker trait being useful, so a random client can express that it needs something stronger.

armanbilge commented 2 years ago

I was thinking marker trait. I see the JDK SecureRandom adds a couple additional methods like getAlgorithm, no idea how useful those are.

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/security/SecureRandom.html

rossabaker commented 2 years ago

I have found getAlgorithm nice for debugging. The next would be great, because I already needed bits instead of bytes, and had to go from bytes back to bits. But Java's not going to give us that.

armanbilge commented 2 years ago

Btw, another good reason to implement UUIDGen in terms of our own SecureRandom is that at least on Scala.js, the UUID.randomUUID() method is not cryptographically secure. Actually that seems like a pretty serious issue...

https://github.com/scala-js/scala-js/blob/058532aa8c504b76431b40e3e1b51b2cdef87643/javalib/src/main/scala/java/util/UUID.scala#L139

djspiewak commented 2 years ago

Btw, another good reason to implement UUIDGen in terms of our own SecureRandom is that at least on Scala.js, the UUID.randomUUID() method is not cryptographically secure. Actually that seems like a pretty serious issue...

Uh yes, that's a relatively serious issue.

diogocanut commented 1 year ago

Hey!

I want to take this ticket, I'm just wondering what would be the final solution here. Should we have something like this?

SecureRandom[F].nextBytes(16)

Using our own implementation of SecureRandom and them reimplement the steps done on UUID.randomUUID() ?

public static UUID randomUUID() {
    SecureRandom ng = Holder.numberGenerator;

    byte[] randomBytes = new byte[16];
    ng.nextBytes(randomBytes);
    randomBytes[6]  &= 0x0f;  /* clear version        */
    randomBytes[6]  |= 0x40;  /* set to version 4     */
    randomBytes[8]  &= 0x3f;  /* clear variant        */
    randomBytes[8]  |= 0x80;  /* set to IETF variant  */
    return new UUID(randomBytes);
}
armanbilge commented 1 year ago

Hi @diogocanut, great! Good question, I think there are two changes we need to make:

  1. Exactly like you say, implement our own UUIDGen[F] using SecureRandom[F].

  2. Optimize SecureRandom[F] itself. Currently it is blocking which is bad for performance, but it doesn't have to be. In this PR Ross demonstrates how we can attempt to acquire a non-blocking secure random implementation. We should port that strategy to Cats Effect. https://github.com/http4s/http4s/pull/6165

durban commented 1 year ago

Hm, I'm wondering if the NativePRNGNonBlocking part is still relevant? I'm trying to find some up to date info on it, and it seems to be solved even on Linux: https://lwn.net/Articles/884875/.

(Of course the other thing, JVM 8 using synchronized is certainly relevant on JVM 8.)

durban commented 1 year ago

Uhh, NativePRNGNonBlocking also seems to have a lock... See also https://bugs.openjdk.org/browse/JDK-8278371.

armanbilge commented 1 year ago

Here's a potentially mad idea ... could we just implement our own secure random by reading from /dev/urandom? It's an ordinary file so we shouldn't need any JNI or native calls to use it. And even if we did still need some kind of locking, at least we could do it with Mutex so it's only fiber-blocking.

durban commented 1 year ago

I thought about that, but it seems like a can of worms (although if someone wants to open it, I definitely won't stop them):

armanbilge commented 1 year ago
  • What to do on exotic platforms

Fallback to the current scheme: use JDK SecureRandom with blocking(...)

  • I believe /dev/urandom still happily returns non-random data shortly

Yeah ... and I think you are right that NativePRNGNonBlocking would have similar issue.

  • As far as I know, the alternative /dev/random is fixed on modern Linux, but on older versions it blocks when it feels like it.

Yeah, this seems the most promising avenue, based on the article you linked.


In light of above, my best ideas are:

  1. check for modern linux and use /dev/random, otherwise fallback to blocking with JDK SecureRandom

  2. give up and just use blocking with JDK SecureRandom 🤷

durban commented 1 year ago

Oh, right, here it's possible to just fall back to blocking... that sounds good to me. A possible (but somewhat convoluted, maybe even fragile) third option:

  1. When creating the SecureRandom, a. read a little data from /dev/random (in blocking): this will make sure the kernel RNG is initialized b. write this data into /dev/urandom (in delay): this will make sure that urandom is also initialized, even if it's using a different RNG from /dev/random (I think older Linux might do this?) c. read from /dev/urandom for nextBytes and similar (in delay): it should be safe after b. d. (if any of a./b. fails, fall back to a SecureRandom with blocking)

I think this option has the advantage of also working on older Linux.

(Fallback could be smarter, e.g., detect if we have a sufficiently modern OS that the dance with reading-writing is not necessary, ...)

durban commented 1 year ago

Also, there is a JDK built-in SecureRandom called Windows-PRNG, which uses a Windows API call to get random numbers. Which sounds good, but I could not find any info on whether that call is blocking.