innix / shrek

A vanity .onion address generator written in Go.
MIT License
16 stars 0 forks source link

Performance hints #1

Open Yawning opened 2 years ago

Yawning commented 2 years ago

Shrek is the fastest .onion vanity address generator written in Go (at time of writing), but it's still very slow compared to native C programs like mkp224o.

As fast as my SSE2/AVX2 scalar-basepoint multiply implementation is, it is not going to compare to mkp244o because it cheats and only does one scalar-basepoint multiply per worker in the common case.

The gist of it is that instead of regenerating the entire key (expensive) each iteration of the search loop, you instead do something like:

eightTimesBasePoint = Point.Mul(Basepoint, 8) # Precompute this, though this is only called once/worker, so you could not bother.

priv, pub = GenerateKey()
scalar = ParseScalar(priv[:32])
point = ParsePoint(pub)
for {
  if Match(Encode(point) {
    // blah blah blah
  }

  scalar = scalar.Add(scalar, 1)
  point = point.Add(point, eightTimesBasePoint)
}

This does have the downside of not supporting RFC-style seed based private key representations, but IIRC tor supports/uses the expanded scalar | nonce form so keys generated by this algorithm work.

The relevant datastructures and math routines are in the curve and curve/scalar package of your ed25519 import.

nb: mkp244o does the scalar addition somewhat differently because of how I originally implemented it in the code they derived their implementation from, but the addition routine in the scalar package is sufficient (and arguably better because it is computationally impractical for wrapping to happen, so you can omit the check).

innix commented 2 years ago

Hey, thanks for the tips and apologies for replying so late. I finally had some free time over Xmas to work on some side projects 😄

Using your pseudo-code above and using horse25519's code as a reference, I've been able to write up a Go implementation that skips doing the full key generation / scalar multiplication on every iteration. I had some problems with your scalar.Add method in the curve25519-voi library, so I borrowed the addsztoscalar32 function from mkp244o and got it working that way. Maybe I was doing something wrong; crypto isn't really my area of expertise.

This new implementation is much faster; around 4x faster than before. I think it's getting close to being the same performance as mkp2440p when running it using the -z flag. Unfortunately my app still gets obliterated by mkp2440p when it uses -B batching mode. It's so much faster it's kinda unbelievable.