jenssegers / optimus

🤖 Id obfuscation based on Knuth's multiplicative hashing method for PHP.
https://jenssegers.com
MIT License
1.27k stars 74 forks source link

Why limit integer support to 2147483647? #35

Open courtney-miles opened 6 years ago

courtney-miles commented 6 years ago

I have adopted Optimus assuming it would support integers up to 2^32-1.

Is there a technical reason why it is limited to 2^31-1? Is it that the method only works when the max integer is prime number?

I would be happy to submit a PR to make the range configurable. But I don't fully understand the maths to know if there's a logical problem with this.

In terms of the maths, why is a large prime preferred over a small prime?

courtney-miles commented 6 years ago

I have tried setting Optimus::MAX_ID to 2^32-1 and 4294967311 (the next largest prime after 2^32) and I'm finding that \OptimusTest::testEncodeDecodeWithXor() periodically fails.

Checking the prime versus inverse-prime with (PRIME * INVERSE) & MAXID from Energon::generate() shows that the generated prime/inverse do not always equal 1. In the case of 2^32-1 it never equals 1, whilst for 4294967311 it is sometimes 1, or 0 or 4294967297.

I'm not yet sure where the fault lies. I Suspect I will need to read up on \phpseclib\Math\BigInteger::modInverse() to see why it's return value is not being identified as valid.

jenssegers commented 6 years ago

Yeah I tried to up the number as well, but with bad collisions. Would be a awesome if you could help out on this!

courtney-miles commented 6 years ago

An upfront word of warning: I'm not a mathematician. So most of the concepts in this are completely foreign to me. And what I'm about to say is based on not more than a days worth of research.

I found that 2^32-1 is not just a Prime, it's a Mersenne Prime. I think this is significant in producing an INVERSE that can be used to decode the obfuscated ID.

It seems that only certain PRIME numbers will produce a reversible INVERSE when MAXID is any other number (prime or otherwise.) But with a Mersenne Prime every INVERSE will work. Only with the Mersenne Prime will (PRIME * INVERSE) & MAXID == 1 for every PRIME value.

So MAXID could probably be a larger number, but then PRIME would have to a specific prime number rather than any prime number. Possibly there's an equation that could determine which primes are compatible for the chosen MAXID, but I don't know.

Having said that, I have not found the strategy used in Optimus documented anywhere. There are a lot of discussions of Knuth's Multiplicative Hashing, but none that involve an modular multiplicative inverse and mention of the Mersenne Prime. If I could find mention of this strategy then perhaps it will provide details about the MAXINT.

courtney-miles commented 6 years ago

Ahah!

So I have confirmation that the strategy used in Optimus does not require primes and a Mersenne Prime as a MAXINT -- A practical use of multiplicative inverses

So MAXINT can be any number. And then PRIME does not have to be prime, rather it must be a coprime of MAXINT -- that is, neither PRIME nor MAXINT can share a common factor, except 1.

The example from the above link would use 1000000000 for MAXINT and 387420489 for the PRIME, knowing that a value ending in 9 must be a coprime of value ending in 0. Note that neither 1000000000 nor 387420489 are themselves prime numbers.

So having Optimus use a Mersenne Prime for MAXINT meant that every prime number was a coprime. To use a different value for MAXINT will require more careful selection of a value for PRIME to ensure it is a coprime of the chosen MAXINT.

I have not yet written code to test this yet.

I also need to determine if it may be that a it is best if PRIME is both a coprime and a prime, based on observations described at Exploring Knuth's multiplicative hash.

courtney-miles commented 6 years ago

Hmmm... so I was way off. The maths was all perfect. The problem comes back to PHPs handling of extremely large integers.

With 2^32 for MAXINT, and small enough PRIME will result in a large INVERSE. When multiplied against a large encoded value it can exceed PHP_INT_MAX.

The following is a sample diff to demonstrate that it works if we switch to GMP when the multiplication exceeds PHP_INT_MAX

Index: src/Optimus.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/Optimus.php (revision cec96e8f597cc331a1e0fde5af17ab6621b5aa9e)
+++ src/Optimus.php (date 1519649807000)
@@ -9,7 +9,8 @@
     /**
      * @var int
      */
-    const MAX_INT = 2147483647;
+//    const MAX_INT = 2147483647;
+    const MAX_INT = 4294967295;

     /**
      * @var string
@@ -98,7 +99,14 @@
                 return gmp_intval(gmp_mul((int) $value ^ $this->xor, $this->inverse)) & static::MAX_INT;

             default:
-                return (((int) $value ^ $this->xor) * $this->inverse) & static::MAX_INT;
+                $a = (int) $value ^ $this->xor;
+                $b = $a * $this->inverse;
+
+                if ($b < PHP_INT_MAX) {
+                    return $b & static::MAX_INT;
+                } else {
+                    return gmp_intval(gmp_mul($a, $this->inverse)) & static::MAX_INT;
+                }
         }
     }

So when dealing with larger MAXINT values, a maths library will be required even if you have a 64bit system.

I'm happy to raise a PR to allow the MAXINT to be configurable and to require a maths library when Optimus::MAX_INT^2 >= PHP_INT_MAX.

jenssegers commented 6 years ago

PR would be great. Would prefer to throw an exception if the user is trying to do something that is impossible, eg; $b > PHP_INT_MAX and !function_exists('gmp_intval'). Maybe even during __construct.

courtney-miles commented 6 years ago

PR #36 has been submitted.

courtney-miles commented 6 years ago

Hey @jenssegers,

I thought I would drag the conversation about getting this to work with any sized Max Int out of the PR into this ticket.

Couldn't get it to work yet :(

I have been examining the maths behind this to see why it does not work, to then see what would make it work. I think the key thing for this is (PRIME * INVERSE) & MAXID == 1.

I will use following specific example that works for this formula and disect it.

(211 * 91) & 255

The & is a binary AND so what we effectively have is this.

0b100101100000001 & 0b11111111 // 1

So we can see that this works PRIME * INVERSE creates a number that ends with 00000001 and MAXID is 11111111, and so the rule is satisfied.

If the MAXINT was 200, it's binary representation would be 11001000. Because the last digit is not a 1, there is no combination of PRIME * INVERSE that can ever satisfy this rule.

I will now examine the encode/decode formula's we can also see why a base 2 number is required for MAXID. Simplified, these are the two formulas.

Encode: (VALUE * PRIME) & MAXID
Decode: (VALUE * INVERSE) & MAXID

So if we encode the number 23 this is what we get.

(23 * 211) & 255 // 245
(0b10111 * 0b11010011) & 0b11111111 // 0b11110101
(0b1001011110101) & 0b11111111 // 0b11110101

And if we decode it....

(245 * 91) & 255 // 23
(0b11110101 * 0b1011011) & 0b11111111 // 0b10111
(0b101011100010111) & 0b11111111 // 0b10111 (i.e. the last 8 digits of `VALUE *  INVERSE`)

So the true number is buried in the last 8 bits of the result of VALUE * INVERSE. If MAXID is anything but a series of 1's, the use of the bitwise & will prevent the true number from ever being recovered -- and it will actually lose the number on the encode so it won't exist to be able to recover anyway.

So that's the problem... and having written this I think I see a possible backwards-compatible solution. Stay tuned...

robinvdvleuten commented 6 years ago

@courtney-miles did you eventually find a solution?

courtney-miles commented 6 years ago

did you eventually find a solution?

No I didn't, sorry. It was taking too much time and I had to move on. So I'm unsure if there is a solution that is compatible with Optimus.