Open courtney-miles opened 6 years ago
I have tried setting Optimus::MAX_ID
to 2^32-1 and 4294967311 (the next largest prime after 2^32) and I'm finding that \OptimusTest::testEncodeDecodeWithXor()
periodically fails.
Checking the prime versus inverse-prime with (PRIME * INVERSE) & MAXID
from Energon::generate()
shows that the generated prime/inverse do not always equal 1. In the case of 2^32-1 it never equals 1, whilst for 4294967311 it is sometimes 1, or 0 or 4294967297.
I'm not yet sure where the fault lies. I Suspect I will need to read up on \phpseclib\Math\BigInteger::modInverse()
to see why it's return value is not being identified as valid.
Yeah I tried to up the number as well, but with bad collisions. Would be a awesome if you could help out on this!
An upfront word of warning: I'm not a mathematician. So most of the concepts in this are completely foreign to me. And what I'm about to say is based on not more than a days worth of research.
I found that 2^32-1 is not just a Prime, it's a Mersenne Prime. I think this is significant in producing an INVERSE
that can be used to decode the obfuscated ID.
It seems that only certain PRIME
numbers will produce a reversible INVERSE
when MAXID
is any other number (prime or otherwise.) But with a Mersenne Prime every INVERSE
will work. Only with the Mersenne Prime will (PRIME * INVERSE) & MAXID == 1
for every PRIME
value.
So MAXID
could probably be a larger number, but then PRIME
would have to a specific prime number rather than any prime number. Possibly there's an equation that could determine which primes are compatible for the chosen MAXID
, but I don't know.
Having said that, I have not found the strategy used in Optimus documented anywhere. There are a lot of discussions of Knuth's Multiplicative Hashing, but none that involve an modular multiplicative inverse and mention of the Mersenne Prime. If I could find mention of this strategy then perhaps it will provide details about the MAXINT
.
Ahah!
So I have confirmation that the strategy used in Optimus does not require primes and a Mersenne Prime as a MAXINT
-- A practical use of multiplicative inverses
So MAXINT
can be any number. And then PRIME
does not have to be prime, rather it must be a coprime of MAXINT
-- that is, neither PRIME
nor MAXINT
can share a common factor, except 1.
The example from the above link would use 1000000000 for MAXINT
and 387420489 for the PRIME
, knowing that a value ending in 9 must be a coprime of value ending in 0. Note that neither 1000000000 nor 387420489 are themselves prime numbers.
So having Optimus use a Mersenne Prime for MAXINT
meant that every prime number was a coprime. To use a different value for MAXINT
will require more careful selection of a value for PRIME
to ensure it is a coprime of the chosen MAXINT
.
I have not yet written code to test this yet.
I also need to determine if it may be that a it is best if PRIME
is both a coprime and a prime, based on observations described at Exploring Knuth's multiplicative hash.
Hmmm... so I was way off. The maths was all perfect. The problem comes back to PHPs handling of extremely large integers.
With 2^32 for MAXINT
, and small enough PRIME
will result in a large INVERSE
. When multiplied against a large encoded value it can exceed PHP_INT_MAX
.
The following is a sample diff to demonstrate that it works if we switch to GMP when the multiplication exceeds PHP_INT_MAX
Index: src/Optimus.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/Optimus.php (revision cec96e8f597cc331a1e0fde5af17ab6621b5aa9e)
+++ src/Optimus.php (date 1519649807000)
@@ -9,7 +9,8 @@
/**
* @var int
*/
- const MAX_INT = 2147483647;
+// const MAX_INT = 2147483647;
+ const MAX_INT = 4294967295;
/**
* @var string
@@ -98,7 +99,14 @@
return gmp_intval(gmp_mul((int) $value ^ $this->xor, $this->inverse)) & static::MAX_INT;
default:
- return (((int) $value ^ $this->xor) * $this->inverse) & static::MAX_INT;
+ $a = (int) $value ^ $this->xor;
+ $b = $a * $this->inverse;
+
+ if ($b < PHP_INT_MAX) {
+ return $b & static::MAX_INT;
+ } else {
+ return gmp_intval(gmp_mul($a, $this->inverse)) & static::MAX_INT;
+ }
}
}
So when dealing with larger MAXINT
values, a maths library will be required even if you have a 64bit system.
I'm happy to raise a PR to allow the MAXINT
to be configurable and to require a maths library when Optimus::MAX_INT^2 >= PHP_INT_MAX
.
PR would be great. Would prefer to throw an exception if the user is trying to do something that is impossible, eg; $b > PHP_INT_MAX
and !function_exists('gmp_intval')
. Maybe even during __construct
.
PR #36 has been submitted.
Hey @jenssegers,
I thought I would drag the conversation about getting this to work with any sized Max Int out of the PR into this ticket.
Couldn't get it to work yet :(
I have been examining the maths behind this to see why it does not work, to then see what would make it work. I think the key thing for this is (PRIME * INVERSE) & MAXID == 1
.
I will use following specific example that works for this formula and disect it.
(211 * 91) & 255
The &
is a binary AND so what we effectively have is this.
0b100101100000001 & 0b11111111 // 1
So we can see that this works PRIME * INVERSE
creates a number that ends with 00000001
and MAXID is 11111111
, and so the rule is satisfied.
If the MAXINT was 200, it's binary representation would be 11001000
. Because the last digit is not a 1, there is no combination of PRIME * INVERSE
that can ever satisfy this rule.
I will now examine the encode/decode formula's we can also see why a base 2 number is required for MAXID
. Simplified, these are the two formulas.
Encode: (VALUE * PRIME) & MAXID
Decode: (VALUE * INVERSE) & MAXID
So if we encode the number 23 this is what we get.
(23 * 211) & 255 // 245
(0b10111 * 0b11010011) & 0b11111111 // 0b11110101
(0b1001011110101) & 0b11111111 // 0b11110101
And if we decode it....
(245 * 91) & 255 // 23
(0b11110101 * 0b1011011) & 0b11111111 // 0b10111
(0b101011100010111) & 0b11111111 // 0b10111 (i.e. the last 8 digits of `VALUE * INVERSE`)
So the true number is buried in the last 8 bits of the result of VALUE * INVERSE
. If MAXID is anything but a series of 1's, the use of the bitwise &
will prevent the true number from ever being recovered -- and it will actually lose the number on the encode so it won't exist to be able to recover anyway.
So that's the problem... and having written this I think I see a possible backwards-compatible solution. Stay tuned...
@courtney-miles did you eventually find a solution?
did you eventually find a solution?
No I didn't, sorry. It was taking too much time and I had to move on. So I'm unsure if there is a solution that is compatible with Optimus.
I have adopted Optimus assuming it would support integers up to 2^32-1.
Is there a technical reason why it is limited to 2^31-1? Is it that the method only works when the max integer is prime number?
I would be happy to submit a PR to make the range configurable. But I don't fully understand the maths to know if there's a logical problem with this.
In terms of the maths, why is a large prime preferred over a small prime?