launchdarkly / node-server-sdk

LaunchDarkly Server-side SDK for Node

Poor quality algorithm to convert SHA-1 hash to decimal value for bucketing #157

Closed lachlanhunt closed 1 year ago

lachlanhunt commented 5 years ago

Is this a support request? No.

Describe the bug

The algorithm used to convert a SHA-1 hash into a decimal value between 0 and 1 does not correctly take into account the range of safe integer values supported by the 64 bit IEEE 754 floating point numbers used by JavaScript. It reads 60 bits of the SHA-1 hash, converts them to a number with parseInt, and then divides by 2^60 to produce a decimal between 0 and 1.

The code actually divides by the constant 0xFFFFFFFFFFFFFFF. As written, this literal is 2^60 - 1, but the loss of precision rounds it up to exactly 2^60, so the division just happens to work by accident.
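This is easy to confirm in Node (a quick illustration, not SDK code):

```js
// The literal 0xFFFFFFFFFFFFFFF is 2^60 - 1, but a double keeps only
// 53 significant bits, so the literal rounds up to exactly 2^60.
console.log(0xFFFFFFFFFFFFFFF === 2 ** 60);           // true
console.log(Number.isSafeInteger(0xFFFFFFFFFFFFFFF)); // false
console.log(Number.MAX_SAFE_INTEGER === 2 ** 53 - 1); // true
```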

IEEE 754 numbers provide only 53 significant bits, allowing for a maximum of 2^53 - 1 safe positive integers. That is actually the value of Number.MAX_SAFE_INTEGER. Numbers lose precision beyond this range. This also happens to be the number of uniformly distributed values that can be represented between 0 and 1.

Thus, the correct approach to converting a hash to a decimal in that range is to consume 53 bits, convert it to a positive integer in the range from 0 to 2^53-1, and then divide by 2^53.

Fixing this bug requires care to minimise the risk of causing the bucketing algorithm to switch cohorts. The current approach effectively reads the first 53 bits, but due to the rounding logic when losing precision, can actually cause some bits to be lost from the end and make the distribution less uniform.

Thus, any fix for this needs to ensure that it also consumes the same first 53 bits of the hash.

The following is modified from the bucketUser function in evaluate_flag.js.

https://github.com/launchdarkly/node-server-sdk/blob/master/evaluate_flag.js#L353

```js
var hashKey = util.format("%s.%s.%s", key, salt, idHash);

// To convert to a 53 bit floating point number between 0 and 1,
// we need to extract the first 56 bits of the hash (14 hex digits)
// and then consume as follows:
//
//   1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1000
//   ^.......upper 24 bits.......^ (6 hex digits)
//   ^......keep 21 bits......^
//                            ^..............lower 36 bits...............^ (9 hex digits)
//                             ^.............keep 32 bits.............^

// Extract first 56 bits from sha1 hash, split into upper and lower halves.
// The 6th hex digit is necessarily included in both because
// the upper half needs 1 bit and the lower half needs 3 bits from it.
var sha1Hash = sha1(hashKey);
var sha1HashUpper = sha1Hash.substring(0, 6); // 24 bits, need first 21
var sha1HashLower = sha1Hash.substring(5, 14); // 36 bits, need 32 by dropping the first 1 and last 3 bits

// Keep first 21 bits from upper half, left shift by 29 bits.
// This is equivalent to reading a 53 bit integer and masking with
// 0x1FFFFF00000000
var hashValUpper = (parseInt(sha1HashUpper, 16) & 0xFFFFF8) * 2**29;

// Drop the duplicated top bit (mod 2^35), then floor-divide by 8 to
// discard the lowest 3 bits, keeping the middle 32 bits.
// Note: a plain `>>> 3` would not work here, because ToUint32 would
// first truncate the 36 bit value to its low 32 bits, losing bits 32-34.
var hashValLower = Math.floor((parseInt(sha1HashLower, 16) % 2**35) / 2**3);

// Add upper (21 bits) and lower (32 bits) segments to make a 53 bit integer
var hashVal = hashValUpper + hashValLower;

// Divide by 2^53 to convert to a decimal, where 0 <= result < 1
var result = hashVal / 2**53;
```
lachlanhunt commented 5 years ago

I can prepare a PR to implement this. What's the minimum supported version of Node that my PR would need to be compatible with?

eli-darkly commented 5 years ago

Thanks for bringing this to our attention. Before you do any more work on this, I think we need to think it over here a bit.

I agree that the 53 vs. 60 thing is clearly a problem, and is no doubt causing some inconsistency with the behavior of the other SDKs. But for that same reason (consistency across platforms) we can't make unilateral changes in Node without having a plan for what we're going to do in the others and how we'll roll out the change. That's also why we haven't acted on other hash-related suggestions like using Murmur.

lachlanhunt commented 5 years ago

OK, fair enough. But I expect that if the other implementations also consume 60 bits, they would run into the same bug, because virtually every programming language uses the same IEEE 754 floating point formats.

To illustrate the problem more clearly, when consuming 60 bits, it's possible to output exactly 1.0, which would then fail to fall into any bucket. With 53 bits, the maximum possible bucketing value is 0.9999999999999999, which would fall into the last bucket.

Run this in the latest version of Node:

```js
const hash60a = 0xFFFFFFFFFFFFFB0n; // First 53 bits are 1
const hash53a = hash60a >> 7n;

console.log("Test A, Rounding Down:");
console.log("60 bits:", Number(hash60a) / 2**60); // Outputs 0.999...
console.log("53 bits:", Number(hash53a) / 2**53);

const hash60b = 0xFFFFFFFFFFFFFC0n; // First 54 bits are 1
const hash53b = hash60b >> 7n;
console.log("Test B, Rounding Up:");
console.log("60 bits:", Number(hash60b) / 2**60); // Outputs 1
console.log("53 bits:", Number(hash53b) / 2**53);
```

Even though this demo relies on BigInt support because it made the code simpler, the rounding behaviour is functionally equivalent to what would happen with the current parseInt implementation if the first 54 bits of the hash were all 1s.
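The same rounding can also be reproduced with parseInt directly, using an illustrative 60 bit hex prefix:

```js
// 'fffffffffffffc0' is a 60 bit value whose first 54 bits are all 1s.
// Parsed as a double it rounds up to exactly 2^60, so the bucket
// value computed from it is exactly 1.
const bucket = parseInt('fffffffffffffc0', 16) / 2 ** 60;
console.log(parseInt('fffffffffffffc0', 16) === 2 ** 60); // true
console.log(bucket); // 1, which falls outside every bucket
```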

lachlanhunt commented 5 years ago

I have taken a look at the bucketUser implementations in Java, Go and PHP and they all make similar mistakes.

The PHP implementation has slightly different number handling. The $LONG_SCALE variable is a signed 64 bit integer that is actually equal to 2^60-1. But when used in the division, PHP still converts it to a double precision floating point number, so the division is still equivalent to dividing by 2^60 and the result is the same as in JS.

https://github.com/launchdarkly/php-server-sdk/blob/master/src/LaunchDarkly/VariationOrRollout.php#L76
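In JS terms (illustrative, not the PHP code), dividing by 2^60 - 1 is indistinguishable from dividing by 2^60, because the divisor itself rounds to 2^60 when converted to a double:

```js
// The divisor 2^60 - 1 is not representable as a double and rounds to 2^60
console.log(2 ** 60 - 1 === 2 ** 60); // true

const x = 0x123456789ABCDEF; // an arbitrary hash prefix
console.log(x / (2 ** 60 - 1) === x / 2 ** 60); // true
```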

The Java and Go implementations explicitly only deal with 32 bit floats, yet still extract 60 bits from the hash. The mistakes are otherwise the same. But since floats only have 24 significant bits, they would return 1.0 if the first 25 bits of the hash are 1s, which is statistically more likely than the JS and PHP cases.

i.e. if the hash is 0xFFFFFF800000000 or greater, the Java and Go bucketUser implementations will return 1.0.

https://github.com/launchdarkly/java-server-sdk/blob/5f92f7fcaade51f508d7d86afd39924f6c186a69/src/main/java/com/launchdarkly/client/VariationOrRollout.java#L46

https://github.com/launchdarkly/go-server-sdk/blob/7c39d9efa803422133c8bdec161cc889b89ee94c/flag.go#L133
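In Node, the single-precision behaviour can be simulated with Math.fround (a sketch of the failure mode, not the actual Java/Go code):

```js
// 0xFFFFFF800000000 is a 60 bit value whose first 25 bits are all 1s
const hash = 0xFFFFFF800000000;
console.log(hash === 2 ** 60 - 2 ** 35); // true: exact as a double (25 significant bits)

// Rounding the quotient to single precision (24 bit significand)
// carries it up to exactly 1.0
console.log(Math.fround(hash / 2 ** 60)); // 1
```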

I think the best solution is to decide how many bits of the hash all of the implementations should consume: either 24 bits (for 32 bit floats) or 53 bits (for 64 bit floats). Then divide by 2^24 or 2^53, respectively. In either case, the distributions will be uniform and range from 0.0 up to a maximum of (2^24 - 1)/2^24 or (2^53 - 1)/2^53, respectively.
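Those maximum values can be checked in Node:

```js
// Largest possible bucket value for each choice of bit width
console.log((2 ** 24 - 1) / 2 ** 24); // 1 - 2^-24, just under 1
console.log((2 ** 53 - 1) / 2 ** 53); // 1 - 2^-53 = 0.9999999999999999
```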

eli-darkly commented 5 years ago

As you say, the 53-bit issue is not just a JS problem. However, again, if we are going to make any change that affects the behavior of the SDKs, we need to carefully consider how we are going to explain and roll that out, not just fix it in one SDK at a time. It is undesirable to have non-uniform bucketing, but it is also undesirable for customers to suddenly see any change in how users are bucketed just because they upgraded their SDK, especially if they are using multiple SDKs and don't upgrade them all at once.

We have been considering a variety of changes to the bucketing algorithm, some of which would also address the 1.0 edge case you mentioned. Since those would also change current bucketing, we would be more likely to go ahead and redo the algorithm from scratch. We will also probably want to do some further analysis to quantify exactly what effect the current behavior, and the proposed alternatives, have on the bucketing distribution.

Thanks again for putting in the effort on this.

github-actions[bot] commented 1 year ago

This issue is marked as stale because it has been open for 30 days without activity. Remove the stale label or comment, or this will be closed in 7 days.