mint-metrics / mojito-js-delivery

🧪 Source-controlled JS split testing framework for building and launching A/B tests.
https://mojito.mx/docs/js-delivery-intro

Deterministic but random assignment based on User ID and other test-specific data #4

Closed kingo55 closed 4 years ago

kingo55 commented 5 years ago

Browser vendors are waging a war on first-party cookies. Because we use cookies to keep users bucketed, our bucketing is prone to ITP's 7-day/24-hour cookie deletion. We need to stay ahead of the curve for experiments run on our sites.

One potential solution (assuming sites allow users to log in or set cookie IDs server-side) is to use a good, strong identifier to seed the Mojito PRNG. By taking a hash of the User ID, Wave ID and other inputs, we can assign users randomly and deterministically. There are other benefits to this approach:

  1. Fewer cookies written to maintain bucketing
  2. Greater control over bucketing and avoiding SRM issues (e.g. we could back-test the random assignment over the last 30 days of user IDs we encountered)
  3. If tracking is not consistent, we can compute which buckets a user should have been tracked into
  4. It's an approach supported by Microsoft's experimentation platform through their "Seed Finder"

Example of how it might work

This excellent post describes how it might work in JS: https://github.com/bryc/code/blob/master/jshash/PRNGs.md

// xmur3a: MurmurHash3-style string hash; returns a function that yields
// successive 32-bit unsigned integers to use as seeds.
function xmur3a(str) {
    for(var k, i = 0, h = 2166136261 >>> 0; i < str.length; i++) {
        k = Math.imul(str.charCodeAt(i), 3432918353); k = k << 15 | k >>> 17;
        h ^= Math.imul(k, 461845907); h = h << 13 | h >>> 19;
        h = Math.imul(h, 5) + 3864292196 | 0;
    }
    h ^= str.length;
    return function() {
        h ^= h >>> 16; h = Math.imul(h, 2246822507);
        h ^= h >>> 13; h = Math.imul(h, 3266489909);
        h ^= h >>> 16;
        return h >>> 0;
    }
}

// sfc32: "Small Fast Counter" PRNG; takes four 32-bit seeds and returns
// a function producing floats in [0, 1).
function sfc32(a, b, c, d) {
    return function() {
      a |= 0; b |= 0; c |= 0; d |= 0; 
      var t = (a + b | 0) + d | 0;
      d = d + 1 | 0;
      a = b ^ b >>> 9;
      b = c + (c << 3) | 0;
      c = c << 21 | c >>> 11;
      c = c + t | 0;
      return (t >>> 0) / 4294967296;
    }
}

var userId = 'cadb185d-959f-4197-ad29-1d88a9fd7bd6';
var waveId = 'w37-1';

sfc32(xmur3a(userId)(), xmur3a(waveId)(), xmur3a('something')(), xmur3a('something else')())();

The resulting random number will always be 0.6095077660866082.

In the example above, I used a Snowplow cookie ID, the Wave ID and two other strings to generate the seeds. Of course, we can and should make these inputs flexible.

Issues with Deterministic assignment

Assigning users is easy. Excluding them is harder: after ramp-up, we may not be able to keep previously excluded users out of the test. This matters because we don't allow users into tests from which they've previously been excluded by sampling.

I.e. if we launch an experiment at 10% of traffic for two days and then ramp up to 100%, how will we know to continue excluding users who were kicked out of the test during those first two days?
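To make the problem concrete, here's a tiny sketch (assuming a naive `decision <= sampleRate` rule; the decision value is illustrative):

```javascript
// The decision is deterministic per user, so it never changes between ramps.
// With only an upper-bound check, ramping up re-admits previously excluded users.
var decision = 0.5; // this user's fixed, hashed decision value

var inFirstRamp = decision <= 0.1; // false: excluded during the initial 10% run
var afterRampUp = decision <= 1.0; // true: suddenly included at 100%
```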

Still to determine

Assuming this is the right approach, we'll need to do a bunch of back-testing over our data in Snowplow and some tests in production. I don't know a whole lot about hash functions and PRNGs either, so we'll need to figure this out, like we did when testing the different RNGs a few years back.

merlijnbigtree commented 5 years ago

This looks like a really nice approach, thanks for posting! I noticed, however, that your code doesn't seem to use the hashSeed var - would you just pass this as one of the arguments to the sfc32 function?

I just ran the code on a list of ~167,000 actual Snowplow network_userids, and the distribution seems pretty even (84118 over 0.5 vs 83614 under). A smaller timeframe (~13k IDs) looks similar: 6936 vs 6878. Looking promising to me!

kingo55 commented 5 years ago

Thanks @merlijnbigtree - was just trying to get this down on paper. You're right that I don't use hashSeed in the example - I meant to remove that. Fixed the post now.

Great to hear the tests on your network_userids were nice and even - I'll have a crack with some of our collected IDs too. Not expecting the result to change. Running an SRM test to check the ratios are even shows your groups check out well:

> mojitoSrmTest <- function(a, b, expected_ratio=0.5) {
+ 
+   actual_ratio <- max(a, b) / sum(a, b)
+   srm_z <- (actual_ratio - expected_ratio) / (sqrt( expected_ratio * (1-expected_ratio) / sum(a, b) ))
+   srm_p <- 2 * (1-pnorm(srm_z))
+   
+   return(srm_p)
+   
+ }
> 
> mojitoSrmTest(a = 84118, b = 83614)
[1] 0.2184665
> mojitoSrmTest(a = 6936, b = 6878)
[1] 0.6216745
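For reference, here's the same SRM test ported to JS (a sketch; since JS has no built-in normal CDF, pnorm is approximated with the Abramowitz & Stegun erf formula):

```javascript
// Port of the mojitoSrmTest R function above. pnorm() approximates the
// standard normal CDF via the Abramowitz & Stegun 7.1.26 erf formula
// (max error ~1.5e-7).
function pnorm(z) {
    var x = Math.abs(z) / Math.SQRT2;
    var t = 1 / (1 + 0.3275911 * x);
    var erf = 1 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
        - 0.284496736) * t + 0.254829592) * t) * Math.exp(-x * x);
    return z >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

function mojitoSrmTest(a, b, expectedRatio) {
    expectedRatio = expectedRatio || 0.5;
    var actualRatio = Math.max(a, b) / (a + b);
    var z = (actualRatio - expectedRatio) /
        Math.sqrt(expectedRatio * (1 - expectedRatio) / (a + b));
    return 2 * (1 - pnorm(z)); // two-sided p-value
}

mojitoSrmTest(84118, 83614); // ~0.2185, matching the R output above
mojitoSrmTest(6936, 6878);   // ~0.6217
```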

Got to run a bunch more to confirm. Did you run your tests via JS or some other language?

merlijnbigtree commented 5 years ago

Ok, that's good to know!

Yes, I just used JavaScript... to quickly test this I created some HTML with a single column containing all of the IDs, then scraped it with a query selector and called your function in a for loop.

(For performance reasons I initially tried to convert the xmur3a and sfc32 functions to C#, but couldn't immediately figure out how to translate those operators. It would probably make more sense for larger data sets anyway; as it turned out, the plain JS version was sufficiently fast.)

kingo55 commented 5 years ago

@allmywant - what do you think about this?

I have tried replicating the results of MurmurHash3 and SFC32 in Python, but I get different results from the JS implementation above.

JS

> xmur3a('cadb185d-959f-4197-ad29-1d88a9fd7bd6')()
4058983162

Python

> import mmh3
> print(mmh3.hash("cadb185d-959f-4197-ad29-1d88a9fd7bd6"))
-2080962842

I'm not tied to this approach. Can you think of a better way to assign users based on a given ID?

Should we just ignore the hash and focus on a PRNG with repeatable results based on the seed, across languages (R/Python)?

allmywant commented 5 years ago

@kingo55 The MurmurHash3 lib you used in Python, mmh3 (https://github.com/veegee/mmh3), returns a signed 32-bit hash, while the JS version returns an unsigned 32-bit hash (note the last line: return h >>> 0;).

Also, the JS and Python implementations seem to differ in how they calculate the hash, so I tried this JS lib: murmurHash3.js (https://github.com/karanlyons/murmurHash3.js/tree/master). It returns an unsigned 32-bit hash too. For example:

Python

> import mmh3
> print(mmh3.hash("cadb185d-959f-4197-ad29-1d88a9fd7bd6"))
-2080962842

JS

murmurHash3.x86.hash32('cadb185d-959f-4197-ad29-1d88a9fd7bd6')
2214004454

If I convert the JS hash into a signed 32-bit integer:

2214004454>>0
-2080962842

The result is identical to the Python one. So I think we can use the murmurHash3.js lib; we just need to convert the result to a signed 32-bit integer.
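For anyone replicating this, the reinterpretation works with plain bitwise operators in JS (a quick sketch using the hash values above):

```javascript
// Converting between unsigned and signed 32-bit views of the same hash.
// Values are the murmurHash3.x86.hash32 example from this thread.
var unsignedHash = 2214004454;

// ">> 0" reinterprets the value as a signed 32-bit integer,
// matching Python's mmh3.hash() output.
var signedHash = unsignedHash >> 0;   // -2080962842

// ">>> 0" converts back to the unsigned representation.
var roundTrip = signedHash >>> 0;     // 2214004454
```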

kingo55 commented 5 years ago

@allmywant - excellent! That explains it... Do you think it still makes sense to go:

Input string -> Hash -> PRNG seed

Or is there a different hash function that would be faster and use fewer bytes in the library? We certainly wouldn't include the full 5kb lib in Mojito, but at least the 32-bit hash function in that JS lib is quite lightweight:

https://github.com/karanlyons/murmurHash3.js/blob/master/murmurHash3.js#L210

allmywant commented 5 years ago

@kingo55 I think that makes sense. I'm not familiar with hash algorithms, but I found this hash benchmark (https://github.com/joliss/fast-js-hash-benchmark). I also tested iMurmurHash.js, which is recommended in that benchmark; it's faster (0.02ms) than murmurHash3.js (0.24ms). Both are fast, but iMurmurHash.js will give you collisions after about 2**14 (~16,000) hashes.

We could extract just the 32-bit hash function.

allmywant commented 5 years ago

@kingo55 xxHash (https://cyan4973.github.io/xxHash/)

kingo55 commented 5 years ago

Oh yeah, I was reading about that. It looks interesting. Lots more tested here too (both Murmurhash and xxHash look good):

https://github.com/rurban/smhasher

I think the xxHash license is a bit weird, though. It sounds like we'd have to include a lot of text even in minified form. We'd probably need to clarify with the author:

https://github.com/Cyan4973/xxHash/blob/dev/LICENSE

dapperdrop commented 5 years ago

https://github.com/garycourt/murmurhash-js/blob/master/murmurhash3_gc.js looks ok if a bit old. Not sure which variant of Murmurhash3 it is.

dapperdrop commented 5 years ago

Spookyhash looks fast as well, but hard to find a JS implementation. Found: https://github.com/vkandy/jenkins-hash-js - no license though :/

kingo55 commented 4 years ago

@allmywant @dapperdrop - I had a go at implementing and testing the deterministic assignment here. It works as expected!

We will need to decide how we handle the combination of sampleRate and recipes so we get even distribution each time we need to make a decision. See the commit I made on this branch here: https://github.com/kingo55/mojito-js-delivery/commit/89ddba7f4f5988ac8c8f9a87065ab5c1e5e8e75f

Keen to hear your thoughts on this!

kingo55 commented 4 years ago

@merlijnbigtree - good news, we're close to sharing our "Deterministic Assignment" solution (plus some minor refactoring to make things faster/easier to use).

We introduced a new feature called the decisionAdapter (described through #28 and via decision adapter docs here). It lets you implement your own assignment logic that you can plug into tests. Since we're not crypto experts, we figured users may want to choose their own hashing functions for assignment.

Approach we're currently taking

Microsoft decided on MD5 hashes (section 4.1.2, page 6) because they produce uniform random numbers even from similar seeds. I've also heard anecdotally that Airbnb and others use the MD5 hash function for their assignment.

MD5 is well supported in JS, R, Python, Redshift et al, so an analyst can easily calculate assignment using a number of tools, provided they have the seed.

# Get the hash of: user ID + test ID
seed <- paste0(
  "a4d43ce9-213b-44fc-8471-0c49f3acdbff", # User ID
  "test3" # Test ID
  )
hash <- digest::digest(seed, algo="md5", serialize = F)

# We only need part of the string for a large number
decisionInput <- substr(hash, 1, 6)

# Convert the hex to integer and divide by the largest possible number to get the decision
decision <- strtoi(decisionInput, 16)/0xFFFFFF

# Coin has flipped, check if heads or tails
ifelse(decision > 0.5, "treatment", "control")

Backtesting real Snowplow UUIDs through this method seems to produce uniform assignment, even when compared against R's built-in PRNG:

[Histograms: assignment distributions for waves w1 and w2]

Using nearly identical hash seeds (w1 and w2) and the same user IDs, we still get uniform assignment across tests, even among the users who were assigned to control or treatment in the previous test.

[Histograms: w2 assignment split by users' previous w1 group (control vs treatment)]

Through the decisionAdapter, users can introduce their own test-specific salt to vary the seed further. If they're worried about n-way interaction effects, they can even swap in a preferred hash function.

Current testing

We've got this solution out being tested in the wild across a few properties as AA tests. Things are looking nominal. If you guys are interested, you might be able to play around with this approach, too.

merlijnbigtree commented 4 years ago

@kingo55 awesome, looking forward to trying this!

kingo55 commented 4 years ago

@merlijnbigtree - we just merged the requirements for deterministic assignment.

I don't know which user ID you use on your end (presumably something along the lines of the _sp cookie set by your Snowplow collector). You could get up and running by:

  1. Merging our master branch into your local branch
  2. Setting a Mojito.options.userId value to use in your bucketing decisions
  3. Defining a custom decisionAdapter like this in your shared-code.js:
// Set the user ID (or if "_sp" already exists, you could use your Snowplow user ID)
Mojito.options.userId = Mojito.utils.getMojitoUserId();

// Custom decision adapter
Mojito.options.decisionAdapter = function(test) 
{    
    // Calculate decision from MD5 digest
    test.options.digest = test.options.digest || Mojito.utils.md5(Mojito.options.userId + test.options.id);
    var startPos = test.options.decisionIdx * 8,
        endPos = startPos + 8,
        digest = test.options.digest,
        decision = parseInt(digest.substring(startPos, endPos), 16) / 0xFFFFFFFF;

    // Ramp-up spillover protection - By Lukas Vermeer https://lukasvermeer.nl/
    // Exclude users below a test's excludeSampleRate threshold to avoid spillover after ramp-up/restart
    if (test.options.decisionIdx === 1 && test.options.excludeSampleRate && decision < test.options.excludeSampleRate) 
    {
        return -1;
    }

    return decision;
};

@lukasvermeer - this is the ramp-up spillover protection you suggested. Been using it at Mint Metrics here... it's a defining feature against Optimizely et al!

lukasvermeer commented 4 years ago

I can't fully grok what this does. I think test.options.excludeSampleRate should be set to exclude all traffic that would have been in a previous ramp, but I don't see a number representing the upper bound of each ramp?

lukasvermeer commented 4 years ago

To clarify, what I was suggesting was:

// Define current ramp range
const rampPercentages = [0, 0.01, 0.1, 1]; // three ramps 1%, 10%, 100% of traffic
let currentRamp = 2; // let us pretend this is the second ramp (1-indexed)
let lowerBound = rampPercentages[currentRamp - 1];
let upperBound = rampPercentages[currentRamp];

// Perform two independent coin flips.
const userId = '123'; // some user id should be given
let userRamp = someHashFunction(userId + 'ramp'); // ramp flip returns rand(0, 1)
let userVariant = someHashFunction(userId + 'variant'); // variant flip returns rand(0, 1)

if (userRamp >= lowerBound && userRamp < upperBound) {
  // User is in ramp.
  return userVariant;
} else {
  // User is not in ramp.
  return control; 
}
kingo55 commented 4 years ago

The upper-bound constraint you refer to is applied just beyond the decisionAdapter via the test's sampleRate parameter (1):

// "this.getRandom()" returns a value from the `decisionAdapter`, so -1 forces users out of the test
inTest = (this.options.sampleRate <= 0) ? false : (this.getRandom() <= this.options.sampleRate);

Source

When decision < test.options.excludeSampleRate is met, we return -1, which effectively excludes users from the test.

How to perform a 1%/10%/100% ramp here

Just tweak your sampleRate (upper-bound) and your excludeSampleRate (lower-bound) values, like so:

Initial 1% run

id: ex1
name: My test
sampleRate: 0.01 # Include 1%

Next 10% run, excluding the former 1% run

id: ex1
name: My test
sampleRate: 0.1
excludeSampleRate: 0.01 # Exclude < 1%

Final full ramp up, excluding former 10% run

id: ex1
name: My test
sampleRate: 1
excludeSampleRate: 0.1
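Putting it together, each ramp only admits the fresh band of decision values. A sketch (this inTest helper is illustrative, not Mojito's internal implementation):

```javascript
// Illustrative helper: sampleRate is the upper bound of the decision band,
// excludeSampleRate the lower bound reserved for earlier ramps.
function inTest(decision, sampleRate, excludeSampleRate) {
    if (excludeSampleRate && decision < excludeSampleRate) return false; // spillover guard
    if (sampleRate <= 0) return false;
    return decision <= sampleRate;
}

var decision = 0.05; // a user's fixed, hashed decision value

inTest(decision, 0.01);      // false: outside the initial 1% band
inTest(decision, 0.1, 0.01); // true:  inside the fresh 1-10% band
inTest(decision, 1, 0.1);    // false: reserved band, so no spillover at 100%
```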

(1) Users can apply upper-bound logic in their decision adapters too but I haven't thought of a good use for this yet. I wonder if there's a case we'd want to perform the reverse logic? 🤔 i.e. If decision > sampleRate, change the decision.

kingo55 commented 4 years ago

Closing this one.