Should we make getRandom() customisable in shared code?

kingo55 commented 4 years ago

Should we let users override the default random function?

E.g. Like how we manage custom storageAdapters. Maybe like a randomAdapter or assignmentAdaptor.

This way users won't be fenced in to use our recommended random method. Also, if users need to change random methods mid-way through experiments, they can retain the method of assignment on active tests but use the new random methods on new experiments. Another benefit, users who don't care about MD5 hashes as RNG, won't bog down the weight of their library with large functions.

allmywant commented 4 years ago

@kingo55 I think it's fesible and it isn't hard to implement.

dapperdrop commented 4 years ago

Good suggestion, happy for it to be added as a feature.

kingo55 commented 4 years ago

I think the best approach for this would be to:

Mojito.options.decisionAdapter = function(testObject, decision) {
    if (decision = 'sampleRate'){
        test.options.decision = someRNG;
        return hexToInt(test.options.decision[0:5]);
    }
    if (decision = 'recipe') return hexToInt(test.options.decision[6:12]);
};

If the decisionAdapter is not set, we can defer to the old seeded random function in the library currently. Anything else I might have forgotten?

@allmywant could we save the MD5 hash in the testObject for re-use?

allmywant commented 4 years ago

@kingo55 Yes we can save MD5 hash in the testObject.

kingo55 commented 4 years ago

Considering we need fresh entropy for both:

sample rate decision
recipe chosen decision

How should we best handle for that in the code without adding more unnecessary complexity?

I figured one way is to use the decision parameter. But another way I thought of is to just increment a key on the testObject? e.g.:

function rng(testObject) {

  ...

  testObject.options.decsionCnt ++;
  return randomNumber;
}

That way, sample rate inclusion is always testObject.options.decsionCnt === (0 || null) but recipe selection is always testObject.options.decsionCnt > 0.

dapperdrop commented 4 years ago

@kingo55 this would be a bit clearer with a decision parameter. I don't think it would add too much more complexity for what it's trying to achieve.

kingo55 commented 4 years ago

@allmywant @dapperdrop - I have been thinking about this more and think the cleanest way forward is to increment a value on the testObject for additional decisions.

I think we should just pass in the testObject as the sole parameter to the decisionAdapter.

dapperdrop commented 4 years ago

@kingo55 cool - happy to go with your direction on this.

kingo55 commented 4 years ago

Ah whoops - should have refreshed the page. Only just saw your comment @dapperdrop.

@allmywant - have you got any further thoughts on this? Ultimately, a decision parameter affords more flexibility if needed down the track.

allmywant commented 4 years ago

@kingo55 We may consider if we shoul allow the user to generate the custom MD5 hash in their code, that way the user has the full control on getRandom function.

kingo55 commented 4 years ago

@allmywant - that is the plan. The decisionAdapter should just return a number between 0 and 1.

We just need to give the user the proper tools to generate decisions for tests. testObject is a useful tool for building the seed. Would specifying the decision be useful in any way? I'm not so sure.

allmywant commented 4 years ago

@kingo55 Got it. Then I think we don't need the decision parameter, it's clearer than sole testObject parameter but we don't know if user need other parameters in the future. So I think we can just pass the sole testObject parameter and prepare any needed properties in testObject, e.g. we can set testObject.currentDecision = 'SimpleRate'|'Recipe' before we call the decisionAdapter:

Mojito.options.decisionAdapter = function(testObject) {
    if (testObject.currentDecision == 'sampleRate'){
        test.options.decision = someRNG;
        return hexToInt(test.options.decision[0:5]);
    }
    if (testObject.currentDecision  == 'recipe') return hexToInt(test.options.decision[6:12]);
};

kingo55 commented 4 years ago

@allmywant - got it... that sounds like a good solution.

Since the decisions are always made in the same sequential order, shouldn't we just increment a decision count? Are there any cases where we would rather a string?

allmywant commented 4 years ago

@kingo55 The only reason I can think of is string is more clear than decisionCount >0 :)

kingo55 commented 4 years ago

@allmywant - that sounds like a good reason. Perhaps we go with a string in that case then.

Are you able to take this one on David?

allmywant commented 4 years ago

@kingo55 Yes, I will.

allmywant commented 4 years ago

@kingo55 Almost done, I've sent a pull request. BTW: I used decsionCnt solution rather than string finally, I realized getRandom might be called more than 3 times if a test has more than 2 recipes and each recipe has the same simpleRate (e.g. 0.5). In that case, decsionCnt is better.

kingo55 commented 4 years ago

@allmywant - Ah yes... that's a tough one.

Could it be called more than 4 times for an experiment? We may need to handle for that, or warn the user for that. Alternatively, could we make it so simpleRate doesn't hit the decisionAdapter so much?

This will make it harder for users to derive the decisions during analysis too.

allmywant commented 4 years ago

@kingo55 I have read the source code again, I can confirm that it will be called more than 4 times for an experiment in a edge case. Here the code how to choose a recipe of an experiment that has simpleRate defined in recipes.

var orderedRecipes = this.getOrderedBySampleRecipes();
var sameSimpleRateRecipes = this.getSameSimpleRateValueRecipes(orderedRecipes);

var weight = 0;
var lastSample = orderedRecipes[orderedRecipes.length - 1].sampleRate;

while (!chosen_recipe)
{
    weight = this.getRandom();

    for (var i = 0; i < orderedRecipes.length; i++)
    {
        if (weight <= orderedRecipes[i].sampleRate)
        {
            chosen_recipe = orderedRecipes[i].key;
            sameRecipes = sameSimpleRateRecipes[orderedRecipes[i].sampleRate];

            if (sameRecipes)
            {
                partitions = 1.0 / sameRecipes.length;
                chosen_partition = Math.floor(this.getRandom() / partitions);
                chosen_recipe = sameRecipes[chosen_partition].key;
            }

            break;
        }
    }

    if (!chosen_recipe && (weight - lastSample) <= (1 - lastSample))
    {
        chosen_recipe = orderedRecipes[orderedRecipes.length - 1].key;
        sameRecipes = sameSimpleRateRecipes[orderedRecipes[orderedRecipes.length - 1].sampleRate];

        if (sameRecipes)
        {
            partitions = 1.0 / sameRecipes.length;
            chosen_partition = Math.floor(this.getRandom() / partitions);
            chosen_recipe = sameRecipes[chosen_partition].key;
        }
    }
}

Consider this experiment:

state: "inactive"
sampleRate: 0
id: "w134"
name: "W134 Summer sale"
recipes: 
  "0":
    name: "Control"
    simpleRate: 0.1
  "1":
    name: "Treatment"
    simpleRate: 0.4
   "2":
    name: "Treatment"
    simpleRate: 0.5
trigger: "trigger.js"

In the while loop, we pick a random weight = this.getRandom(); then compare to each recipe's simpleRate, if matched (if (weight <= orderedRecipes[i].sampleRate)) then choose the recipe. The edge case is:

first loop, this.getRandom() returns 0.8 second loop, this.getRandom() returns 0.9 third loop, this.getRandom() returns 0.7 4th loop, this.getRandom() returns 0.6 5th loop, this.getRandom() returns 0.77 ... All of above do not match: if (weight <= orderedRecipes[i].sampleRate)

So, that is the edge case. I've been thinking about this from last evening, how about we change it this way:

1 just call weight = this.getRandom() once, then loops in recipes and compare with if (weight <= orderedRecipes[i].sampleRate), if matched then all good. Otherwise, we compare the weight with recipe's simpleRate, but condition changed to: which recip's simpleRate is most close to the weight: min(Math.abs(weight - recipe.simpleRate))

In this way, we can limit the getRandom calls of an experiment to <= 3. The third one is needed when an experiment has more than one recipes with same simpleRate value and we need to pick one of them randomly.

kingo55 commented 4 years ago

@allmywant I'm a bit confused by this logic.

I would have thought the following loop would be sufficient:

    weight = this.getRandom();
    var checkedRecipeSamples;

    for (var i = 0; i < orderedRecipes.length; i++)
    {

        checkedRecipeSamples += orderedRecipes[i].sampleRate;
        if (weight <= checkedRecipeSamples)
        {
            chosen_recipe = orderedRecipes[i].key;
            break;
        }
    }

Also, I don't know why the recipes need to be ordered here.

EDIT: Added sum of checkedRecipeSamples

kingo55 commented 4 years ago

Just had to add a tally of the checkedRecipeSamples here.

allmywant commented 4 years ago

@kingo55 The loop you mentioned is better. The reason I sort recipes by simpleRate is: for below recipes: { name: Control, simpleRate: 0.8} { name: Treatment, simpleRate: 0.2}

If we don't sort them, and let's say weight = this.getRandom() = 0.1, the Control will be chosen. But I think the Treatment should be chosen, right?

kingo55 commented 4 years ago

@allmywant - ah yes, excellent point... I see why the ordering exists now.

kingo55 commented 4 years ago

If you think you can reduce the number of calls to getRandom and make this function more efficient, can you please submit a PR @allmywant ?

allmywant commented 4 years ago

Sure @kingo55 I will.

mint-metrics / mojito-js-delivery

Should we make getRandom() customisable in shared code? #28