intuit / wasabi

Wasabi A/B Testing service is an open source project that is no longer under active development or being supported
Apache License 2.0

How to give all users the winning variant? #206

Open · ptrwtts opened 7 years ago

ptrwtts commented 7 years ago

Hi. I read through the docs, and also #157, and wasn't 100% sure of the answer to this...

If we run an experiment, and there's a clear winner, is it possible to "ramp up" the winning variant? This means giving it to all past and future users, including those who were previously assigned to a different bucket or null. Is it doable?

This is something we do regularly with our homegrown A/B framework. It allows us to roll out the winning variation to all users instantly, without waiting for a client release.

It sounds like it's possible to do, by editing the experiment, but not something that Wasabi is designed to do. Is that correct?

longdogz commented 7 years ago

Actually, the answer is yes. For a while, the answer was "not exactly" because we had a bug. But that bug has now been fixed.

Basically, say you have an experiment with 100% sampling rate and two buckets, A and B, with 50% allocation percentages. Now you implement the test, both in Wasabi and using the Wasabi APIs from your app, and start the test. If you leave it running for at least 7 days, and you are using impressions and actions, and you have a bucket that is statistically significantly better than the other bucket, we actually show it to you in the UI (a green checkmark on the details screen for that experiment). So you have a winner! (Or maybe you just decide one is doing better enough than the other and you decide it is a winner.)

What you do next is to Empty the bucket that is not performing as well. This will change the bucket allocation of the winning bucket to 100%, so of course, all future assignments will be to that bucket. Also, this will cause anyone who was already assigned to the losing bucket to be assigned to the winning bucket when they come back for their assignment again.
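The assignment and re-assignment behavior described above can be sketched as a toy simulation. This is not Wasabi's actual implementation (Wasabi is a Java REST service); the deterministic per-user seeding and the `assign` helper are illustrative assumptions, but the flow matches what's described: empty the losing bucket, the winner's allocation becomes 100%, and returning users from the emptied bucket get re-assigned.

```python
import random

def assign(user_id, buckets, seed=0):
    """Deterministically assign a user to a bucket by allocation percentage.
    Returns None when the user falls outside all buckets (a "null" assignment)."""
    rng = random.Random(user_id * 1000003 + seed)
    r = rng.random()
    cumulative = 0.0
    for label, allocation in buckets.items():
        cumulative += allocation
        if r < cumulative:
            return label
    return None

# Two buckets at 50/50, 100% sampling.
buckets = {"A": 0.5, "B": 0.5}
assignments = {u: assign(u, buckets) for u in range(1000)}

# Emptying the losing bucket B bumps A's allocation to 100%...
buckets = {"A": 1.0}
# ...and users previously in B are re-assigned on their next request.
for u, old in assignments.items():
    if old == "B":
        assignments[u] = assign(u, buckets)

# Everyone now sees the winning variant.
assert all(b == "A" for b in assignments.values())
```

In the real system the re-assignment happens lazily, when each user next requests their assignment (and, per the caching note below, possibly with a few minutes' delay).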

Note that due to caching that was recently added to improve performance, it may take up to 5 minutes before the users assigned to the emptied bucket actually see the change. That is, they will continue to receive the old bucket assignment for a while. But after that, nobody will get the "bad" bucket anymore.

ptrwtts commented 7 years ago

Awesome, thanks for the response.

What if the sampling rate is not 100%? Would you need to switch it to 100% to give all users the winner? Would those who were previously assigned 'null' be included? Or should we just use 100% if this is a technique we want to utilize?

Secondly, is any record kept when you make a change like this? One issue we've always had with ramping up is that it messes with the historical data of the experiment, because it is technically still "running", and new users are being assigned (all to the winning variant). The ideal behavior for us is that you could "end" an experiment by declaring a winner. All existing data and assignments would be preserved, and no new assignments given, however all clients would be given a response as if they were in the "winner" variant. Happy to go into more detail if you think it's a use-case that makes sense to support.

Thanks!

longdogz commented 7 years ago

Sorry, I was using the 100% sampling rate in my example because the math is cleaner and, basically, because the sampling rate is not relevant to what happens when emptying a bucket. The emptying of a bucket is simply applied to the users who were assigned to that bucket.

But if what you're really trying to achieve is a way to be able to ramp up the winning variant, eventually to ALL users, you actually DO need to use a sampling rate of 100%. That is because you can only use the emptying feature on buckets, not on "null" assignments. You can never cause users with the null assignment to somehow be "un-assigned" and thrown back into the mix. So if you eventually want to have all your users assigned to the experiment so that they get the winning experience, then you should have them all in the experiment.

What you do next depends on how you want to test. If, for example, you wanted only half your users in the test and you were testing two experiences, you could use a 100% sampling percentage with 3 buckets: "Control" with a 50% allocation, and "A" and "B" each with a 25% allocation. You run your test for a while, measuring how A and B are doing, with your code paying attention to whether the user is assigned to A or B, but not worrying about the name of the other bucket.

If you decide, for example, that A is the clear winner and you want to start having users get A, you could empty B. Now all the users in B will be assigned (when they come back) to Control or A. Then, if you want to keep increasing the percentage who get A, you can create another bucket, say Control2, with a smaller allocation percentage than Control, and then empty Control (you'll need to experiment with this to figure out how to actually get the allocation percentages you want). From then on, anyone who had been in Control will be assigned to either A or Control2, and the percentage going forward NOT in A will decrease. And so on...
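The allocation math behind that ramp-up recipe can be sketched as follows. This is a toy model: it assumes that emptying a bucket redistributes its allocation proportionally across the remaining buckets, which is my reading of the behavior described here; verify the exact renormalization against your own deployment.

```python
def empty_bucket(buckets, emptied):
    """Toy model of emptying a bucket: its allocation is redistributed
    proportionally across the remaining buckets (an assumption about
    how Wasabi renormalizes; verify against your deployment)."""
    remaining = {k: v for k, v in buckets.items() if k != emptied}
    total = sum(remaining.values())
    return {label: alloc / total for label, alloc in remaining.items()}

# Step 1: 100% sampling; half the users in Control, a quarter in each arm.
buckets = {"Control": 0.50, "A": 0.25, "B": 0.25}

# Step 2: A wins, so empty B. Returning B users split between Control and A.
buckets = empty_bucket(buckets, "B")   # Control: ~66.7%, A: ~33.3%

# Step 3: ramp A further: add a smaller holdback bucket, then empty Control.
buckets["Control2"] = 0.10             # hypothetical smaller allocation
buckets = empty_bucket(buckets, "Control")
# A now takes most of the traffic (~77%), with a small holdback remaining.
```

Repeating step 3 with ever-smaller holdback buckets approaches 100% of users on A, which is the "experiment with the percentages" part of the advice above.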

Regarding your other idea, I think that is an interesting idea. I will mention it to our Product Manager and the rest of the team to see if they'd like to move forward with it. Of course, this is an open source project, so anybody could take that on? 😄

It sounds like you sort of want to have your cake and eat it too, that is, you want to keep a record of the data that you used to make the decision to go with a new experience (I guess for posterity or for later reminding yourself of why you made that decision?), but then you want to move everyone to a given "winning experience" by leaving the experiment running and changing all those assignments.

I think one thing to note is that implementing your UI using Wasabi is not necessarily the best long term strategy? That is, if you have done a test and decided that the correct experience is A, I would expect you to implement A in your UI, remove the test code and then move on to the next thing you might want to test? So you wouldn't really need to use Wasabi to roll out your winning experience, rather, just use it to decide what that experience is, which is really what Wasabi was designed to help with, and then implement the winning experience, deploy it, and move on.

But you could also record the results yourself, perhaps by exporting the assignment data or the action rate information. I guess it depends on why you need that information, how much of it, and what you need it for.

ptrwtts commented 7 years ago

Thanks for the detailed response!

You're 100% right. We essentially want the immediate impact of rolling out a winning variant, without waiting to update any code, and with clean historical data while we're at it. It's true that this power can encourage bad behavior (not immediately cleaning up experiment code), but often the realities of software organizations (competing priorities, users who are slow to update apps) mean that the value gained from a remote roll-out is worth more than the technical debt incurred.

Anyway, it sounds like we can do what we want, as long as we set up experiments correctly (100% sampling rate). And as for data, we just need to keep a log of when changes were made.

Cheers.

longdogz commented 7 years ago

One more thing, to address your question about logging. We do have logging of actions in Wasabi. You can see them on the Tools->Logs tab (on the latest version). These are actions like creating an experiment, starting an experiment, stopping an experiment, etc. I'm not sure if we currently have the granularity of logging you want, but this might allow you to get some of the information you want.