Discuss revisiting our experiment infrastructure that relies on client-side changes

bobsilverberg commented 3 years ago

When we made changes to our experiment infrastructure for the most recent experiment, it was decided that we would defer all experiment code to run on the client. The reason for this is that sending events to GA is integral to making an experiment work, and we cannot send events to GA from the server. Some alternatives were investigated, but it was determined they were not adequate/were too hacky, etc., so we decided that the solution to this problem was to simply make sure that all code related to experiments which sends GA events runs only client-side.

This has resulted in some performance degradation when an experiment is running and a user first hits the site. The page they land on is rendered, and then some elements on the page might change as a result of the user being added to an experiment cohort and the UI being updated to match what is expected for their cohort.

@diox has raised this as an important issue, so we need to discuss alternatives to our current solution and the pros and cons of keeping what we have vs changing to a different solution. Among the things we should consider are the fact experiments are meant to be temporary (i.e., there likely will not always be an experiment running), as well as the fact that, for most experiments, we split the cohort between a control group and the experimental group, and only the experimental group should experience this performance hit.

bobsilverberg commented 3 years ago

@jvillalobos suggested that a possible mitigation would be to enroll fewer users in future experiments so that the impact of this is lessened. Our current infrastructure does support this. For example, we could put 10% of users in cohort A, 10% in cohort B, and 80% excluded from the experiment.

bobsilverberg commented 3 years ago

One issue that wasn't mentioned in the description above is the fact that, in addition to not being able to send the GA event on the server, we also cannot write a cookie on the server as it interferes with our caching. That was actually the main reason we had to move to this client-side-only method.

One idea that @muffinresearch and I discussed in our 1:1 was to use redux for this on the server. The basic idea would be places the user into a cohort on the server (which we are currently not doing), and store that fact in the redux store. Then, on the client-side, check whether a user has been allocated to a cohort in a pending state, and if so, write the cookie and send the enrollment event, and then change the data in the redux store so this does not happen again.

There may be reasons this won't work, but it's worth discussing/investigating.

diox commented 3 years ago

You'd enroll all users with the same cached response into the same cohort though, wouldn't you ? That may be fine, but it may also skew the results...

bobsilverberg commented 3 years ago

You'd enroll all users with the same cached response into the same cohort though, wouldn't you ? That may be fine, but it may also skew the results...

Oh right, caching... ☹️

rik commented 3 years ago

There's also the impact of the client-side transformation potentially affecting the validity of the experiment. People in the experimental group will first see the same page as people in the control group then, if the experiment changes things in the viewport, there will be a flicker to the experimental version of the page. This skews the results.

Most client side A/B testing providers use an anti-flicker technique that hides the whole page until the A/B testing code has decided in which cohort to place people and performed the potential DOM transformations. This is also problematic for the validity of the experiment. Worse, this will affect the KPIs because the website will appear slower to 100% of the visitors during the duration of the experiment.

As the linked article suggests, server side transformations are the least inaccurate, least KPI disturbing way to do A/B testing. That can be done on the origin server or on an edge computing platform.

bobsilverberg commented 3 years ago

Thanks for this @rik. All of these are reasons why the experiment infrastructure was originally designed to make the changes on the server, and not client-side. It seems we have a lot of reasons to reverse our decision to defer all experiment code to run on the client, but, as @diox points out, we need to figure out how to deal with the caching aspect. Does anyone have any suggestions which would address all of these concerns? If not, I can schedule a meeting for us to brainstorm/discuss alternatives.

willdurand commented 3 years ago

Can we design UI changes to not cause side effects? For instance, I read somewhere about the problem with the download button but we could have styled it so that the original button was taking the same "space" as the "new" button, which would have limited the visible impact on the page.

bobsilverberg commented 3 years ago

Can we design UI changes to not cause side effects? For instance, I read somewhere about the problem with the download button but we could have styled it so that the original button was taking the same "space" as the "new" button, which would have limited the visible impact on the page.

This is an interesting idea but...

Will it actually work? Do we avoid this issue if we change content but not the size of elements?
Is it possible? It seems like it could be quite difficult to make an element take up the same amount of space when parts of it (e.g., the yellow call-out) won't be present. I guess we'd have to come up a way of filling that in with whitespace? Most of our elements take up space based on their contents, so this seems very challenging, and how would it address, for example, buttons that expand to fill their content, when the content will be different for the two different cohorts? It seems like this could introduce visual "regressions" for the state which takes up less space, as we would have to add redundant white space. What about when an element is missing entirely in one state (like an older experiment that only sometimes showed a warning)? The element, in that case a warning, would push the rest of the content down the page. Would we have to have a bunch of white space in place of the warning that would also push down the content?

diox commented 3 years ago

A few additional thoughts/notes:

As discussed in our stand-up meeting, technically if new elements are added that are smaller than the original largest contentful element (and that element isn't removed), that's fine... for the LCP. It may cause enough layout shift to be noticeable by users or by Google through the CLD metric, though.
On a similar note, if the experiment changes something not in the initial viewport, for the LCP (and most users) that's fine. Of course, that's tricky: the viewport might be completely different from our client to the next.
Caching is a must, it's going to play an even larger role as we try to move towards serving the HTML by a CDN. But we only cache for a few minutes (except when serving stale content while the cache is being updated in the background). So maybe we could work with that, do everything server-side, and not necessarily have the exact expected distribution, as long as we can measure how many people got what version that should be fine ?
If we really want to keep doing things client-side, maybe we could render both sides of each experiment in the page, hide one via CSS, adding a class or something quickly enough that it isn't noticeable. Note that since we're now on HTTP/2, we could have small pieces of JavaScript/CSS that aren't part of the main bundle and quicker to download.

bobsilverberg commented 3 years ago

Thanks @diox. To document something else that was discussed in today's meeting; we don't need a solution for this right away, as the next experiment should have no visual differences between the cohorts, but it's likely something we'll need to figure out before the experiment after that.

bobsilverberg commented 3 years ago

There was some further discussion about this on Slack.

@diox suggested we look into the idea of rendering server side and letting it be cached, and suggested that this could be fine as long as we check how many users were actually enrolled in each side.

This led to a discussion that ideally we want users enrolled into a cohort for the duration of an experiment, which means keeping track of that via a cookie. We wouldn't really want the same user seeing UX for different cohorts based on what they were randomly served from the cache each time they return to the site.

In order to make this work we would need to add the contents of the cookie to the cache key (something we've discussed before, and isn't ideal, but I'm not sure what else would work), and then the nginx server would need to respect the value in an existing cookie when serving cached data. We also need to make sure that we only write the cookie on the client, as we discovered in the past that trying to write a cookie on the server breaks caching entirely.

I will file a separate issue for this investigation.

bobsilverberg commented 3 years ago

I have implemented a solution which addresses most of our issues, but I am not sure if it entirely addresses caching. There is a draft PR with code changes available at https://github.com/mozilla/addons-frontend/pull/10665. Here is what it does:

When a component which is wrapped with withExperiment is instantiated (i.e., in the constructor), the following happens:

We check for a cookie for the current experiment and if one exists we use the variant from the cookie as the current variant.
If no cookie exists, we check the Redux store for a variant that may have been stored there on the server. If one exists we use that as the current variant.
If we still don't have a variant, we generate one. 3.1 If we are running on the server, we dispatch an action which will store the generated variant in the Redux store, which will make it available to step 2, above, when the code runs next on the client.

Whatever variant is determined by the steps above is written to a private property in the component and then used in the render method, which is what determines how the component will look based on the variant.

When the code for the component runs on the client (via componentDidMount), we check to see if we need to add the experiment variant to the cookie (which is based on whether the experiment already exists in the cookie or not). If we do need to write a cookie we do two things:

We send an enrollment event to GA.
We write the cookie.

We then dispatch another action to remove the variant from the Redux store.

I have tested this code locally and it seems to work for different combinations of server generated and client generated pages, with and without an existing cookie.

Currently it will only work for one experiment at a time, but it would not be difficult to update it to support multiple experiments running simultaneously.

I am not sure exactly how all this will work with caching, nor am I sure how to go about testing that.

@willdurand I'd appreciate your thoughts on this approach and also on the code at https://github.com/mozilla/addons-frontend/pull/10665.

@diox I'd appreciate hearing how you think this might work (or even whether it will work) with caching.

bobsilverberg commented 3 years ago

This was closed via mozilla/addons-frontend#10665

mozilla / addons

Discuss revisiting our experiment infrastructure that relies on client-side changes #14196