mozilla / fxa-amplitude-send

Data pipeline scripts for importing Firefox Accounts event data to Amplitude.
Mozilla Public License 2.0
6 stars 9 forks source link

Experiment Values are pretty polluted #61

Closed davismtl closed 6 years ago

davismtl commented 6 years ago

If I look at the experiment values, there seems to be a lot of invalid data.

I know we did some work recently on improving the tracking of experiments. Could this be an unintentional bug as a result?

screen shot 2018-04-03 at 2 22 39 pm

CC @irrationalagent

rfk commented 6 years ago

Is the volume tagged with this things pretty low? In the past we've seen this kind of thing result from injection scanning attempts, where pentesters just put a bunch of junk in various query paremeters in order to check for reflected XSS attacks.

davismtl commented 6 years ago

There's quite a bit of volume. The weird thing is that I'd expect to the same thing in entrypoint or utm params if it was pentesters, no?

philbooth commented 6 years ago

As these are defined inside our own code, it seems straightforward enough to validate them to known values only. Moving to next.

philbooth commented 6 years ago

@irrationalagent, this is potentially a good starter issue for metrics stuff if you're interested (or also anything else is fine too if there's something you have your eye on).

irrationalagent commented 6 years ago

@philbooth ill see what I can do!

philbooth commented 6 years ago

@irrationalagent, some notes to get you started...

The best place to do the validation is probably where the content server does all its other metrics validation. You can see those rules in server/lib/validation.js, there's basic regular expressions specified in the PATTERNS object and then a more expressive rule set defined using Joi in the TYPES object.

For a list of literal strings like this, we're probably going to want to add something to TYPES. You can see the Joi docs here, for reference:

https://github.com/hapijs/joi/blob/master/API.md

Ideally, we'd want to take the list of valid experiment names from somewhere else they already exist in the code, so that we don't need to make changes in two places every time we add/remove an experiment. The experiments are all defined on the client-side, under app/scripts/lib/experiments/ and I don't see a readily-available list we can re-use in there at first glance, so we should probably meet up to discuss the best thing to do here. Maybe we can generate the list, let's meet up to discuss this part when you're ready.

When we have something in TYPES that will validate the known set of experiment names, we can use it in server/lib/routes/post-metrics.js instead of the EXPERIMENT_PATTERN stuff and it should give us the validation we want.

Then we might need to update the test fixtures for valid metrics data in tests/server/fixtures/metrics_valid.json and maybe write a new test case or two, but we can dig into that more once it's working.

irrationalagent commented 6 years ago

So I'm trying something like this, in app/scripts/lib/experiments/index.js

// this is the list of grouping rule classes that each have a .name property that we should be able to use?
  const ExperimentGroupingRules = [ 
    require('./communication-prefs'),
    require('./email-first'),
    require('./is-sampled-user'),
    require('./q3-form-changes'),
    require('./send-sms-install-link'),
    require('./sentry'),
    require('./sessions'),
    require('./token-code')
  ];

  module.exports = class ExperimentChoiceIndex {
    constructor (options = {}) {
      this._env = options.env;
      this._experimentGroupingRules = options.experimentGroupingRules || ExperimentGroupingRules.map((ExperimentChoice) => new ExperimentChoice());
      // this should create an array of names?
      this.experimentList = ExperimentGroupingRules.map(rule => rule.name); 
    }

...

// method to ExperimentChoiceIndex that should return the list?
    getList () { 
      return this.experimentList;
    }
  };

but when I instantiate this class (just for debiugging) in app/scripts/lib/app-start.js like

const foo = new ExperimentGroupingRules()
console.log(foo.getList)

I get <unavailable>. Likely that I am totally off base here since I'm still new to js, but it seemed like a reasonable approach? @philbooth

philbooth commented 6 years ago

@irrationalagent it's a good first try, but I think we'll need to take a different approach. First some background...

All of the code under app/scripts is stuff that's executed client-side, in the user's browser. So that's pretty much all of the application logic, including the experiment stuff that you added to.

However, the validation that we want to improve here runs server-side. This means that we can't access any of the (live, in-memory) application state at the point where we're doing the validation, so adding stuff to app/scripts/lib/experiments isn't going to work.

What we do have instead is all of the source code on disk. That should be enough to get us going here because the experiments are defined statically, so we just need a way to read the experiment names out of the code.

One way to do this would be to parse them from the code as is (there are 3rd-party modules that can help with this). Another way is to change the client-side code so that it defines them in a way that is readable from both client and server, for instance a simple JSON file. That second way is simpler, but maybe it's not desirable for the rest of the code, so I'll ask @shane-tomlinson what he thinks about it in a bit.

Then once we've worked out that bit, the rest of this should fall into place quite easily.

philbooth commented 6 years ago

@irrationalagent, cometh the hour, cometh @shane-tomlinson!

Over in mozilla/fxa-content-server#6087, he made a change to provide a list of experiment names that you can read from the server-side. So something like the following in server/lib/validation.js should work if you update your branch to current master:

const { EXPERIMENT_NAMES } = require('../../app/scripts/lib/experiment/grouping-rules');

Then you'd just need to create an EXPERIMENT type circa line 31/32 that made use of the above, then update server/lib/post-metrics.js to use the new type.

Give me a shout if you've got any questions!

irrationalagent commented 6 years ago

haha thanks @shane-tomlinson @philbooth well trial by fire I guess... I'll try to take another crack at this today. thanks Phil for explaining all that background, when I was trying my approach I had a feeling that trying to reference something in app/ from server/ didn't really make sense 😬