nmalkin / kpi-dashboard

dashboard for visualizing key performance indicators for Mozilla Persona

How to avoid per-segmentation views in CouchDB? #28

Closed: nmalkin closed this issue 12 years ago

nmalkin commented 12 years ago

For background, and what we're trying to accomplish, please see #27.

(Example) task

Counting the number of step completions in a given date range

To get this data, we set up a view in CouchDB:

{
    map: function(doc) {
        // doc.newUserSteps is the array of steps completed by the user
        if(doc.newUserSteps.length > 0) { // Only count new users
            doc.newUserSteps.forEach(function(step) {
                // Save the completed step, with the date when it was completed as the key
                emit(doc.date, step);
            });
        }
    },

    reduce: function(keys, values, rereduce) {
        if(rereduce) {
            // omitted
        } else {
            // Count the number of times each step was completed
            var steps = {};
            values.forEach(function(step) {
                if(! (step in steps)) {
                    steps[step] = 0;
                }

                steps[step]++;
            });

            return steps;
        }
    }
}
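The rereduce branch is omitted above. As a rough sketch of what it might look like (not necessarily what the project used): on rereduce, CouchDB passes in the values returned by earlier reduce calls, so the branch has to merge those partial tallies by summing the counts per step. The function name `countSteps` and the step names in the local check are made up for illustration.

```javascript
// Sketch of the reduce function with a possible rereduce branch.
// Assumption: on rereduce, `values` is an array of the {step: count}
// objects returned by earlier calls to this same function.
function countSteps(keys, values, rereduce) {
    var steps = {};

    if(rereduce) {
        // Merge partial tallies: sum the counts for each step
        values.forEach(function(partial) {
            for(var step in partial) {
                if(Object.prototype.hasOwnProperty.call(partial, step)) {
                    steps[step] = (steps[step] || 0) + partial[step];
                }
            }
        });
    } else {
        // Count the number of times each step was completed
        values.forEach(function(step) {
            steps[step] = (steps[step] || 0) + 1;
        });
    }

    return steps;
}

// Local check outside CouchDB, using hypothetical step names
var first = countSteps(null, ['email', 'password'], false);
var second = countSteps(null, ['email'], false);
console.log(countSteps(null, [first, second], true));
// prints { email: 2, password: 1 }
```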

Then, for example, if we wanted a report for June, we could query the view with startkey=2012-06-01&endkey=2012-06-30&group=false. Easy enough.
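One detail when sending this query for real: CouchDB expects view keys in the query string to be JSON-encoded, so string dates need quotes and URL-escaping. Here is a minimal sketch of building the query string, assuming `doc.date` is stored as a `YYYY-MM-DD` string. The helper name `dateRangeQuery` is hypothetical.

```javascript
// Hypothetical helper: builds the query string for a date-range view query.
// Assumption: view keys are date strings like "2012-06-01".
function dateRangeQuery(start, end) {
    var params = {
        startkey: JSON.stringify(start), // keys must be valid JSON
        endkey: JSON.stringify(end),
        group: 'false'
    };

    return Object.keys(params).map(function(name) {
        return name + '=' + encodeURIComponent(params[name]);
    }).join('&');
}

console.log(dateRangeQuery('2012-06-01', '2012-06-30'));
// prints startkey=%222012-06-01%22&endkey=%222012-06-30%22&group=false
```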

More complicated

Same task, but now we want to see the data segmented by operating system. So we set up another view.

{
    map: function(doc) {
        if(doc.newUserSteps.length > 0) {
            doc.newUserSteps.forEach(function(step) {
                // Instead of saving just the step, like last time,
                // now we save both the step and the OS.
                emit(doc.date, {
                    step: step,
                    os: doc.os
                });
            });
        }
    },

    reduce: function(keys, values, rereduce) {
        if(rereduce) {
            // omitted
        } else {
            var systems = {};
            values.forEach(function(value) {
                if(! (value.os in systems)) {
                    systems[value.os] = {};
                }

                if(! (value.step in systems[value.os])) {
                    systems[value.os][value.step] = 0;
                }

                systems[value.os][value.step]++;
            });

            return systems;
        }
    }
}

A little bit more complicated, but overall pretty similar. Quite manageable.

Problems

Okay, that worked, but now we want to segment by browser. That's very similar. The only difference is that, instead of using the doc.os field, we'd need to use the doc.browser field.

Here's the problem: because views in CouchDB can't take arbitrary parameters, we'd have to create a completely separate view with nearly identical code.

And then again, when we want to segment by screen size. And for locale, and so on.

For each segmentation (there will be at least 5 of them), a new view will have to be created, and it will be nearly identical to the other ones.

Furthermore, since most reports will have segmentations, the views will be duplicated across reports.

Question

How can we avoid this?

nmalkin commented 12 years ago

Answer

In the time it took me to write this out, I've realized that while the code will be duplicated across views, the code for setting up those views can be shared. The maps will be different, but the reduce should be the same, even across different reports.

Though the underlying issue remains, it's not as big a deal as I originally thought. (The reused code is in the database, not the codebase.)
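To make that concrete, here is a rough sketch of what shared setup code could look like. It is only an illustration under some assumptions: the map emits the segment under a generic `segment` property (instead of `os`) so that one reduce works for every field, the view and design document names (`steps_by_...`, `_design/segmentations`) are invented, and the `doc.newUserSteps &&` guard is an addition for documents that lack the field. Functions are serialized to strings because that is how CouchDB stores them in a design document.

```javascript
// Generate the source of a map function for a given document field.
function makeMapSource(field) {
    return 'function(doc) {\n' +
        '    if(doc.newUserSteps && doc.newUserSteps.length > 0) {\n' +
        '        doc.newUserSteps.forEach(function(step) {\n' +
        '            emit(doc.date, { step: step, segment: doc[' + JSON.stringify(field) + '] });\n' +
        '        });\n' +
        '    }\n' +
        '}';
}

// One reduce shared by every segmentation view.
// Returns {segment: {step: count}}; on rereduce it merges such objects.
var segmentedReduce = function(keys, values, rereduce) {
    var segments = {};

    function add(segment, step, count) {
        if(! (segment in segments)) {
            segments[segment] = {};
        }
        segments[segment][step] = (segments[segment][step] || 0) + count;
    }

    if(rereduce) {
        values.forEach(function(partial) {
            for(var segment in partial) {
                for(var step in partial[segment]) {
                    add(segment, step, partial[segment][step]);
                }
            }
        });
    } else {
        values.forEach(function(value) {
            add(value.segment, value.step, 1);
        });
    }

    return segments;
};

// Build a design document with one view per segmentation field.
function makeDesignDoc(fields) {
    var views = {};
    fields.forEach(function(field) {
        views['steps_by_' + field] = {
            map: makeMapSource(field),
            reduce: segmentedReduce.toString()
        };
    });
    return { _id: '_design/segmentations', language: 'javascript', views: views };
}

var ddoc = makeDesignDoc(['os', 'browser']);
console.log(Object.keys(ddoc.views));
// prints [ 'steps_by_os', 'steps_by_browser' ]
```

The views themselves are still duplicated inside the database, but adding a new segmentation becomes a one-word change to the list passed to `makeDesignDoc`.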

I'm therefore closing this issue right away (but creating it first, in case we want to revisit this discussion).