mozilla / taar-lite

A lightweight version of the TAAR service intended for specific deployments with reduced feature visibility.
Mozilla Public License 2.0

Analysis: What is the effect of the order of pruning low-ranking guids? #43

Open birdsarah opened 6 years ago

birdsarah commented 6 years ago

In PR #41, the production recommenders all explicitly prune the graph before applying the normalization. Previously, some pruned before normalizing and some after.

We should understand the implications of pruning before versus after normalization (or other treatments), so that we choose the order intentionally.

cc @mlopatka @dzeber

dzeber commented 6 years ago

The RowCount and RowSum normalizations both run along a row of the coinstallation matrix and divide each entry by a function of the values in that entry's column. Since pruning an add-on deletes a column in the matrix, this wouldn't affect the other normalized scores, i.e. we would get the same resulting treated matrix whether we prune columns before or after normalizing.
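A minimal sketch of that argument, not the taar-lite implementation: it assumes RowSum divides each entry by its column's sum and RowCount divides by the number of non-zero entries in its column. The matrix `counts` and the helpers `normalize_by_column` and `prune_columns` are hypothetical names made up for illustration, and pruning here removes only the column, as described above.

```python
from fractions import Fraction


def normalize_by_column(matrix, column_stat):
    """Divide every entry by a statistic of the column it sits in."""
    stats = [column_stat(col) for col in zip(*matrix)]
    return [[Fraction(v, s) for v, s in zip(row, stats)] for row in matrix]


def column_total(col):
    # Assumed RowSum-style denominator: the sum of the column.
    return sum(col)


def column_nonzero_count(col):
    # Assumed RowCount-style denominator: number of non-zero entries.
    return sum(1 for v in col if v != 0)


def prune_columns(matrix, pruned):
    """Drop the columns whose indices are in `pruned`."""
    return [[v for j, v in enumerate(row) if j not in pruned] for row in matrix]


# Toy symmetric coinstallation counts with a zero diagonal (made-up values).
counts = [
    [0, 3, 2, 1],
    [3, 0, 4, 0],
    [2, 4, 0, 5],
    [1, 0, 5, 0],
]
pruned = {3}

same_result = {}
for name, stat in [("rowsum", column_total), ("rowcount", column_nonzero_count)]:
    normalize_then_prune = prune_columns(normalize_by_column(counts, stat), pruned)
    prune_then_normalize = normalize_by_column(prune_columns(counts, pruned), stat)
    same_result[name] = normalize_then_prune == prune_then_normalize

print(same_result)
```

Each denominator depends only on the entry's own column, so removing a different column leaves it untouched. If pruning also removed the add-on's row, the column statistics would change and this equality would need to be rechecked.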

For the RowNormSum normalization, scores along the row are first normalized by the row sum, and then RowSum (the column sum normalization) is applied. Deleting a column prior to the normalization would modify the values of the initial row sum normalization, but preserve the ordering of values along the row (which would lead to the same recommendations). However, when we apply the column sum normalization, it's not obvious that the ordering along each row is still preserved. We would need to look into this further.
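One way to probe this is a toy check. The sketch below assumes RowNormSum means "divide each row by its row sum, then divide each entry by the resulting column sum", as described above. The matrix is made up and is not taken from real coinstallation data, and all helper names are hypothetical. Exact fractions are used so the comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def row_normalize(matrix):
    """Step 1: divide each entry by the sum of its row."""
    return [[Fraction(v, sum(row)) for v in row] for row in matrix]


def column_sum_normalize(matrix):
    """Step 2: divide each entry by the sum of its column."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[v / s for v, s in zip(row, col_sums)] for row in matrix]


def row_norm_sum(matrix):
    # Assumed definition of RowNormSum: step 1 followed by step 2.
    return column_sum_normalize(row_normalize(matrix))


def prune_columns(matrix, pruned):
    """Drop the columns whose indices are in `pruned`."""
    return [[v for j, v in enumerate(row) if j not in pruned] for row in matrix]


def ranking(row):
    """Column indices ordered from highest to lowest score."""
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


# Made-up counts; column 2 is the one being pruned.
counts = [
    [3, 2, 0],
    [4, 1, 0],
    [1, 4, 15],
]
pruned = {2}

# The row-sum step alone keeps the within-row ordering either way.
step1_a = prune_columns(row_normalize(counts), pruned)
step1_b = row_normalize(prune_columns(counts, pruned))
step1_same_order = all(ranking(a) == ranking(b) for a, b in zip(step1_a, step1_b))

# The full treatment, in both orders.
normalize_then_prune = prune_columns(row_norm_sum(counts), pruned)
prune_then_normalize = row_norm_sum(prune_columns(counts, pruned))

print(step1_same_order)
print(ranking(normalize_then_prune[0]), ranking(prune_then_normalize[0]))
```

In this toy case the first row ranks column 1 above column 0 when pruning happens after normalization, and column 0 above column 1 when pruning happens first. Pruning changes the row sums by different amounts in different rows, which reweights the column sums unevenly. Under the assumed definition, then, the ordering is not preserved in general. Whether this matters on real data would still need checking.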

mlopatka commented 5 years ago

Nice elucidation of the problem, Dave. I would argue that RowNormSum should be applied after any dropped columns are removed. This preserves the property that rows still sum to 1.0 post-normalization, thus maintaining the within-row proportions of the values.