Open mlopatka opened 6 years ago
This also comes back when we specify a proportionate sampling strategy here: https://github.com/mozilla/python_mozetl/blob/491fbda515f985f3156ff0c70859624fd4961ea8/mozetl/taar/taar_similarity.py#L168
A solution here would be to specify weights that emphasize specific (niche) cluster representation in the final sample without compromising the non-addon diversity of "large" cluster sampling.
Even an inverse of the current strategy could be evaluated.
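To make the weighting idea concrete, here is a minimal sketch of how the per-cluster sample allocation could be tempered. It is not the code in `taar_similarity.py`. The function names, the `alpha` exponent, and the example cluster sizes are all hypothetical. With `alpha=1.0` the allocation is proportionate to cluster size, `alpha=0.0` treats all clusters equally, and `alpha=-1.0` gives the inverse strategy mentioned above.

```python
def allocation_weights(cluster_sizes, alpha=1.0):
    """Return normalized per-cluster weights proportional to size ** alpha.

    cluster_sizes: dict mapping a (hypothetical) cluster id to its member count.
    alpha: 1.0 = proportionate, 0.0 = equal per cluster, -1.0 = inverse.
    """
    raw = {cid: float(n) ** alpha for cid, n in cluster_sizes.items() if n > 0}
    total = sum(raw.values())
    return {cid: w / total for cid, w in raw.items()}


def allocate_samples(cluster_sizes, n_samples, alpha=1.0):
    """Turn the weights into per-cluster sample counts.

    Each non-empty cluster gets at least one sample and never more than
    it has members, so the counts may not sum exactly to n_samples.
    """
    weights = allocation_weights(cluster_sizes, alpha)
    return {
        cid: min(cluster_sizes[cid], max(1, int(round(n_samples * w))))
        for cid, w in weights.items()
    }


# Made-up cluster sizes, purely for illustration.
sizes = {"big": 9000, "mid": 900, "niche": 100}
proportionate = allocate_samples(sizes, 1000, alpha=1.0)
tempered = allocate_samples(sizes, 1000, alpha=0.5)
inverse = allocate_samples(sizes, 1000, alpha=-1.0)
```

An intermediate exponent such as `alpha=0.5` boosts niche clusters while still drawing most donors from the large ones, which may be a reasonable starting point before evaluating the fully inverse strategy.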
@Dexterp37 can you assign this issue to me please? I have insufficient privileges to grab it :|
https://github.com/mozilla/python_mozetl/blob/32d78c34dbb3c9ff5542f1ebc110f5aeb7fce340/mozetl/taar/taar_similarity.py#L131
The diversity of the donor pool currently rests only on the assumption that the higher-level clustering is substantially diverse. This could be improved by verifying cross-cluster diversity in the add-on space.
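One possible way to verify cross-cluster diversity is to compare the sets of add-ons installed by the donors of each cluster and flag cluster pairs that overlap heavily. The sketch below uses Jaccard distance for this. The choice of metric, the `min_distance` threshold, the function names, and the `(cluster_id, addon_ids)` input shape are all assumptions made for illustration, not part of the existing job.

```python
from itertools import combinations


def cluster_addon_sets(donors):
    """Collect the union of add-on ids seen in each cluster.

    donors: iterable of (cluster_id, addon_ids) pairs (hypothetical shape).
    """
    sets = {}
    for cluster_id, addon_ids in donors:
        sets.setdefault(cluster_id, set()).update(addon_ids)
    return sets


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def low_diversity_pairs(donors, min_distance=0.5):
    """Return (cluster_a, cluster_b, distance) for pairs below the threshold."""
    sets = cluster_addon_sets(donors)
    flagged = []
    for c1, c2 in combinations(sorted(sets), 2):
        distance = jaccard_distance(sets[c1], sets[c2])
        if distance < min_distance:
            flagged.append((c1, c2, distance))
    return flagged


# Toy donors: clusters 0 and 1 share the same add-ons, cluster 2 is disjoint.
toy_donors = [
    (0, ["a", "b"]),
    (0, ["b", "c"]),
    (1, ["a", "b", "c"]),
    (2, ["x", "y"]),
]
flagged = low_diversity_pairs(toy_donors)
```

Any flagged pair suggests that the two clusters contribute near-identical add-on profiles, so the diversity assumption would not hold for them even if they are well separated in the non-add-on features.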