TAAR embedded services need BLACKLIST functionality

mlopatka commented 6 years ago

I would like to figure out the logistics for blacklisting functionality with all TAAR deployments.

Here are my thoughts on a workable solution so far:

The base blacklist should be generated in an automated/programmatic manner to include:
- addons with currently very low ratings (score threshold to be defined)
- newly listed addons (time threshold to be defined)
- all active current SHIELD studies (sometimes these slip through)
The extended blacklist should include manual additions deemed appropriate by the AMO team and listed in a public repo. If this is a JSON file, it can be read automatically as part of the automated blacklist import.

The blacklist should live in a JSON file with a schema like: { "blacklisted-addon-guid": { "name": "addon-name", "blacklisted": boolean, "reason": "condition-for-blacklisting" } }
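
A minimal sketch of how a service might read a file shaped like the schema above, assuming the add-on GUID is the top-level key and each entry carries `name`, `blacklisted`, and `reason`. The function name, the validation rules, and the sample entries are hypothetical and not part of any existing TAAR code.

```python
import json

# Assumed field types, taken from the proposed schema (not an agreed format).
REQUIRED_FIELDS = {"name": str, "blacklisted": bool, "reason": str}


def load_blacklist(raw_json):
    """Return the set of GUIDs whose entry has "blacklisted": true."""
    data = json.loads(raw_json)
    blocked = set()
    for guid, entry in data.items():
        for field, expected_type in REQUIRED_FIELDS.items():
            if not isinstance(entry.get(field), expected_type):
                raise ValueError(
                    f"{guid}: field {field!r} is missing or not {expected_type.__name__}"
                )
        if entry["blacklisted"]:
            blocked.add(guid)
    return blocked


# Hypothetical sample data, for illustration only.
SAMPLE = """
{
  "example-low-rated@addon": {"name": "Example A", "blacklisted": true, "reason": "low-rating"},
  "example-cleared@addon": {"name": "Example B", "blacklisted": false, "reason": "manually-cleared"}
}
"""

print(load_blacklist(SAMPLE))
```

Keeping a `blacklisted` flag on every entry, rather than listing only blocked GUIDs, would let a manual reviewer record that an add-on was looked at and cleared.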

It should be accessible from a private Mozilla S3 bucket. A weekly job scheduled on Airflow should make sure that a valid and updated blacklist is available for services that rely on it. High-priority amendments (additions/deletions) to the blacklist should be made only in extreme cases.

Those are my thoughts so far. I hope we can discuss the way forward here.

mlopatka commented 6 years ago

@crankycoder @Dexterp37 @shell1 @muffinresearch @eviljeff @devaneymoz

eviljeff commented 6 years ago

@jvillalobos because content blacklisting and manual updating by "the AMO team". (Though anything really bad wouldn't be listed on AMO anyway so would be filtered out of the response)

mlopatka commented 6 years ago

Ah, sorry for the lack of context. This would be a blacklist for recommended content. We already use a whitelist approach where only AMO-listed addons can ever be recommended.

However, preliminary discussions have led to the idea that content we put forth as recommendations should meet stricter criteria than simply not being "really bad".

jvillalobos commented 6 years ago

With the proposed design (generating a weekly report), I think we'll need a whitelist rather than a blacklist. Otherwise every new add-on created after the list is generated will be included for about a week. Also, I believe we talked about low usage being a metric, rather than low rating (though that could be one too).

Having a whitelist means that it's much longer (thousands rather than dozens/hundreds).

Another thing to take into account is add-on type. This is focused on extensions, right? Themes aren't being considered?

mlopatka commented 6 years ago

We already use a whitelist approach based on an addon being listed on AMO. That list is generated once per week as well. We could use airflow to guarantee with 100% certainty that no addons slip through the cracks during the 1 week schedule lag.

In the interest of transparency I would like to propose that we adopt a two-part solution despite the slightly higher overhead. Some addons should not be eligible for recommendation because they are of low quality (blacklist), whereas some should be ineligible because they are not verified to be high quality as of yet (exclusion from whitelist).
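
A minimal sketch of the two-part check described above, assuming both lists are available as sets of add-on GUIDs. The function name and the sample GUIDs are hypothetical.

```python
def eligible_for_recommendation(candidates, whitelist, blacklist):
    """Keep candidates that are whitelisted and not explicitly blacklisted."""
    return [
        guid
        for guid in candidates
        if guid in whitelist and guid not in blacklist
    ]


# Hypothetical GUIDs, for illustration only.
whitelist = {"a@addon", "b@addon", "c@addon"}
blacklist = {"b@addon"}
candidates = ["a@addon", "b@addon", "d@addon"]

# "b@addon" is dropped for being blacklisted (known low quality);
# "d@addon" is dropped for not yet being on the whitelist (unverified).
print(eligible_for_recommendation(candidates, whitelist, blacklist))
```

Keeping the two lists separate preserves the reason an add-on was excluded, which is the transparency argument made above.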

My opinion is that recommender systems can serve to promote new content to clients more likely to appreciate that content (so I would argue for less stringent black listing based on usage). But, I wholeheartedly acknowledge the need to protect the reputation and integrity of the addons ecosystem by not taking risks when it comes to recommending potentially harmful or low quality addons.

As you said @jvillalobos we are not considering themes.

devaneymoz commented 6 years ago

This approach makes sense to me.

shell1 commented 6 years ago

Minimum usage before recommending (similar to ADI size) is covered well by the current recommendation model.

We also talked about adding a recency delay of at least 60 days (since creation date), which removes the risk of new add-ons “gaming” the system. Long available = likely to have been blocked if there were issues.

shell1 commented 6 years ago

Jorge is going to look at star ratings and see whether there is an optimum bar.

jvillalobos commented 6 years ago

After looking into it, I think that we can set 3 stars as the minimum average rating an add-on should have in order to be considered. That should filter out most lower-quality add-ons and should remove the need to implement and maintain a blacklist.

mlopatka commented 6 years ago

@jvillalobos @shell1 Do we want to consider a minimum age (since listing/creation/update) to ensure that nothing slips through the 3-star filter during the newly listed period, when an add-on has very few ratings? I would think it could be formulated something like:

minimum 3-star average rating AND ( older than XX days OR has >= YYY ratings)

Do you have any suggestions for thresholds here? We could also do a super quick analysis of rating volatility if some longitudinal (historical) data is available.

jvillalobos commented 6 years ago

The aforementioned "60 days since creation" should be sufficient. So, it would be:

minimum 3-star average rating AND older than 60 days
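
The rule above can be sketched as a small predicate. This assumes the creation date is available as a timezone-aware `datetime` and reads "older than 60 days" as "at least 60 days", matching the earlier "recency delay of at least 60 days". The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MIN_AVERAGE_RATING = 3.0          # "minimum 3-star average rating"
MIN_AGE = timedelta(days=60)      # "60 days since creation"


def meets_whitelist_criteria(average_rating, created, now=None):
    """True when the add-on has >= 3 stars AND was created >= 60 days ago."""
    if now is None:
        now = datetime.now(timezone.utc)
    return average_rating >= MIN_AVERAGE_RATING and (now - created) >= MIN_AGE


# Fixed reference date so the example is reproducible.
reference = datetime(2018, 6, 1, tzinfo=timezone.utc)
old_addon = datetime(2018, 1, 1, tzinfo=timezone.utc)
new_addon = datetime(2018, 5, 15, tzinfo=timezone.utc)

print(meets_whitelist_criteria(4.2, old_addon, now=reference))  # rated and old enough
print(meets_whitelist_criteria(4.2, new_addon, now=reference))  # too new
print(meets_whitelist_criteria(2.5, old_addon, now=reference))  # rating too low
```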

crankycoder commented 6 years ago

We're going to need to augment the JSON blob in s3://telemetry-parquet/telemetry-ml/addon_recommender/addons_database.json .

We have the average star rating in the existing JSON blob, but we do not have the addon creation date.

We currently have JSON blobs that look like this :

{'categories': {'android': ['other'], 'firefox': ['photos-music-videos']},
 'current_version': {'files': [{'id': 817054,
    'is_webextension': True,
    'platform': 'all',
    'status': 'public'}]},
 'default_locale': 'en-US',
 'description': {'en-US': 'This plug-in is for use on <a rel="nofollow" href="https://outgoing.prod.mozaws.net/v1/b8442be60f7263fb147bc06338e1fe4a43dcf225a865139e7cb3ee40fa42e768/https%3A//bandcamp.com/">https://bandcamp.com/</a> music pages'},
 'guid': '{af4fbf21-abb5-46c5-b45c-8e28af6d3e0c}',
 'name': {'en-US': 'Bandcamp Volume Slider'},
 'ratings': {'average': 5.0,
  'bayesian_average': 3.39976,
  'count': 2.0,
  'text_count': 2.0},
 'summary': {'en-US': 'Adds volume slider to Bandcamp music pages'},
 'tags': ['firefox57'],
 'weekly_downloads': 18}

crankycoder commented 6 years ago

WIP patch is over here: https://github.com/crankycoder/python_mozetl/tree/features/taar-lite-whitelist

crankycoder commented 6 years ago

@jvillalobos fetching the date needed for the 60-day check is not possible right now using the public JSON API or the XML APIs from addons.mozilla.org.

When looking at Adblock Plus, the oldest version (0.6) is from 2006

https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/versions/

However, if I pull up the JSON data at:

https://addons.mozilla.org/api/v3/addons/search/?app=firefox&sort=created&type=extension&guid={d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}

and the XML data at:

https://services.addons.mozilla.org/en-US/firefox/api/1.5/addon/1865

I can only see the current and beta versions.

I think what we want is a list of versions and dates for each version added to the JSON blob.

eviljeff commented 6 years ago

@crankycoder see the docs http://addons-server.readthedocs.io/en/latest/topics/api/addons.html#versions-list - you need the version list endpoint
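
A rough sketch of deriving a creation date from a version list, assuming the endpoint returns version objects that each contain a `files` list whose items carry a `created` timestamp in ISO 8601 form. Those field names and the timestamp format are assumptions to check against the linked docs; the sample payload is invented and no network call is made.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as '2010-03-04T05:06:07Z' (assumed format)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def earliest_file_date(versions):
    """Return the oldest file creation date across all versions, or None."""
    dates = [
        parse_timestamp(file_info["created"])
        for version in versions
        for file_info in version.get("files", [])
    ]
    return min(dates) if dates else None


# Hypothetical payload shaped like the assumed version list results.
sample_versions = [
    {"version": "2.0", "files": [{"created": "2017-11-20T10:00:00Z"}]},
    {"version": "1.0", "files": [{"created": "2010-03-04T05:06:07Z"}]},
]

print(earliest_file_date(sample_versions))
```

The oldest file date would then stand in for the add-on creation date in the 60-day check.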

jvillalobos commented 6 years ago

Also last_updated being older than 60 days would be sufficient to determine the result meets the criteria, but obviously that would only work for some of them.

mlopatka commented 6 years ago

Any insight as to whether we want to use the average or the bayesian_average ratings field for filtering to >=3.0? They seem to differ quite a bit in some cases, but I have not dug into how each is computed.

jvillalobos commented 6 years ago

I used average for the investigation in my previous comment. I don't really know what bayesian_average is, to be honest. Maybe @diox or @eviljeff have an idea.

diox commented 6 years ago

average is the true rating average. bayesian_average is the same thing, except weighted to handle add-ons with few ratings. The former is what we display to users; the latter is what we use anywhere we sort by ratings, like in search.
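
To illustrate why the two fields can differ so much for add-ons with few ratings, here is a generic Bayesian average that pulls the raw average toward a prior mean. This is a textbook formulation, not AMO's actual computation, and the prior mean and prior weight used below are hypothetical values.

```python
def bayesian_average(average, count, prior_mean, prior_weight):
    """Blend the observed average with a prior, weighted by rating count.

    prior_weight acts like a number of "virtual" ratings at prior_mean,
    so it must be greater than zero.
    """
    return (prior_weight * prior_mean + count * average) / (prior_weight + count)


# Hypothetical prior: 5 virtual ratings at 3.0 stars.
few = bayesian_average(average=5.0, count=2, prior_mean=3.0, prior_weight=5)
many = bayesian_average(average=5.0, count=200, prior_mean=3.0, prior_weight=5)

# With only 2 ratings the result stays close to the prior;
# with 200 ratings it approaches the raw average.
print(round(few, 3), round(many, 3))
```

Under this kind of weighting, a perfect average from two ratings lands much lower than 5.0, which is consistent in direction with the example blob above (average 5.0, bayesian_average 3.39976 with 2 ratings).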

mlopatka commented 6 years ago

We have implemented the daily ETL job generating an AMO whitelist based on the discussion above. This issue is closed with PR #216 against python_mozetl.

A daily updated whitelist is available in the 'telemetry-parquet' s3 bucket in this file: 'telemetry-ml/addon_recommender/whitelist_addons_database.json'

mozilla / taar-lite

TAAR embedded services need BLACKLIST functionality #1
