mozilla / overscripted

Repository for the Mozilla Overscripted Data Mining Challenge
Mozilla Public License 2.0

Continue lit review work #28

Open birdsarah opened 5 years ago

birdsarah commented 5 years ago

Build on @LABBsoft's work to evaluate the prevalence of these tracking techniques in this dataset, and document data deficiencies that we could perhaps supplement with future crawls or related datasets.
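As a rough illustration of what "evaluating prevalence" could mean, the sketch below computes the share of distinct scripts that call at least one fingerprinting-related API. It assumes each record is a `(script_url, symbol)` pair, loosely modeled on OpenWPM-style JavaScript call logs. The record shape, the `prevalence` helper, and the `CANVAS_SYMBOLS` set are illustrative assumptions, not the dataset's confirmed schema or an agreed detection rule.

```python
from typing import Iterable, Set, Tuple

# Hypothetical, non-exhaustive set of API symbols associated with canvas fingerprinting.
CANVAS_SYMBOLS: Set[str] = {
    "HTMLCanvasElement.toDataURL",
    "CanvasRenderingContext2D.getImageData",
}


def prevalence(records: Iterable[Tuple[str, str]], symbols: Set[str]) -> float:
    """Return the fraction of distinct scripts that call at least one of `symbols`.

    `records` is assumed to be (script_url, symbol) pairs, one per logged JS call.
    """
    all_scripts: Set[str] = set()
    matching_scripts: Set[str] = set()
    for script_url, symbol in records:
        all_scripts.add(script_url)
        if symbol in symbols:
            matching_scripts.add(script_url)
    if not all_scripts:
        return 0.0  # avoid dividing by zero on empty input
    return len(matching_scripts) / len(all_scripts)


# Toy records, invented for illustration only.
calls = [
    ("https://a.example/a.js", "HTMLCanvasElement.toDataURL"),
    ("https://a.example/a.js", "window.navigator.userAgent"),
    ("https://b.example/b.js", "window.navigator.plugins"),
    ("https://c.example/c.js", "CanvasRenderingContext2D.getImageData"),
]
print(f"{prevalence(calls, CANVAS_SYMBOLS):.2f}")  # 0.67
```

Counting distinct scripts rather than raw calls keeps one chatty script from inflating the estimate. On the full dataset the same logic would more likely be written as a dataframe filter followed by a distinct count.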

shruthi0898 commented 5 years ago

Hi Sarah, can you please explain this a bit? I am not able to understand the issue. Can you please illustrate it with an example? Also, where can I see @LABBsoft's work?

Thank you in advance.

birdsarah commented 5 years ago

LABBsoft's work is here: https://github.com/mozilla/overscripted/tree/master/analyses/2018_12_LABBsoft_tracking_review

The two starting documents are: https://github.com/mozilla/overscripted/blob/master/analyses/2018_12_LABBsoft_tracking_review/Tracking%20Methods.md and https://github.com/mozilla/overscripted/blob/master/analyses/2018_12_LABBsoft_tracking_review/Tracking%20Method%20Sources.md

From there, we have a template: https://github.com/mozilla/overscripted/blob/master/analyses/2018_12_LABBsoft_tracking_review/Tracking%20Report%20Template.md. It is not well formatted as a template, but its headings are what's important. We have a standard set of questions we'd like to answer about each fingerprinting technique, including a summary of the technique, how to detect it, whether we can see it in the OverScripted dataset, and whether we need more information.

Then LABBsoft started work on a few initial reports:

If you do work on this, please open your own analysis directory named with the format yyyy_mm_username__title. Also, please check out at least the following from the literature:

You do not need to follow the format from LABBsoft's directory or even the specific template. We also do not need to duplicate the good work in the papers I've linked. What I'd like is a summary of all the different types of fingerprinting, noting what we think we could detect with overscripted as it is, what we could detect with overscripted given a little more data, and what will always be out of scope for this dataset.
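The analysis directory format requested above (yyyy_mm_username__title) could be generated with a small helper like the one below. The `analysis_dir_name` function is hypothetical, and slugifying the title (lower case, words joined by underscores) is an assumption rather than part of the stated convention. Note that LABBsoft's directory, `2018_12_LABBsoft_tracking_review`, uses a single underscore before the title, so it may be worth confirming which separator is preferred; the sketch follows the double underscore as written.

```python
from datetime import date


def analysis_dir_name(username: str, title: str, when: date) -> str:
    """Build an analysis directory name as yyyy_mm_username__title.

    Slugifying the title (lower case, whitespace -> underscores) is an assumption.
    """
    slug = "_".join(title.lower().split())
    return f"{when.year:04d}_{when.month:02d}_{username}__{slug}"


print(analysis_dir_name("shruthi0898", "Fingerprinting Summary", date(2019, 3, 1)))
# 2019_03_shruthi0898__fingerprinting_summary
```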