mozilla / overscripted

Repository for the Mozilla Overscripted Data Mining Challenge
Mozilla Public License 2.0

Analysis on #34, calculating percentage of scripts present in dataset #74

Closed Soumya0803 closed 4 years ago

Soumya0803 commented 5 years ago

The goal is to find the total number of scripts of each of the three types present in the dataset, then calculate the percentage for each script individually and for the three scripts considered together.
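A minimal sketch of what such a calculation might look like, not the approach actually used in this issue. The thread does not name the three scripts, so the substrings in `KNOWN_SCRIPTS` are hypothetical placeholders, and the input is assumed to be a plain list of script URLs already extracted from the dataset. Percentages are taken over distinct script URLs, which is also an assumption.

```python
from collections import Counter

# Hypothetical substrings identifying the three known scripts.
# The thread does not name them; replace these with the real identifiers.
KNOWN_SCRIPTS = {
    "script_a": "example-a.js",
    "script_b": "example-b.js",
    "script_c": "example-c.js",
}


def script_percentages(script_urls, known=KNOWN_SCRIPTS):
    """Return the percentage of distinct script URLs matching each known
    script, plus a "combined" figure for URLs matching any of them."""
    unique_urls = set(script_urls)  # count each script URL once
    total = len(unique_urls)
    if total == 0:
        return {}

    counts = Counter()
    matched_any = 0
    for url in unique_urls:
        hits = [name for name, needle in known.items() if needle in url]
        counts.update(hits)
        if hits:
            matched_any += 1

    result = {name: 100.0 * counts[name] / total for name in known}
    result["combined"] = 100.0 * matched_any / total
    return result


# Made-up sample data, for illustration only.
sample = [
    "https://cdn.example.com/example-a.js",
    "https://cdn.example.com/example-a.js",  # duplicate, counted once
    "https://other.example.org/example-b.js",
    "https://site.example.net/app.js",
    "https://site.example.net/vendor.js",
]
percentages = script_percentages(sample)
```

Whether to count distinct URLs, distinct pages, or raw call rows changes the denominator and therefore the result, so that choice would need to be stated alongside any figures.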

birdsarah commented 5 years ago

@Soumya0803 - can you elaborate on why this is a meaningful / instructive thing to do? How does it advance the broader goal of #34?

Soumya0803 commented 5 years ago

@birdsarah I was trying to find out which among the three has the greatest contribution to fingerprinting, so that I could look for that script in the dataset first, followed by the other two. The plan was then to find out any individual characteristics and after that look for similarities.

birdsarah commented 5 years ago

Hi. I'm still not completely sure that I understand. Let's break it down a little.

> I was trying to find out which among the three has the greatest contribution to fingerprinting

The question is what "contribution to fingerprinting" means. These three scripts are not all the fingerprinting scripts; they're just known examples that we can look at. How does the prevalence of these scripts relate to the population of all fingerprinting scripts?

> find out any individual characteristics and after that look for similarities

This bit sounds right.

birdsarah commented 4 years ago

This issue isn't actionable for the general community, so I'm closing it.