mozilla / overscripted

Repository for the Mozilla Overscripted Data Mining Challenge
Mozilla Public License 2.0

third party scripts analysis [WIP] #83

Closed ShilpaSangappa closed 4 years ago

ShilpaSangappa commented 5 years ago

Hello Sarah,

I have done an initial analysis of the third-party scripts. Please let me know if this is OK. I plan to keep adding to this folder, so it's a WIP until April 2nd.

Thanks.

With Regards, Shilpa.

birdsarah commented 5 years ago

Hi @ShilpaSangappa, I will likely only have time for one review for you to then follow up on before April 2nd. Would you like me to do that now?

birdsarah commented 5 years ago

@ShilpaSangappa i took a quick look. some thoughts.

  1. hopefully it's clear that you'll need to run this analysis on a meaningful sample (the 10% sample provided or the full dataset).
  2. i'd like to see you contextualize this analysis. how does this compare to other studies that have measured third-party content prevalence? and discuss the differences between your analysis and others.
  3. rename the labels: '3rd party scripts', 'self domain scripts' -> '3rd party scripts', '1st party scripts'
  4. what are the pros and cons of using df_domain['script_url'] = df_domain['script_url'].str.split('/').str[2] to pull out domains? (you can see how i do it here: https://github.com/mozilla/overscripted/blob/master/analyses/issue_36.ipynb)

birdsarah commented 5 years ago

Also, given that research can be never-ending, please include a section of your write-up where you lay out all the new questions that your work has generated.

ShilpaSangappa commented 5 years ago

Thanks Sarah.

  1. I will run the script for a bigger dataset.
  2. I think using the urlparse library, as you are doing, is better than manual string processing. I will change that. I need to work on the other two points before I can reply; I don't have the answers yet.
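
To make the trade-off between the two approaches concrete, here is a minimal sketch comparing them on a few edge cases. It uses only the standard library rather than the pandas column from the notebook. The helper names `split_domain` and `parsed_domain` and the sample URLs are made up for illustration, and this is not necessarily how issue_36.ipynb does it.

```python
from urllib.parse import urlparse


def split_domain(url):
    """Manual approach: take the third '/'-separated piece.

    Returns None when there are fewer than three pieces, which roughly
    mirrors pandas' .str[2] yielding NaN for such rows.
    """
    parts = url.split('/')
    return parts[2] if len(parts) > 2 else None


def parsed_domain(url):
    """urlparse approach: hostname drops port and userinfo and is lowercased."""
    return urlparse(url).hostname


# Hypothetical sample URLs chosen to show where the two approaches differ.
samples = [
    "https://cdn.example.com/lib.js",       # plain case: both agree
    "https://cdn.example.com:8080/lib.js",  # split keeps the port
    "https://user@example.com/x.js",        # split keeps the userinfo
    "https://Example.COM/a.js",             # split keeps the original case
    "https://example.com?x=1",              # split keeps the query string
    "cdn.example.com/lib.js",               # no scheme: neither finds a host
]

for url in samples:
    print(f"{url!r}: split={split_domain(url)!r} urlparse={parsed_domain(url)!r}")
```

The string split is short and fast, but it silently folds ports, credentials, case differences, and query strings into the "domain", which would inflate the count of distinct domains. Neither approach reduces a hostname to its registrable domain (for example, cdn.example.com to example.com); that would need a Public Suffix List lookup, which is outside this sketch.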

I have an Inference section at the end, which has questions to be analyzed.

aliamcami commented 4 years ago

Closing this PR due to lack of activity, please feel free to reopen.