Open Soumya0803 opened 5 years ago
Your issue_22 notebook has a merge conflict and I had to manually edit it to get it to run.
Thanks @birdsarah. I will work on all the things you mentioned. I'll get into the practice of keeping my notebook clean by not adding large amounts of data. About local storage: as I mentioned, it is a WIP. The local storage values are what I planned to understand next and find some meaning in.
"This cookie is used to determine and save whether the chat widget is open for future visits" and follow-on claims. Very interesting! But how did you know all this? It's not evident from the data.
I found this on their website, where the cookies being used are listed. I'll look further and try to find evidence for it in the data.
Instead of transcribing you could use counter.most_common(10)
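For example, a minimal sketch of what that looks like (the values here are made up for illustration; in the notebook the counter would be built from the real data):

```python
from collections import Counter

# Hypothetical stand-in values, not taken from the dataset.
values = ["a.example"] * 3 + ["b.example"] * 2 + ["c.example"]

counter = Counter(values)

# most_common(n) returns the n highest counts as (value, count) pairs,
# sorted from most to least frequent, so nothing needs to be typed out by hand.
top = counter.most_common(2)
print(top)
```

The result is a plain list of tuples, so it can be printed directly or passed to pandas for a table.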
To help dask you can do dff.script_netloc.apply(get_end_of_net_loc, meta='O'). O is the object dtype, which is what pandas uses for strings.
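A rough sketch of that call wrapped in a function. The body of get_end_of_net_loc below is a hypothetical stand-in (the notebook's real helper may do something different), the wrapper name is made up, and meta is written in the (name, dtype) tuple form that the dask docs show for Series.apply:

```python
def get_end_of_net_loc(netloc):
    """Hypothetical stand-in: keep the last two dot-separated labels."""
    return ".".join(netloc.split(".")[-2:])


def end_of_net_loc_column(dff):
    # dff is assumed to be a dask DataFrame with a string column script_netloc.
    # meta names the output Series and gives its dtype ('O' = object), so dask
    # does not have to infer the result type by running the function itself.
    return dff.script_netloc.apply(
        get_end_of_net_loc, meta=("script_netloc", "O")
    )
```

Nothing is computed until .compute() is called on the returned Series; meta only describes the expected output.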
Thanks for mentioning these, I will do these changes.
To answer your question "Is the script contributing to fingerprinting every time it is called or there are specific instances?" I would say the answer is yes, because you've used fairly precise heuristics to generate those lists and are more likely to have missed some candidates than to have included too many false positives.
Thank you for answering this. I'll update it in the notebook.
Overall. Really great work. Thanks a lot.
I will work more towards issue22, as the value column has a lot more information and I'll have to dig deeper. I'll add what more I'm planning to work on at the end of the notebook to indicate how I'm going to accomplish things.
I'll add what more I'm planning to work on at the end of the notebook to indicate how I'm going to accomplish things.
I look forward to that. I'm eager to see your response / thoughts on this question:
You are counting by number of calls, what does that tell you? What are potential biases with these numbers? Would a metric like number of scripts change things?
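To make the question concrete, here is a small sketch with made-up records (the script URLs and symbol names are hypothetical, not taken from the notebook). It counts the same toy data once per call and once per distinct script:

```python
from collections import Counter

# Hypothetical toy records, one (script_url, symbol) pair per recorded call.
calls = [
    ("https://a.example/fp.js", "HTMLCanvasElement.toDataURL"),
    ("https://a.example/fp.js", "HTMLCanvasElement.toDataURL"),
    ("https://a.example/fp.js", "HTMLCanvasElement.toDataURL"),
    ("https://b.example/x.js", "window.navigator.userAgent"),
    ("https://c.example/y.js", "window.navigator.userAgent"),
]

# Metric 1: number of calls per symbol. One chatty script can dominate this.
calls_per_symbol = Counter(symbol for _, symbol in calls)

# Metric 2: number of distinct scripts per symbol.
# set(calls) drops repeated (script, symbol) pairs before counting.
scripts_per_symbol = Counter(symbol for _, symbol in set(calls))

print(calls_per_symbol.most_common())
print(scripts_per_symbol.most_common())
```

In this toy data the ranking flips between the two metrics: the symbol with the most calls is used by only one script, which is the kind of bias the question is getting at.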
Hi @Soumya0803, is this ready for review?
Closing this PR due to lack of activity, please feel free to reopen.
Hi @aliamcami, I have worked on some of the points mentioned in the review. I look forward to continuing work on this PR.
Thanks @Soumya0803, I'm sorry for the stagnation. We'll take a look at this and your other PR.
@birdsarah I have submitted an initial analysis on #22 and small analyses on the TLDs. I will add more to this. For the TLD folder, tld_analysis is the main notebook, in which the others are linked. Please review the work done so far.