notwaldorf / ama

:raising_hand: Ask @notwaldorf anything!

Chromium as an archiving (WARC-generating/replay) tool - thoughts, LoE? #106

Closed hanoii closed 5 years ago

hanoii commented 5 years ago

I stumbled upon https://meowni.ca/posts/chromium-101/, great post.

I feel like asking a celebrity :)

I am in the process of researching alternatives for a (local if possible) internet archiving tool for investigative purposes. This will eventually be open source as well.

There's a lot out there, but beyond plain static crawlers, I need to be able to record everything, user interaction, scrolling, private browsed content, etc.

At some point during my initial review I started thinking that the best place to build this is the browser itself. Chromium was my first thought, and Mozilla just the same.

This wouldn't really be a contribution to Chrome, but more of a forked project; your thoughts and insights would still be valuable.

Initially I'd like to know what your thoughts on this subject are, if any.

Then, if you can, at least let me know whether I am about to bite off more (A LOT more) than I can chew, or whether this at least looks sensible. I guess I'd like to know what LoE this might be for a small team (2-3) of capable developers, on a scale from 1 to 10, off the top of your head.

Best from Argentina!

Edit: #34 had great info, but both your post and that issue are rather old. Are they still up to date, and is build time still an issue?

notwaldorf commented 5 years ago

> I need to be able to record everything, user interaction, scrolling, private browsed content, etc.

Off the bat that sounds incredibly creepy, so I don't really know what you're building, but it doesn't strike me as a good idea. If you lie to users and their private browsing content is no longer private, that's incredibly unethical and bad. Secondly, I have no idea where you plan to store this data, or what to do with it, but it sounds like it would be full of personally identifiable data, which is a very gross idea (again), and in some cases possibly even illegal.

Re: #34, much like that issue, I haven't worked on Chrome in about 4 years so "I don't know" is all I've got.

hanoii commented 5 years ago

Well, this was not entirely aimed at a public audience, but rather at a specific subset of users who will certainly not be lied to about what the tool does and what it's capturing and storing. This is for investigative purposes, so it's important that all data is captured as is, with privacy concerns kept clearly in mind.

Thanks anyway.