slifty / internet_noise

MIT License
89 stars · 26 forks

A few ideas... #14

Open JLCressaty opened 7 years ago

JLCressaty commented 7 years ago

The top 4,000 nouns is a good start, but it's easily filterable. If you can make a system that tracks certain searches, then searches tangential topics, and then topics tangential to those, you can lose surveillance in a sort of fractal search tree. Also, if you add a system similar to the tl;dr bots that picks out key topics, concepts, and words above a 12th-grade reading level from those summaries, and then searches them at random intervals after loading the page, I think you will have a system that totally fucks internet surveillance, at least for a while.
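The fractal search tree described here could be sketched as a breadth-limited random walk over a topic graph. The `TANGENTS` map below is a hypothetical stand-in for whatever real source of tangential topics gets used (WordNet, related-search suggestions); its entries are illustrative only.

```python
import random

# Hypothetical topic graph standing in for a real "tangential topics"
# source; the entries here are illustrative only.
TANGENTS = {
    "coffee": ["caffeine", "espresso", "fair trade"],
    "caffeine": ["sleep", "adenosine"],
    "espresso": ["crema", "portafilter"],
}

def tangent_walk(seed, depth=2, branch=2, rng=random):
    """Collect a small tree of tangential topics: at each level, pick up to
    `branch` random neighbours of every topic found so far."""
    frontier, seen = [seed], {seed}
    for _ in range(depth):
        next_frontier = []
        for topic in frontier:
            neighbours = TANGENTS.get(topic, [])
            for pick in rng.sample(neighbours, min(branch, len(neighbours))):
                if pick not in seen:
                    seen.add(pick)
                    next_frontier.append(pick)
        frontier = next_frontier
    return seen

# Each topic in the returned set would be issued as a decoy search after a
# random delay, so the tree fans out over time rather than all at once.
```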

dakshil commented 7 years ago

Maybe something like a WordNet?

JLCressaty commented 7 years ago

I think that would be the best place to start.

JLCressaty commented 7 years ago

Take the headline of a viewed article or webpage, then search that page for terms in the WordNets of the headline terms. Then search google/bing/yahoo/ddg (a random one each time? maybe also add a search bar that queries random engines, to further obscure actual searches) for some of those related concepts. Track those searched concepts, and after some amount of time, search other tangential concepts from the WordNets of concepts found in the results. I think that would make it all pretty noisy. If you can track tangents from previous webpages and keep them going for 15–30 minutes, while adding in tangents from newer pages actually being browsed, I think it will make the data mostly useless, at least to corporations.

I also think that the greater the stretch between concepts, the better; otherwise the branches of the concept tree will be fairly densely packed. The more the tree stretches out across the conceptual field, the more meaningless the collected data will be.
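A rough sketch of the pipeline above: pick a headline term, look up related concepts, and fire a decoy query at a randomly chosen engine. The `RELATED` map is a hypothetical stand-in for a real WordNet lookup (synsets/hypernyms), and its entries are purely illustrative; the engine query URLs are the real ones for the engines named above.

```python
import random
from urllib.parse import quote_plus

# Query endpoints for the engines mentioned in the comment.
ENGINES = {
    "google": "https://www.google.com/search?q=",
    "bing": "https://www.bing.com/search?q=",
    "yahoo": "https://search.yahoo.com/search?p=",
    "ddg": "https://duckduckgo.com/?q=",
}

# Illustrative stand-in for a WordNet lookup: headline term -> related
# concepts. A real implementation would expand terms via WordNet synsets.
RELATED = {
    "drought": ["water scarcity", "irrigation", "climate"],
    "election": ["ballot", "turnout", "polling"],
}

def decoy_search_url(headline, rng=random):
    """Pick a random related concept for a random headline term, and build
    a search URL on a randomly chosen engine."""
    terms = [t for t in headline.lower().split() if t in RELATED]
    if not terms:
        return None
    concept = rng.choice(RELATED[rng.choice(terms)])
    engine = rng.choice(list(ENGINES))
    return ENGINES[engine] + quote_plus(concept)
```

Rotating the engine per query spreads the noise across providers, so no single engine sees the whole decoy stream.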

JLCressaty commented 7 years ago

Also, it would further obscure browsing if it could pick random submissions from various social media sites and jump down rabbit holes it picks.

dakshil commented 7 years ago

Yeah, to further obscure things, the page browsing could be automated to click on random links on the site, building artificial browsing patterns.
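Picking a random link to "click" could be sketched with the standard-library HTML parser; a real extension would then navigate to the chosen URL after a human-like delay.

```python
import random
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def pick_random_link(html, rng=random):
    """Return one random link from the page, or None if there are none."""
    parser = LinkCollector()
    parser.feed(html)
    return rng.choice(parser.links) if parser.links else None
```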

awesmubarak commented 7 years ago

Perhaps the initial words could come from scouring popular RSS feeds. The text from the latest stories could be sorted through to find key concepts and important words using an algorithm similar to the one employed by smmry. A full RSS feed reader would not need to be implemented, since the feed would only be pulled in when the page is loaded (or reloaded), keeping the website light. The words would be used to seed the WordNets, resulting in time-relevant searches and reducing the difference between actual and obfuscated searches.
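Extracting seed words from a fetched feed might look like the sketch below, using simple frequency counting as a crude stand-in for the smmry-style key-concept extraction the comment describes. The stopword list is a tiny illustrative sample, not a real one.

```python
import re
from collections import Counter
from xml.etree import ElementTree

# Tiny illustrative stopword list; a real one would be much larger.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "is"}

def seed_words_from_rss(rss_xml, top_n=5):
    """Pull item titles out of an RSS document and return the most frequent
    non-stopword terms as seed words for decoy searches."""
    root = ElementTree.fromstring(rss_xml)
    words = []
    for title in root.iter("title"):
        words += [w for w in re.findall(r"[a-z']+", (title.text or "").lower())
                  if w not in STOPWORDS and len(w) > 2]
    return [word for word, _ in Counter(words).most_common(top_n)]
```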

JLCressaty commented 7 years ago

These are wonderful ideas, guys. Another thing to keep in mind is the patterned footprint a person leaves across the internet on average: certain pages they can be expected to visit more frequently, or generally first in the day, plus reading speeds, click frequencies, etc. If you don't somehow obscure that, or track it and replicate it, it will be simple to just pick out the genuinely visited pages that existed before you turned the app on.

So random clicks on currently browsed pages definitely need to happen, but at certain delays after page load, because timing is the real flag of genuine interest, and of a person rather than a bot. You'll almost have to build it to analyze the user's browsing habits, replicate their clicks and interests, slowly abstract them, and really have it running all day. I don't think people will want to use a really effective version of this on mobile data connections, which is fine.
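Replicating the user's own rhythm rather than a machine-regular schedule could be sketched as sampling decoy-click delays from the user's observed inter-click gaps, with some jitter. The jitter range and minimum delay below are arbitrary illustrative choices.

```python
import random

def humanlike_delays(observed_gaps, n, rng=random):
    """Sample n inter-click delays (in seconds) from the user's own observed
    gaps, with a little multiplicative jitter, so decoy clicks follow the
    user's rhythm. Jitter factor and 0.5s floor are arbitrary assumptions."""
    return [max(0.5, rng.choice(observed_gaps) * rng.uniform(0.8, 1.25))
            for _ in range(n)]
```

Sampling from the empirical distribution (rather than, say, a fixed interval or a uniform range) keeps the decoy timing statistically close to how that particular person actually browses.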