laylavish / uBlockOrigin-HUGE-AI-Blocklist

A huge blocklist of manually curated sites that contain AI generated content for uBlock Origin & uBlacklist.
Creative Commons Zero v1.0 Universal
2.68k stars 103 forks

Add additional search engines to the blocklist #9

Open SkitsTheSkitty opened 8 months ago

SkitsTheSkitty commented 8 months ago

I use Startpage (a privacy-focused search engine with Google-based results instead of Bing-based results) and the blocklist isn't blocking anything for me on Startpage. Screenshots of Startpage and Google image results are attached for comparison. Google is working as intended, but Startpage doesn't block anything. Could you all add compatibility not only for my search engine but also for search engines other than Google, Bing, and DuckDuckGo? Thank you!

laylavish commented 8 months ago

So, Startpage and other image search engines such as Yandex are a bit tricky to add to this list. Unfortunately, this isn't a simple case of just adding startpage.com to the stack of search engines in the list; Startpage, Yandex, and others deliver images differently than DuckDuckGo, Google, and Bing, so this would require new CSS styling for those search engines. It's not impossible, but it's tricky enough that uBlacklist doesn't support Startpage image blocking. I'll look into it, though!

SaltSouls commented 8 months ago

Would it be possible to add Brave support by chance, or does it have the same problem as the aforementioned search engines?

ch0ccyra1n commented 8 months ago

I wonder if it would be possible to add support for various SearXNG instances using a script to add them for the specific instances a user chooses.

virtadpt commented 8 months ago

@ch0ccyra1n I think it might be possible, though not as drop-in functionality.

I'm already doing something similar with AttackVectors. I wrote a script that converts one of the lists into a SQLite database, which is served using rqlite. My agents that use SearXNG's search API pick the URL out of every search hit they get and run a SELECT against that rqlite database to detect and filter out some garbage.

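I'm considering doing the same thing with this repo, but I haven't had a chance to sit down and dink around with a script yet. Maybe this weekend. I think it would involve:

* Pulling list_uBlacklist.txt

* `cat list_uBlacklist.txt | grep -v '^$' | grep -v '^!' | sed 's/^\*:\/\/\*\.//g' | sed 's/\/\*$//g'`

* Load that into a SQLite database

* Restart rqlite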

The code for the agents themselves is probably out of scope.
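For anyone wanting to try the approach described above, here is a rough Python sketch of the list-to-SQLite step plus the per-URL lookup. It is not virtadpt's script. The table name `blocked` and the functions `normalize`, `load`, and `is_blocked` are made up for illustration, a plain in-memory SQLite database stands in for one served through rqlite, and lowercasing domains and checking parent domains are assumptions about how `*://*.example.com/*` entries should match.

```python
import sqlite3
from typing import Iterable, Optional
from urllib.parse import urlsplit


def normalize(line: str) -> Optional[str]:
    """Turn one list line into a bare domain; None for blanks and '!' comments."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None
    # Mirror the two sed expressions: drop a leading "*://*." and a trailing "/*".
    if line.startswith("*://*."):
        line = line[len("*://*."):]
    if line.endswith("/*"):
        line = line[:-2]
    # Lowercasing is an addition here, so lookups are case-insensitive.
    return line.lower()


def load(lines: Iterable[str], conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) `blocked` table and fill it with domains."""
    conn.execute("CREATE TABLE IF NOT EXISTS blocked (domain TEXT PRIMARY KEY)")
    rows = [(d,) for d in map(normalize, lines) if d]
    conn.executemany("INSERT OR IGNORE INTO blocked (domain) VALUES (?)", rows)
    conn.commit()


def is_blocked(url: str, conn: sqlite3.Connection) -> bool:
    """Check a search hit's host, and each parent domain, against the table."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # "www.example.com" -> ["www.example.com", "example.com", "com"]
    candidates = [".".join(labels[i:]) for i in range(len(labels))]
    placeholders = ",".join("?" * len(candidates))
    row = conn.execute(
        f"SELECT 1 FROM blocked WHERE domain IN ({placeholders}) LIMIT 1",
        candidates,
    ).fetchone()
    return row is not None


# Tiny demo with made-up entries in the same shape as the list lines.
sample = ["! comment line", "", "*://*.example-ai-site.com/*"]
demo_conn = sqlite3.connect(":memory:")
load(sample, demo_conn)
print(is_blocked("https://www.example-ai-site.com/gallery", demo_conn))
```

In a real setup, `load` would be fed the lines of list_uBlacklist.txt, and the SELECT would go to whatever serves the database instead of a local connection.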

KyodaiKen commented 1 month ago

> @ch0ccyra1n I think it might be possible, though not as drop-in functionality.
>
> I'm already doing something similar with AttackVectors. I wrote a script that converts one of the lists into a SQLite database, which is served using rqlite. My agents that use SearXNG's search API pick the URL out of every search hit they get and run a SELECT against that rqlite database to detect and filter out some garbage.
>
> I'm considering doing the same thing with this repo, but I haven't had a chance to sit down and dink around with a script yet. Maybe this weekend. I think it would involve:
>
> * Pulling list_uBlacklist.txt
>
> * `cat list_uBlacklist.txt | grep -v '^$' | grep -v '^!' | sed 's/^\*:\/\/\*\.//g' | sed 's/\/\*$//g'`
>
> * Load that into a SQLite database
>
> * Restart rqlite
>
> The code for the agents themselves is probably out of scope.

This is neat. I first need to bring my instance up to date. Hmm... But what is rqlite? Is it used in SearXNG?