uBlockOrigin / uAssets

Resources for uBlock Origin, uMatrix: static filter lists, ready-to-use rulesets, etc.
GNU General Public License v3.0
3.95k stars, 743 forks

Net Scan: 4TB All, 40TB BAB&app_vars #4268

Closed: jspenguin2017 closed this issue 5 years ago

jspenguin2017 commented 5 years ago

20 packages scanned, 1 package processed; 50,000+ packages are expected per month. About 80 GB of data scanned, out of 200+ TB expected per month.

Hard anti-adblock:

```
# BAB
http://f4.motogon.ru/motokross/mxgp-2018/msg717172/
https://dc-chronicle.com/2018/08/01/robert-mueller-refers-democrat-tony-podesta-criminal-charges/
https://www.sofiotheque.info/2018/02/telecharger-50-dossiers-de-maladies.html
# Other
http://www.kizi.cm/
# Comes up every time
```

Annoyance:

```
# anOptions
http://allamericansthings.com/2018/11/prototyping-the-betentacled-inflatable-soft-robots-of-zero-gee/
http://thejewishvoice.com/2016/03/16/intel-acquires-replay-technologies-to-up-its-sports-game/
https://usa.watchpro.com/millennials-reason-behind-luxury-spending-boost-china/
http://www.ebizlatam.com/tag/gerardo-coronel/
http://www.fagagnaonline.com/
https://2spendless.com/?tag=forever
https://centrafriqueactu.com/2018/11/01/congo-la-gouvernance-forestiere-au-coeur-dun-forum-regional-a-brazzaville/
https://www.betglob.pl/tag/33
```
jspenguin2017 commented 5 years ago

RIP...

![image](https://user-images.githubusercontent.com/7283682/49681561-2687b600-fa61-11e8-99f1-499d9652e6dc.png)
jspenguin2017 commented 5 years ago

Annoyance

http://www.noodletowntranslated.com/global-evolution/global-evolution-chapter-45/?replytocom=9108
okiehsch commented 5 years ago

I thought your server was down.

jspenguin2017 commented 5 years ago

Hard (BAB)

http://www.baumblaetter.de/
https://furkankhan.000webhostapp.com/bookmarking/design-build-in-manhattan-apartment-renovations-nyc-design-build-firm-in-nyc.html
http://gameinfo.pw/pendapatan-grand-theft-auto-v-kalahkan-penjualan-film-avatar/
jspenguin2017 commented 5 years ago

Well, not down enough, I guess... The backlog is at 1.7k again; I need a lambda to trim the noise out of the raw scanner output.

It was a tough start, but it'll get better; lots of the results now are duplicates.
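A minimal sketch of what such a trimming step could look like, assuming the raw output can be reduced to (category, URL) pairs. `trim_results`, its input format, and the `www.` normalization are hypothetical illustrations, not the actual scanner's code. It drops hosts already on the whitelist and hosts already reported in an earlier batch, which is what removes the "comes up every time" duplicates:

```python
from urllib.parse import urlsplit


def trim_results(raw_hits, whitelist, seen_hosts):
    """Drop whitelisted and already-reported hosts from one batch of scanner hits.

    raw_hits:   iterable of (category, url) pairs (hypothetical format)
    whitelist:  hostnames already handled elsewhere, e.g. fixed in a regional list
    seen_hosts: hostnames reported in earlier batches; updated in place
    """
    kept = []
    for category, url in raw_hits:
        host = (urlsplit(url).hostname or "").lower()
        # Treat www.example.com and example.com as the same site.
        if host.startswith("www."):
            host = host[len("www."):]
        if not host or host in whitelist or host in seen_hosts:
            continue
        seen_hosts.add(host)
        kept.append((category, url))
    return kept


hits = [
    ("Hard", "https://philippinenmagazin.de/2018/11/07/weiterer-menschenrechtsanwalt-auf-den-philippinen-getoetet/"),
    ("Hard", "https://www.yoututosjeff.es/2018/07/crear-blog-gratis-blogger.html"),
    ("Hard", "https://philippinenmagazin.de/tag/computer/"),
    ("Hard", "https://www.hackintosh.download/discover/"),
]
for category, url in trim_results(hits, whitelist={"yoututosjeff.es"}, seen_hosts=set()):
    print(category, url)
```

Keying on the hostname rather than the full URL is a deliberate choice in this sketch: one report per site is enough for triage, so a second philippinenmagazin.de page is suppressed even though its path differs.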

jspenguin2017 commented 5 years ago

Hard

http://www.classifiedscalgary.ca/ads/calgary-toronto-real-estate-ad-700854/
http://www.metropolisweb.it/metropolisweb/2018/05/17/scavi-pompei-torna-alla-luce-vicolo-dei-balconi/
https://www.mocasoft.ro/

Annoyance

http://mywrestling.com.pl/mycast-6-special-qa-with-shadow-juz-20-lipca-kanale-mywrestling/
jspenguin2017 commented 5 years ago

Hard

http://www.yafud.pl/14632/
# Comes up every time
https://philippinenmagazin.de/2018/11/07/weiterer-menschenrechtsanwalt-auf-den-philippinen-getoetet/
# Comes up every time
https://silvertails.net/threads/classic-call.2900/
jspenguin2017 commented 5 years ago

Hard

https://www.upi.com/Top_News/US/2018/06/21/House-passes-farm-bill-tightening-work-requirements-for-SNAP/9081529619198/?spt=rrs&or=5

Annoyance

https://webwereld.nl/e-commerce/4478-einde-news-nl-is-niet-het-einde-van-de-scanpen
okiehsch commented 5 years ago

https://webwereld.nl/e-commerce/4478-einde-news-nl-is-niet-het-einde-van-de-scanpen What kind of generic anti-adblock solution got flagged on that site?

jspenguin2017 commented 5 years ago

Admiral, although I think something else is causing it.

okiehsch commented 5 years ago

Yes, it is. I can see a request to an Admiral domain, but that is blocked by Peter Lowe's list, and even if I allow it there is no anti-adblock message.

jspenguin2017 commented 5 years ago

I see a red banner:

![image](https://user-images.githubusercontent.com/7283682/49689737-b3268880-fae2-11e8-95d1-cac262ff041d.png)
okiehsch commented 5 years ago

My reply "Yes it is" meant: yes, it is caused by something other than Admiral. Fixed now.

jspenguin2017 commented 5 years ago

Annoyance

https://www.centurylife.org/
https://comoinvertirenbitcoin.co/
https://www.paolobruno.net/acquista-doraemon-dvd-al-miglior-prezzo-su-ebay/

Hard

https://crazytron.net/?ref=valera83
# One refresh required to trigger
https://www.scuolaon.it/liceo-giacomo-leopardi-recanati/
jspenguin2017 commented 5 years ago

Hard

https://www.firnandus.com/
http://maryewinstead.net/gallery/displayimage.php?album=983&pid=44028
http://uttarakhand.result91.com/kumaun-university/mcom-4th-sem-exam-result-2018/83959
https://blog.sovoboys.net/?p=869

Breakage

# Flagged as FAB and Admiral
https://www.adproceed.com/ad/the-zblackcard-is-the-most-powerful-prep-paid-debit-card-in-north-america/
jspenguin2017 commented 5 years ago

Hard

https://www.tedamo.de/telekommunikation/early-adopter-torsten-seiler-verstaerkt-online-konzeption/
https://www.hackintosh.download/discover/
# Comes up every time
https://philippinenmagazin.de/tag/computer/
# Need refresh & takes a while
https://www.pcwelt.de/ratgeber/So_belauschen_Hacker_Sie_ueber_Ihre_Webcam-IT-Sicherheit-7915977.html
jspenguin2017 commented 5 years ago

That's a solid half a terabyte of data analyzed.

The last 50 packages only yielded 14 true positives. Considering that my server needs about 16 minutes to process one package once it runs out of CPU credits, I need to look into potential optimizations...
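To put the 16-minutes-per-package figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the throttled rate applies to every package and that extra workers scale perfectly linearly, neither of which is guaranteed; the function names are illustrative only:

```python
import math

MINUTES_PER_PACKAGE = 16     # throttled rate once CPU credits run out (from the comment above)
PACKAGES_PER_MONTH = 50_000  # expected monthly volume from the first comment
MINUTES_PER_DAY = 24 * 60


def days_needed(packages, minutes_per_package=MINUTES_PER_PACKAGE, workers=1):
    """Wall-clock days to process `packages`, assuming perfectly linear scaling."""
    return packages * minutes_per_package / workers / MINUTES_PER_DAY


def workers_needed(packages, days, minutes_per_package=MINUTES_PER_PACKAGE):
    """Smallest number of equally fast workers that finishes within `days`."""
    return math.ceil(packages * minutes_per_package / (days * MINUTES_PER_DAY))


print(f"last 50 packages: {days_needed(50) * 24:.1f} h")
print(f"one month of packages on one box: {days_needed(PACKAGES_PER_MONTH):.0f} days")
print(f"workers to keep up (30-day month): {workers_needed(PACKAGES_PER_MONTH, 30)}")
print(f"true positives per package: {14 / 50:.2f}")
```

Under those assumptions a single throttled box would need about 556 days for one month's worth of packages, so roughly 19 such workers would be needed just to keep pace.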

okiehsch commented 5 years ago
https://www.adproceed.com/ad/the-zblackcard-is-the-most-powerful-prep-paid-debit-card-in-north-america/

Well, it is neither FAB nor Admiral; looks like anOptions to me.

jspenguin2017 commented 5 years ago

Well, I guess they tried all of them and got captured in the corpus.

okiehsch commented 5 years ago

What gets flagged at pcwelt.de? I can't reproduce any anti-adblock message, even though I followed your steps.

jspenguin2017 commented 5 years ago

It was Admiral. Let me check.

jspenguin2017 commented 5 years ago

Screenshot:

![image](https://user-images.githubusercontent.com/7283682/49691242-09072a80-fafb-11e8-8002-40e9701e89f6.png)
okiehsch commented 5 years ago

OK, I was testing in incognito mode. If I use a normal session I can reproduce it, but it is certainly not Admiral and does not look like any generic solution at all.

jspenguin2017 commented 5 years ago

`||tinypass.com^` works, but I'm not sure if it breaks anything. It was flagged as Admiral, but it's not Admiral anymore. Looks like Piano: https://piano.io/.

mapx- commented 5 years ago

`@@||pcwelt.de/js/advert.js$script,first-party`, like macwelt.de

okiehsch commented 5 years ago

Yes, that works. It should be added next to the existing `pcwelt.de##+js(addEventListener-defuser.js, load, uabp)` filter.

jspenguin2017 commented 5 years ago

I think I just need a bigger server. Parsing metadata sounds too complicated and may not really help.

Update:

![image](https://user-images.githubusercontent.com/7283682/49843722-61e5f580-fd7d-11e8-99a9-a83c12589751.png)
jspenguin2017 commented 5 years ago

Alright, I finished scanning 1000 packages, or about 4 TB of data. It cost about two and a half dollars, so no, the current pure brute-force approach isn't going to work. I need to spend some time properly designing a cost-efficient scanning engine that can take advantage of multiple CPU cores and spot instances.
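For a sense of scale, a rough sketch that extrapolates the first 1000 packages linearly to the 50,000-packages-per-month volume from the first comment. It deliberately ignores spot pricing, multi-core parallelism, and any fixed per-instance overhead, so treat the result as an order-of-magnitude estimate only:

```python
COST_USD_PER_1000_PACKAGES = 2.5  # "about two and a half dollars" for the first 1000 packages
TB_PER_1000_PACKAGES = 4.0        # "about 4 TB of data" for the same 1000 packages


def monthly_estimate(packages_per_month):
    """Return (terabytes scanned, dollars spent), assuming cost scales linearly."""
    batches = packages_per_month / 1000
    return batches * TB_PER_1000_PACKAGES, batches * COST_USD_PER_1000_PACKAGES


tb, usd = monthly_estimate(50_000)
print(f"{tb:.0f} TB/month, ~${usd:.0f}/month, ${COST_USD_PER_1000_PACKAGES / TB_PER_1000_PACKAGES:.3f}/TB")
```

The 200 TB this yields lines up with the 200+ TB per month estimate at the top of the thread, which suggests the roughly 4 GB-per-package average holds up; at the measured rate the brute-force approach would cost on the order of $125 per month.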

![image](https://user-images.githubusercontent.com/7283682/49898834-462f2d80-fe17-11e8-9263-eaf6edcba22d.png)
jspenguin2017 commented 5 years ago

Hard (BAB)

http://animated-pictures.net/
http://animesaprevodom.com/load/yugioh_5d_39_s/64
https://animor.tv/chuukan-kanriroku-tonegawa-009/
http://benoistbrasil.com/2018/01/27/waco-legenda-do-episodio-1x01-visions-and-omens/
https://bestnewcarsreview.com/2018-renault-alaskan-review-design-engine-release-date-photos/
jspenguin2017 commented 5 years ago

Hard (BAB)

http://daniel-radcliffe.org/
http://www.desmoriders.it/forum.php
http://dl.r3sub.com/
https://entertainment.dailynews.us.com/2018/09/i-land-with-kate-bosworth-among-new.html
# Takes a while + a few refreshes
http://www.birminghamforum.co.uk/index.php?topic=14979.msg664739
jspenguin2017 commented 5 years ago

Hard (BAB)

https://www.extrem-bodybuilding.de
https://gamblersfever.net/want-to-win-some-nba-bets/
http://gerisource.fansite.gallery/gallery/
http://ghannelius.org/Gallery/displayimage.php?album=615&pid=24978
https://www.hardwarezone.it/
jspenguin2017 commented 5 years ago

Hard (BAB)

https://www.hetverschiltussen.nl/verschil-iphone-smartphone/
http://www.hiringpinas.com/2018/04/tech-iii-operations-general-assembly.html
https://hulkpop.com/myle-d-in-the-bus-single/
https://malawi24.com/2018/09/11/mcp-aspirants-want-mec-to-conduct-primary-elections/?shared=email&msg=fail
# Takes a while + a few refreshes
https://hypenews.net/7-familias-que-viveram-verdadeiros-pesadelos-paranormais-em-suas-casas/
jspenguin2017 commented 5 years ago

Hard (BAB)

http://leyendas-de-occidente.blogspot.com
http://montserrat.qtellads.com/0/posts/11-China-Manufacturers-/422--Shoes-and-Accessories/
https://muktosoftware.blogspot.com/search/label/Converter
# Takes a while + a few refreshes
https://myh1z1.de/tagged/7-membersuche/?objectType=com.woltlab.wbb.thread&s=8bca277d3ad9a7b41b612385b44222f9ac03958d
# NSFW, takes a while
https://kfake.pw/2018/11/04/
jspenguin2017 commented 5 years ago

Hard (BAB)

http://www.nydailyquote.com/
https://www.poemocean.com/poem/ai-hindi-tu-chinta-na-kar-11013.html
http://www.pimpthatsnack.com/
https://www.similarminds.com/
https://sims4marigold.blogspot.com/2016/09/goddess-dress.html
jspenguin2017 commented 5 years ago

Hard (BAB)

http://sustainable.onbeon.com/2010/11/opportunities-challenges-for.html
http://takebtc.faucethero.com/?r=1N4NRtNg8opzrG9ByJg3rCUcEGdgF7bbrr
http://tedidev.com/tag/application/page/2/
http://terror-en-el-cine.blogspot.com/2013/11/
https://tmbw.ru/nataliya-gulkina-bliny-eto-klassika
jspenguin2017 commented 5 years ago

Hard (BAB)

http://www.xoox.co.il/clip/show_media.php?id=245
https://www.yoututosjeff.es/2018/07/crear-blog-gratis-blogger.html
# Takes a while
http://topdisegnidacolorare.biz/stampa/immagine/54-sagome-di-fiori-da-colorare-e-ritagliare-per-bambini-entro-fiori-da-colorare-per-bambini/
# Takes a while
http://wyklady.org/news/613_brakuje-ci-notatek-znamy-strone-ktora-ci-pomoze.html

Only 39 BAB from this scan...

okiehsch commented 5 years ago

https://www.yoututosjeff.es/2018/07/crear-blog-gratis-blogger.html is fixed in the regional list.

jspenguin2017 commented 5 years ago

OK, added to the whitelist.

jspenguin2017 commented 5 years ago

Annoyance

https://80beyond.spacestation-online.com/

Hard

https://www.foguinhogames.net/
http://www.italianhotspot.com/
# Kind of hard? 
https://m5g.it
Edit: agreed
# Comes up every time
http://www.na-sportowo.com/archiwa/7216
jspenguin2017 commented 5 years ago

Hard

https://oportaln10.com.br/motorista-e-preso-no-rn-ao-dirigir-bebado-uma-carreta-de-combustiveis-81035/
https://www.pedroinnecco.com/projects/redmonder/
# NSFW
https://publicxxxagent.com/
http://safehomefarm.com/basement-bar-ideas/

9 Admiral.

jspenguin2017 commented 5 years ago

Here's a weird one: https://allsituspokerqq.blogspot.com/

The page has a bug that sends it into an infinite redirect loop; `allsituspokerqq.blogspot.com##+js(abort-on-property-read.js, blog)` fixes it. I assume that from some geolocations it won't get stuck in a loop?

With the rule above, the page has BAB + popup.

jspenguin2017 commented 5 years ago

That's everything except anOptions and FAB, which are always super painful to deal with. I'll leave them for another day.

okiehsch commented 5 years ago

Btw, thanks for all the work.

jspenguin2017 commented 5 years ago

Thanks for your work too, and many thanks to Common Crawl for the amazing datasets that are provided to everyone for free!

(Honestly, most of the work on my side is done by automation scripts and Common Crawl.)

jspenguin2017 commented 5 years ago

I'm not sure you want `allsituspokerqq.blogspot.com##+js(abort-on-property-read.js, blog)` in the filter list, though; if the author doesn't know how to code, that's not our problem.

I would assume that the website works for the author and people living in the same country as the author, he just didn't notice that it breaks for everyone else.

AFAIK Google removed NCR (no country redirect) from Blogger / blogspot.com.

okiehsch commented 5 years ago

I take your point, and I would not add the filter if the broken site were the only issue, but having added filters for the anti-adblock and popup issues, it seems fair to make the site usable for everyone.

jspenguin2017 commented 5 years ago

Hard (anOptions)

https://www.artemia.org/
# Comes up every time
https://avenir.ro/tag/captcha/

Annoyance (anOptions)

https://www.bukmacherzy.biz/40-pln-za-zaproszenie-znajomego-w-expekt/
# Need to refresh once
http://www.beingmanan.com/wp/2011/07/unified-services-microsoft-vs-apple/
# Need to refresh once
https://www.boulderguru.de/calendar/tag_ids~247/
jspenguin2017 commented 5 years ago

Annoyance (anOptions)

http://www.chillitorun.pl/tag/dragon-ball/
http://www.constiintacolectiva.ro/tag/dolores-cannon
https://www.creaciondempresas.es/directorio-asesorias/granada/granada/consultores-fiscalescontables-y-financieros-s-l/
http://danskengelskordbog.dk/author/admin/
https://dariusz.wieckiewicz.org/kontakt/
jspenguin2017 commented 5 years ago

Hard (anOptions)

http://www.dpmotoservice.eu/link/
# Need to refresh once
https://www.fitnesshealtharticles.com/

Annoyance (anOptions)

http://detranrj.detran-br.com/
https://filmesonlinex.site/episodios/heathers-1x4/
http://www.enligmor.dk/opskrifter/groenne-retter/squash-taerte/
jspenguin2017 commented 5 years ago

Hard (anOptions)

https://www.basentech.it/
http://fizyoo.com/tag/kas-yirtilmasina-ne-iyi-gelir/

Annoyance (anOptions)

https://eclips.ml/uncategorized/nurses-under-investigation-after-obscene-photos-with-newborns-go-viral/
https://www.historyofroyalwomen.com/
http://hollywoodredux.com/tag/luke-skywalker/