jspenguin2017 closed this 5 years ago
RIP...
Annoyance
http://www.noodletowntranslated.com/global-evolution/global-evolution-chapter-45/?replytocom=9108
I thought your server was down.
Hard (BAB)
http://www.baumblaetter.de/
https://furkankhan.000webhostapp.com/bookmarking/design-build-in-manhattan-apartment-renovations-nyc-design-build-firm-in-nyc.html
http://gameinfo.pw/pendapatan-grand-theft-auto-v-kalahkan-penjualan-film-avatar/
Well, not down enough, I guess... There's a 1.7k backlog again; I need a lambda to trim the nonsense out of the raw scanner output.
It was a tough start, but it'll get better; lots of the results now are duplicates.
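The thread does not show the trimming step itself. One possible approach is sketched below, assuming the raw scanner output is one URL per line and that a site only needs to be reported once per hostname. It drops anything that is not an http(s) URL, strips a leading `www.`, and skips hosts already seen in this batch or in earlier reports. `trim_scanner_output` and its parameters are hypothetical and are not the author's actual pipeline.

```python
from urllib.parse import urlsplit


def trim_scanner_output(raw_lines, already_reported):
    """Keep the first URL per hostname, dropping noise and already-reported hosts.

    raw_lines: iterable of strings, assumed to be one URL per line (hypothetical format).
    already_reported: set of normalized hostnames (no leading "www.") reported earlier.
    """
    seen_hosts = set(already_reported)  # copy so the caller's set is not mutated
    kept = []
    for line in raw_lines:
        url = line.strip()
        if not url.startswith(("http://", "https://")):
            continue  # blank lines, log noise, anything that is not a URL
        host = urlsplit(url).hostname  # lowercased by urlsplit
        if host is None:
            continue
        if host.startswith("www."):
            host = host[4:]
        if host in seen_hosts:
            continue  # duplicate of an earlier hit or of an old report
        seen_hosts.add(host)
        kept.append(url)
    return kept


raw = [
    "https://webwereld.nl/a",
    "https://webwereld.nl/a",
    "",
    "garbage",
    "http://www.yafud.pl/14632/",
    "https://philippinenmagazin.de/x",
]
print(trim_scanner_output(raw, {"philippinenmagazin.de"}))
# ['https://webwereld.nl/a', 'http://www.yafud.pl/14632/']
```

Keying on the hostname rather than the full URL is a deliberate choice here: many hits in this thread are different pages of the same site, and one report per site is enough to write a filter.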
Hard
http://www.classifiedscalgary.ca/ads/calgary-toronto-real-estate-ad-700854/
http://www.metropolisweb.it/metropolisweb/2018/05/17/scavi-pompei-torna-alla-luce-vicolo-dei-balconi/
https://www.mocasoft.ro/
Annoyance
http://mywrestling.com.pl/mycast-6-special-qa-with-shadow-juz-20-lipca-kanale-mywrestling/
Hard
http://www.yafud.pl/14632/
# Comes up every time
https://philippinenmagazin.de/2018/11/07/weiterer-menschenrechtsanwalt-auf-den-philippinen-getoetet/
# Comes up every time
https://silvertails.net/threads/classic-call.2900/
Hard
https://www.upi.com/Top_News/US/2018/06/21/House-passes-farm-bill-tightening-work-requirements-for-SNAP/9081529619198/?spt=rrs&or=5
Annoyance
https://webwereld.nl/e-commerce/4478-einde-news-nl-is-niet-het-einde-van-de-scanpen
What kind of generic anti-adblock solution got flagged on that site?
Admiral. Although I think it's something else that's causing it.
Yes it is. I can see a request to an Admiral domain, but that is blocked by Peter Lowe's list, and even if I allow it there is no anti-adblock message.
I see a red banner:
My reply "Yes it is" meant that yes, it is caused by something other than Admiral. Fixed now.
Annoyance
https://www.centurylife.org/
https://comoinvertirenbitcoin.co/
https://www.paolobruno.net/acquista-doraemon-dvd-al-miglior-prezzo-su-ebay/
Hard
https://crazytron.net/?ref=valera83
# One refresh required to trigger
https://www.scuolaon.it/liceo-giacomo-leopardi-recanati/
Hard
https://www.firnandus.com/
http://maryewinstead.net/gallery/displayimage.php?album=983&pid=44028
http://uttarakhand.result91.com/kumaun-university/mcom-4th-sem-exam-result-2018/83959
https://blog.sovoboys.net/?p=869
Breakage
# Flagged as FAB and Admiral
https://www.adproceed.com/ad/the-zblackcard-is-the-most-powerful-prep-paid-debit-card-in-north-america/
Hard
https://www.tedamo.de/telekommunikation/early-adopter-torsten-seiler-verstaerkt-online-konzeption/
https://www.hackintosh.download/discover/
# Comes up every time
https://philippinenmagazin.de/tag/computer/
# Need refresh & takes a while
https://www.pcwelt.de/ratgeber/So_belauschen_Hacker_Sie_ueber_Ihre_Webcam-IT-Sicherheit-7915977.html
That's a solid half a terabyte of data analyzed.
The last 50 packages only yielded 14 true positives. Considering that my server needs about 16 minutes to process one package once it runs out of CPU credits, I need to look into potential optimizations...
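To put those numbers in perspective, here is a small worked estimate. It assumes one package is processed at a time at the throttled rate of 16 minutes per package, and reuses the roughly 50,000 packages per month quoted at the end of this thread. `scan_estimate` is a hypothetical helper, not part of the author's tooling.

```python
def scan_estimate(packages, minutes_per_package, true_positives):
    """Return (wall-clock hours, true-positive rate) for a single serial scanner."""
    hours = packages * minutes_per_package / 60
    rate = true_positives / packages if packages else 0.0
    return hours, rate


# 50 packages at 16 min each (CPU credits exhausted), 14 true positives.
hours, rate = scan_estimate(50, 16, 14)
print(f"{hours:.1f} h for the batch, {rate:.0%} true positives")
# 13.3 h for the batch, 28% true positives

# Same throttled rate applied to ~50,000 packages per month.
monthly_hours, _ = scan_estimate(50_000, 16, 0)
print(f"{monthly_hours / 24:.0f} days of serial scanning per month")
# 556 days of serial scanning per month
```

At that rate a single throttled machine would need well over a year to get through one month of data, which is why throughput rather than detection accuracy is the bottleneck.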
https://www.adproceed.com/ad/the-zblackcard-is-the-most-powerful-prep-paid-debit-card-in-north-america/
Well, it is neither FAB nor Admiral; looks like anOptions to me.
Well, I guess they tried all of them and got captured in the corpus.
What gets flagged at pcwelt.de? I can't reproduce any anti-adblock message, and I followed your steps.
It was Admiral. Let me check.
Screenshot:
OK, I was testing in incognito mode. In a normal session I can reproduce it, but it is certainly not Admiral and does not look like any generic solution at all.
||tinypass.com^ works, but I'm not sure if it breaks anything.
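Why a rule like ||tinypass.com^ can break things: in Adblock Plus / uBlock Origin syntax, `||` anchors at the start of a domain name, so the rule covers tinypass.com and every subdomain of it, and `^` matches a separator character (anything other than a letter, a digit, or `_ - . %`) or the end of the address. The sketch below is a simplified, hypothetical translation of only that `||host^` form into a regex; it is not how uBlock Origin actually matches filters and ignores options such as `$script` or `$first-party`.

```python
import re


def domain_anchor_to_regex(filter_text):
    """Translate a filter of the form ||host^ into an equivalent regex (simplified)."""
    if not (filter_text.startswith("||") and filter_text.endswith("^")):
        raise ValueError("this sketch only handles ||host^ filters")
    host = re.escape(filter_text[2:-1])
    return re.compile(
        # "||": any scheme, then the host itself or any of its subdomains
        rf"^[a-z][a-z0-9+.-]*://([^/?#]*\.)?{host}"
        # "^": a separator (not a letter, digit, _ - . %) or the end of the URL
        r"([^a-z0-9_.%-]|$)",
        re.IGNORECASE,
    )


rule = domain_anchor_to_regex("||tinypass.com^")
for url in (
    "https://tinypass.com/api/x.js",      # True: the domain itself
    "https://cdn.tinypass.com/api/x.js",  # True: any subdomain
    "https://nottinypass.com/",           # False: different domain
    "https://tinypass.com.example.org/",  # False: "." is not a separator
):
    print(url, bool(rule.search(url)))
```

Because the rule blocks every request to the whole domain, any page that depends on tinypass.com for something other than the anti-adblock check (for example, a paywall or login flow) would also lose that functionality.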
It was flagged as Admiral, but it's not Admiral anymore. Looks like Piano: https://piano.io/
@@||pcwelt.de/js/advert.js$script,first-party (same as for macwelt.de)
Yes, that works. It should be added to the existing pcwelt.de##+js(addEventListener-defuser.js, load, uabp) filter.
I think I just need a bigger server. Parsing metadata sounds too complicated and may not really help.
Update:
Alright, I finished scanning 1000 packages, or about 4 TB of data. It cost about two and a half dollars, so no, the current pure brute-force approach isn't going to work. I need to spend some time properly designing a cost-efficient scanning engine that can take advantage of multiple CPU cores and spot instances.
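The redesign is only described in outline here. As a minimal sketch of the multi-core part, assuming each package can be scanned independently of the others, packages can be fanned out to a process pool with one worker per CPU core using Python's `concurrent.futures.ProcessPoolExecutor`. The package format (a list of `(url, html)` pairs), `SIGNATURES`, and its substring markers are hypothetical stand-ins for real detection logic; the spot-instance side is not shown.

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Hypothetical substring markers; real anti-adblock detection is far more involved.
SIGNATURES = {
    "BAB": "BlockAdBlock",
    "FAB": "FuckAdBlock",
}


def scan_package(records):
    """Scan one package of (url, html) records and return (url, label) hits."""
    hits = []
    for url, html in records:
        for label, marker in SIGNATURES.items():
            if marker in html:
                hits.append((url, label))
    return hits


def scan_all(packages, workers=None):
    """Scan packages in parallel, one worker process per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input packages
        for hits in pool.map(scan_package, packages):
            results.extend(hits)
    return results


if __name__ == "__main__":
    demo = [
        [("http://a.example/", "<script>new BlockAdBlock()</script>")],
        [("http://b.example/", "<p>nothing here</p>"),
         ("http://c.example/", "<script src='FuckAdBlock.js'></script>")],
    ]
    print(scan_all(demo))
    # [('http://a.example/', 'BAB'), ('http://c.example/', 'FAB')]
```

Processes rather than threads are used because the scan is CPU-bound, and under CPython's global interpreter lock threads would not spread the matching across cores. The `__main__` guard is needed on platforms that start worker processes by re-importing the main module.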
Hard (BAB)
http://animated-pictures.net/
http://animesaprevodom.com/load/yugioh_5d_39_s/64
https://animor.tv/chuukan-kanriroku-tonegawa-009/
http://benoistbrasil.com/2018/01/27/waco-legenda-do-episodio-1x01-visions-and-omens/
https://bestnewcarsreview.com/2018-renault-alaskan-review-design-engine-release-date-photos/
Hard (BAB)
http://daniel-radcliffe.org/
http://www.desmoriders.it/forum.php
http://dl.r3sub.com/
https://entertainment.dailynews.us.com/2018/09/i-land-with-kate-bosworth-among-new.html
# Takes a while + a few refreshes
http://www.birminghamforum.co.uk/index.php?topic=14979.msg664739
Hard (BAB)
https://www.extrem-bodybuilding.de
https://gamblersfever.net/want-to-win-some-nba-bets/
http://gerisource.fansite.gallery/gallery/
http://ghannelius.org/Gallery/displayimage.php?album=615&pid=24978
https://www.hardwarezone.it/
Hard (BAB)
https://www.hetverschiltussen.nl/verschil-iphone-smartphone/
http://www.hiringpinas.com/2018/04/tech-iii-operations-general-assembly.html
https://hulkpop.com/myle-d-in-the-bus-single/
https://malawi24.com/2018/09/11/mcp-aspirants-want-mec-to-conduct-primary-elections/?shared=email&msg=fail
# Takes a while + a few refreshes
https://hypenews.net/7-familias-que-viveram-verdadeiros-pesadelos-paranormais-em-suas-casas/
Hard (BAB)
http://leyendas-de-occidente.blogspot.com
http://montserrat.qtellads.com/0/posts/11-China-Manufacturers-/422--Shoes-and-Accessories/
https://muktosoftware.blogspot.com/search/label/Converter
# Takes a while + a few refreshes
https://myh1z1.de/tagged/7-membersuche/?objectType=com.woltlab.wbb.thread&s=8bca277d3ad9a7b41b612385b44222f9ac03958d
# NSFW, takes a while
https://kfake.pw/2018/11/04/
Hard (BAB)
http://www.nydailyquote.com/
https://www.poemocean.com/poem/ai-hindi-tu-chinta-na-kar-11013.html
http://www.pimpthatsnack.com/
https://www.similarminds.com/
https://sims4marigold.blogspot.com/2016/09/goddess-dress.html
Hard (BAB)
http://sustainable.onbeon.com/2010/11/opportunities-challenges-for.html
http://takebtc.faucethero.com/?r=1N4NRtNg8opzrG9ByJg3rCUcEGdgF7bbrr
http://tedidev.com/tag/application/page/2/
http://terror-en-el-cine.blogspot.com/2013/11/
https://tmbw.ru/nataliya-gulkina-bliny-eto-klassika
Hard (BAB)
http://www.xoox.co.il/clip/show_media.php?id=245
https://www.yoututosjeff.es/2018/07/crear-blog-gratis-blogger.html
# Takes a while
http://topdisegnidacolorare.biz/stampa/immagine/54-sagome-di-fiori-da-colorare-e-ritagliare-per-bambini-entro-fiori-da-colorare-per-bambini/
# Takes a while
http://wyklady.org/news/613_brakuje-ci-notatek-znamy-strone-ktora-ci-pomoze.html
Only 39 BAB from this scan...
https://www.yoututosjeff.es/2018/07/crear-blog-gratis-blogger.html is fixed in the regional list.
OK, added to whitelist.
Annoyance
https://80beyond.spacestation-online.com/
Hard
https://www.foguinhogames.net/
http://www.italianhotspot.com/
# Kind of hard?
https://m5g.it
Edit: agreed
# Comes up every time
http://www.na-sportowo.com/archiwa/7216
Hard
https://oportaln10.com.br/motorista-e-preso-no-rn-ao-dirigir-bebado-uma-carreta-de-combustiveis-81035/
https://www.pedroinnecco.com/projects/redmonder/
https://publicxxxagent.com/ [NSFW]
http://safehomefarm.com/basement-bar-ideas/
9 Admiral.
Here's a weird one: https://allsituspokerqq.blogspot.com/
The page has a bug that sends it into an infinite redirect loop; allsituspokerqq.blogspot.com##+js(abort-on-property-read.js, blog) fixes it. I assume that from some geolocations it won't get stuck in a loop?
With the rule above, the page has BAB + popup.
That's everything except anOptions and FAB, which are always super painful to deal with. I'll leave them for another day.
Btw, thanks for all the work.
Thanks for your work too, and many thanks to Common Crawl for the amazing datasets that are provided to everyone for free!
(Honestly, most of the work on my side is done by automation scripts and Common Crawl.)
I'm not sure you want allsituspokerqq.blogspot.com##+js(abort-on-property-read.js, blog) in the filter list, though. If the author doesn't know how to code, it's not our problem.
I would assume that the website works for the author and people living in the same country as the author, he just didn't notice that it breaks for everyone else.
AFAIK Google removed NCR (no country redirect) from Blogger / blogspot.com.
I take your point, and I would not add the filter if the broken site were the only issue, but having added a filter for the anti-adblock and popup issues, it seems fair to make the site usable for everyone.
Hard (anOptions)
https://www.artemia.org/
# Comes up every time
https://avenir.ro/tag/captcha/
Annoyance (anOptions)
https://www.bukmacherzy.biz/40-pln-za-zaproszenie-znajomego-w-expekt/
# Need to refresh once
http://www.beingmanan.com/wp/2011/07/unified-services-microsoft-vs-apple/
# Need to refresh once
https://www.boulderguru.de/calendar/tag_ids~247/
Annoyance (anOptions)
http://www.chillitorun.pl/tag/dragon-ball/
http://www.constiintacolectiva.ro/tag/dolores-cannon
https://www.creaciondempresas.es/directorio-asesorias/granada/granada/consultores-fiscalescontables-y-financieros-s-l/
http://danskengelskordbog.dk/author/admin/
https://dariusz.wieckiewicz.org/kontakt/
Hard (anOptions)
http://www.dpmotoservice.eu/link/
# Need to refresh once
https://www.fitnesshealtharticles.com/
Annoyance (anOptions)
http://detranrj.detran-br.com/
https://filmesonlinex.site/episodios/heathers-1x4/
http://www.enligmor.dk/opskrifter/groenne-retter/squash-taerte/
Hard (anOptions)
https://www.basentech.it/
http://fizyoo.com/tag/kas-yirtilmasina-ne-iyi-gelir/
Annoyance (anOptions)
https://eclips.ml/uncategorized/nurses-under-investigation-after-obscene-photos-with-newborns-go-viral/
https://www.historyofroyalwomen.com/
http://hollywoodredux.com/tag/luke-skywalker/
20 packages scanned, 1 package processed, 50,000+ packages expected per month. About 80 GB of data scanned, out of 200+ TB expected per month.
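A quick back-of-the-envelope check of these figures, assuming data volume scales linearly with package count: 80 GB over 20 packages is about 4 GB per package, which is consistent with 200 TB for 50,000 packages. The cost line reuses the brute-force figure of about $2.50 per 1000 packages from the earlier update; what the redesigned engine will actually cost is not known from this thread. `monthly_projection` is a hypothetical helper.

```python
def monthly_projection(packages_per_month, gb_per_package, dollars_per_1000_packages):
    """Return (TB scanned per month, dollars per month), using decimal units (1 TB = 1000 GB)."""
    data_tb = packages_per_month * gb_per_package / 1000
    cost = packages_per_month / 1000 * dollars_per_1000_packages
    return data_tb, cost


gb_per_package = 80 / 20  # about 80 GB scanned over 20 packages
data_tb, cost = monthly_projection(50_000, gb_per_package, 2.5)
print(f"{gb_per_package:.0f} GB/package, {data_tb:.0f} TB/month, ${cost:.0f}/month")
# 4 GB/package, 200 TB/month, $125/month
```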