pirate / sites-using-cloudflare

:broken_heart: Archived list of domains using Cloudflare DNS at the time of the CloudBleed announcement.

Domains in list that might never have used Cloudflare #157

Open Zenexer opened 7 years ago

Zenexer commented 7 years ago

8053a438 introduced at least one domain, zoho.com, that might not have any ties to Cloudflare (#83). This commit should be reviewed.

@pirate Do you happen to remember how those domains were found?

cc @deepsk79

pirate commented 7 years ago

These were copied from reports found in the original HN thread.

Zenexer commented 7 years ago

Ah, figures. I suppose we'll have to check many of those manually.

pirate commented 7 years ago

e.g. https://news.ycombinator.com/item?id=13720208

Yeah, some manual checking is a good idea.

Zenexer commented 7 years ago

Domains in 8053a43 using the CF proxy:

Edit: Updated at 2017-02-25 04:41 UTC to indicate that fitbit.com does actually use the CF proxy.

Zenexer commented 7 years ago

StackOverflow was already removed per #21 and zoho.com per #83. That just leaves fitbit.com.

I'm going to leave this issue open, as we should review related commits.

pirate commented 7 years ago

@Zenexer I believe Fitbit data appeared in several search engine caches. I'm not sure what domain it was from though, probably a subdomain other than fitbit.com.

abalabahaha commented 7 years ago

See #158: both www.fitbit.com and api.fitbit.com are behind CF, just not fitbit.com itself.
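For anyone who wants to repeat this kind of per-hostname check, here is a minimal sketch, not the method used in #158. It assumes a small, hard-coded snapshot of IPv4 ranges that Cloudflare has published (the snapshot is incomplete and may be out of date), and the helper names `is_cloudflare_ip`, `resolve_ipv4`, and `hostname_behind_cloudflare` are made up for illustration. It only reflects what your resolver returns today, from your location, so it says nothing definitive about what a domain used in February 2017.

```python
import ipaddress
import socket

# Illustrative snapshot of some IPv4 ranges published by Cloudflare.
# Assumption: this subset is neither complete nor guaranteed to be current;
# fetch the authoritative list from Cloudflare before relying on it.
CLOUDFLARE_V4_SNAPSHOT = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "103.21.244.0/22",
        "104.16.0.0/13",
        "108.162.192.0/18",
        "141.101.64.0/18",
        "162.158.0.0/15",
        "173.245.48.0/20",
        "198.41.128.0/17",
    )
]


def is_cloudflare_ip(ip: str) -> bool:
    """Return True if the address falls inside one of the snapshot ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CLOUDFLARE_V4_SNAPSHOT)


def resolve_ipv4(hostname: str) -> set:
    """Resolve a hostname to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def hostname_behind_cloudflare(hostname: str) -> bool:
    """Return True if any resolved address is inside the snapshot ranges."""
    return any(is_cloudflare_ip(ip) for ip in resolve_ipv4(hostname))


if __name__ == "__main__":
    # Apex and subdomains are checked separately, since they can differ.
    for name in ("fitbit.com", "www.fitbit.com", "api.fitbit.com"):
        try:
            print(name, hostname_behind_cloudflare(name))
        except socket.gaierror as exc:
            print(name, "lookup failed:", exc)
```

Checking the apex and each subdomain separately matters here, because a site can proxy `www` and `api` through CF while pointing the bare domain elsewhere. A hit also only shows that the address is in a Cloudflare range, not that the vulnerable proxy features were enabled.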

pirate commented 7 years ago

Let's add back those domains then @Zenexer.

Zenexer commented 7 years ago

@pirate That merge was closed. The other two have both confirmed they weren't using Cloudflare at the time.

pirate commented 7 years ago

@Zenexer sorry I don't follow, which other two domains? zoho & SO, or fitbit domains?

Zenexer commented 7 years ago

@pirate

I checked zoho.com thoroughly enough to be confident that there weren't any blatant errors like in #158.

deepsk79 commented 7 years ago

Will it be removed from the master list?

JedrzejMajko commented 7 years ago

This request (and the list overall) is fundamentally flawed. Big websites sit behind DNS services that let them localize traffic. Stack Overflow, GitHub, OVH, etc. all use(d) Cloudflare services to fight DDoS attacks. The first two were using CF as recently as two weeks ago in some regions.

Not only is it grounds for defamation (lawyers! the author's info is here: https://github.com/pirate! ;)), but it also assumes that your location is one that gets all the correct DNS records.

Phineas commented 7 years ago

@Coobers This is in no way illegal, it's just a list of all domains using Cloudflare - and people can remove their domains if they prove it wasn't going through the proxy or if it hosts only static content. There are already thousands of sites that scan huge hosts like Cloudflare and find all domains associated, this isn't really new.

This repo is merely to inform people about the whole Cloudflare bug & what sites might have been affected.

JedrzejMajko commented 7 years ago

@Phineas I know you think that, but consider this. This approach would allow us to create a list of "possible" sexual offenders. You could put anybody on it based on their GitHub avatar color. From a purely methodological point of view, such a list would be flawed. The basis here is the same. The methodological flaw leaves this list without any merit other than defamation.

Regarding removal: if it were done via a website, yes, but here this information is stored forever, so it's not really removed.

There's so much wrong here.

Phineas commented 7 years ago

@Coobers The repo is called "sites-using-cloudflare".. It's also said so many times in the README that not all websites have used the proxy, and also - we're not creating a list of "possible sexual offenders", lmao, we're creating a list of websites that could've been affected by Cloudbleed.

JedrzejMajko commented 7 years ago

@Phineas Please understand it was an example, meant to show you that it doesn't matter whether you do it in IT or in simple terms; the end result is the same.

coderobe commented 7 years ago

@Coobers the readme explains that this repo does contain unverified domains that could've possibly been affected.

JedrzejMajko commented 7 years ago

@coderobe It does that only vaguely and then reverses it later on. There is a follow-up in #172.

pirate commented 7 years ago

@Coobers I'm unclear as to where you think it's reversing it later on. We try to be very explicit in the README, and honestly not much more is needed than the methodology section, as people can read that and come to their own conclusions as to the accuracy of the list. Anyway, I've changed the header I think you might be referencing: https://github.com/pirate/sites-using-cloudflare/pull/179

Zenexer commented 7 years ago

@Coobers To clarify, this issue was just to address the possibility that there could have been a mistake--not that there actually was a mistake. Ultimately, no action was taken as a result of this issue.

When we're dealing with millions of entries in a dataset, errors are going to be inevitable, no matter how meticulous we are. This is why it's important for us to double-check if there's even a slight suspicion that a mistake could've been made.

It does appear there were initially two domains in the README that weren't using Cloudflare at the time (though at least one of them was in the past). However, from what I can tell, the list didn't serve the same purpose back then; it's come a long way since. Initially, it was just a list of sites that had been mentioned on social media as potentially worth looking into. They were looked into and subsequently removed from the list.