turbobytes / cdnfinder

Webapp and cli-tool to detect CDN usage of websites.
http://www.cdnplanet.com/tools/cdnfinder/
MIT License
111 stars 27 forks

fonts.googleapis.com #16

Closed by hypesystem 4 years ago

hypesystem commented 4 years ago

Hi!

Thanks for a great project :smile:

We are using this project, and found that fonts.googleapis.com isn't listed as a CDN. Would it qualify under the requirements you set for CDNs (I'm not clear on what these are)? It is used for including Google Fonts on websites, and seems to me that it should be listed?

If the answer is yes, I am happy to make a PR to include it :smile:

Best, Niels

aaronpeters commented 4 years ago

Thanks Niels.

Many consider fonts.googleapis.com to be a hostname that serves content from Google's CDN. CDN here means: servers in multiple geographical locations are distributing the content.

We should flag this as a CDN too, considering we do this for other Google hostnames too, see https://github.com/turbobytes/cdnfinder/blob/6a35658ca6af6e5d02af3fd8a7d67d43a7b69178/assets/cnamechain.json .

That list needs some cleaning up...
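As a rough illustration of how a hostname list like this can be applied, here is a minimal sketch in Go. It assumes a simple table of hostname suffixes mapped to CDN names; the actual layout and entries of cnamechain.json may differ, and `cdnSuffix`, `knownSuffixes` and `matchCDN` are hypothetical names, not part of cdnfinder.

```go
package main

import (
	"fmt"
	"strings"
)

// cdnSuffix pairs a hostname suffix with the CDN name to report.
type cdnSuffix struct {
	Suffix string
	CDN    string
}

// knownSuffixes is a hypothetical list for illustration only; the real
// cnamechain.json may use different entries and a different layout.
var knownSuffixes = []cdnSuffix{
	{".googleapis.com", "Google"},
	{".example-cdn.net", "Example CDN"},
}

// matchCDN walks a CNAME chain (the requested hostname first, followed by
// each CNAME target) and returns the CDN of the first hostname that ends
// with a known suffix.
func matchCDN(chain []string) (string, bool) {
	for _, host := range chain {
		// Normalise: DNS answers often carry a trailing dot.
		h := strings.ToLower(strings.TrimSuffix(host, "."))
		for _, s := range knownSuffixes {
			if strings.HasSuffix(h, s.Suffix) {
				return s.CDN, true
			}
		}
	}
	return "", false
}

func main() {
	cdn, ok := matchCDN([]string{"fonts.googleapis.com."})
	fmt.Println(cdn, ok)
}
```

With a table like this, adding fonts.googleapis.com would be a matter of adding one entry, provided the real list works along similar lines.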

I also believe we should have a good, strict definition of CDN. I lean towards 'CDN providers, being companies that have a service you can use to distribute your content from multiple geo locations'. Does Youtube fit in this definition? If yes, then Dailymotion and Vimeo too? All three are not really CDNs, right? Still contemplating on this ... your view is appreciated

hypesystem commented 4 years ago

Hi Aaron,

Yeah, this was my thought. I don't see a use case for cdnfinder where you wouldn't want all hostnames served from CDNs (including less common or use-case-specific ones) to be listed as CDNs. That is: I don't see a case where anyone using the tool would be annoyed or inconvenienced by fonts.googleapis.com being included as a CDN-hostname, seeing as it is for all intents and purposes serving from geographically distributed servers.

I honestly think that the "company" part of your CDN definition is irrelevant. It's about the network and how it is set up. So I would focus just on hostnames that point to or IPs that are part of such geographically distributed networks used to distribute content. The idea being that the same content is available from many differently geographically situated servers behind the same hostname.

aaronpeters commented 4 years ago

Thanks Niels.

My view today on the CDN Finder tool (the full site/website version): The CDN Finder tool should answer the question "Which CDN is website X using to serve their content?" E.g. "Which CDN is https://www.ebay.com using?" Results as of today: https://www.cdnplanet.com/tools/cdnfinder/#site:https://www.ebay.com

Ebay loads a number of third party scripts/trackers/analytics/ads on the page, but those can be discarded in answering the question at hand. The challenge is: how do we differentiate between first party and third party content? One way to do this is by counting the # resources per hostname and this is what we use today in CDN Finder. The underlying assumption is that the hostname with the highest count is used for first party content (in the example of Ebay that hostname is ir.ebaystatic.com). This works well for most websites, afaik.
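The counting heuristic described above can be sketched in Go roughly as follows. This is an illustration under assumptions, not the actual CDN Finder code: `firstPartyHost` is a hypothetical name, and the resource URLs in `main` are made up for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// firstPartyHost returns the hostname that serves the most resources,
// which the heuristic treats as the first-party hostname.
// URLs that fail to parse or have no hostname are skipped. On a tie,
// the hostname that reached the top count first is kept.
func firstPartyHost(resourceURLs []string) string {
	counts := make(map[string]int)
	best := ""
	for _, raw := range resourceURLs {
		u, err := url.Parse(raw)
		if err != nil || u.Hostname() == "" {
			continue
		}
		host := u.Hostname()
		counts[host]++
		if best == "" || counts[host] > counts[best] {
			best = host
		}
	}
	return best
}

func main() {
	// Hypothetical resource list, for illustration only.
	urls := []string{
		"https://ir.ebaystatic.com/a.js",
		"https://ir.ebaystatic.com/b.css",
		"https://www.google-analytics.com/analytics.js",
	}
	fmt.Println(firstPartyHost(urls))
}
```

A page that loads more resources from a single third party than from its own hostnames would be misclassified by this approach, which is the limitation behind "works well for most websites".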

In my opinion, it is a nice-to-have to see which CDNs are being used by the third parties.

Your view seems to differ from mine. You'd like to know if all/most content (first party or not) is served from a CDN, where CDN means 'the same content is available from many differently geographically situated servers behind the same hostname'. Quoting you:

I don't see a case where anyone using the tool would be annoyed or inconvenienced by fonts.googleapis.com being included as a CDN-hostname

I agree with this: no user of the tool would be annoyed by this and it's not a big effort on our part to infrequently update the list of known CDNs with the goal being to have good coverage of the known 'big boys', like Google Fonts and unpkg.com. That said, in the context of my view on the question CDN Finder should answer, I consider this 'bonus' and not required for the service to be valuable.

If you have more to share (views, ideas, ...), please do.

hypesystem commented 4 years ago

Ah, I see. Yeah, that makes sense :smile: (and you are accurately representing what I'm saying)

I agree that this isn't necessary for the CDN-finder to be valuable at all, but I guess it depends on the use case. I see two different types of users:

  1. Wanting to see what CDN is used by some other company. That's the case you are describing, and it is indeed just a bonus to include third-party CDNs.

  2. Wanting to ensure that one's own site serves all assets from a CDN. In this case, it would be nice to see which assets are and are not served from CDNs (third-party or not), as all first- or third-party resources could be moved to a CDN as a way of speeding up site load. For example, my company's website has a lot of stuff hosted directly on the site itself, rather than from a CDN, which points to a problem that can be solved to improve site load (see image).

image

I'm not sure whether or not group 2 is an intended audience for the tool at all, so I can't evaluate how important it is to support the use case. It is why I personally would use CDNfinder, however, so I guess that's why my perspective is as it is :smile:

hypesystem commented 4 years ago

As an aside (which maybe should be a new issue, but we're having a more general discussion now, so I'm going to put it here), our website, https://deranged.dk, is hosted by Netlify. According to Netlify, their hosting pretty much fits the definition I am touting for a CDN (see https://www.netlify.com/products/edge/).

For my use case (user group 2) I would find value in seeing that the 18 counts of items hosted on deranged.dk are in fact using Netlify as CDN (so the first row in the image above would say "Netlify" in the CDN-column).

First off, I guess this is relatively hard to reliably spot. The best I've found is the server response-header which Netlify sets to server: Netlify.

Secondly, I'm guessing this is not necessarily of value to user group 1? But then again... maybe? I have a hard time telling with this one :smile:

aaronpeters commented 4 years ago

Wanting to ensure that one's own site serves all assets from a CDN. In this case, it would be nice to see which assets are and are not served from CDNs (third-party or not), as all first- or third-party resources could be moved to a CDN as a way of speeding up site load. For example, my company's website has a lot of stuff hosted directly on the site itself, rather than from a CDN, which points to a problem that can be solved to improve site load (see image).

I can foresee that some website owners will use CDN Finder to answer the question "which resources on my site are (not) served from a CDN?". The thinking here is: the more resources are served from a CDN, the faster my page loads. That thinking is somewhat valid, but it's not always the case that e.g. loading fonts from Google Fonts results in faster pages compared to serving the WOFF(2) files from your own domain (because of the extra dns lookup, tcp/tls connect, ...). It's actually a performance best practice to self-host as much of the page resources as you can + use a CDN (with HTTP/2 and good H/2 prioritisation), so if you're using a CDN already, it's best to serve the font files yourself and not use Google Fonts.

Take a look at https://www.cdnplanet.com/tools/cdnfinder/#site:https://www.24kitchen.nl/ They are not using a CDN on their domain (at least some people who work there will know this) and they have many third party resources on the page. From looking at the CDN Finder results they can see that many of the third party resources come from a CDN, so 'that is probably good for performance'. What about the resources that our tool marks as 'not using a CDN'? How much faster would their pages load if those were on a CDN? No way for CDN Finder to help answer this question. Depends on many factors, including:

Anyway ... our focus is and will be on group 1 and providing a good service to group 2 is nice to have.

Regarding your Netlify example: we should detect Netlify as a CDN (I'm very familiar with their service). Too bad their HTTP/2 implementation is poor though, but that's a different topic ;-)

Thanks for this conversation!

hypesystem commented 4 years ago

Thanks! Very enlightening, that makes sense.

Can you help me figure out what is happening with https://deranged.dk, then? It uses Netlify, but not an explicit CDN domain (they used to do this, but I think they are hiding the CDN behind the domain name now). It's not registered as using Netlify but as deranged.dk itself (even though that, in this case, is exactly the same: deranged.dk points to the Netlify CDN).

aaronpeters commented 4 years ago

Simple: we are not detecting Netlify as a CDN right now, and this should be fixed on our end.

hypesystem commented 4 years ago

Cool! If you can point me to some lines of code to start at, I might be able to give it a go? At least adding the googlefonts CDN should be doable even without knowing the code base :smile:

aaronpeters commented 4 years ago

The code is easy to update, see https://github.com/turbobytes/cdnfinder/blob/master/headerguess.go

I will get it done soon for Google Fonts and Netlify.