Open b-shrp opened 1 month ago
Hi there @b-shrp, thanks for opening the issue.
The proxy/CDN issue is a known one, and it is partly a result of the original design of the green check service, which assumed that the origin IP is okay to disclose.
Lots of people who use CDNs use them to avoid disclosing this IP, and the check has so far been largely based on doing a DNS lookup.
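To make the limitation concrete, here is a minimal sketch of an IP-based check. It is not the actual green check implementation: the `GREEN_RANGES` table, the provider name, the `check_domain` function, and the shape of the returned result are all hypothetical, and the IP ranges are reserved documentation addresses. The point is only that a lookup like this sees the proxy's IP, not the origin's.

```python
import ipaddress
import socket

# Hypothetical table mapping IP ranges to providers. The ranges are
# reserved documentation addresses and the provider name is made up.
GREEN_RANGES = {
    ipaddress.ip_network("203.0.113.0/24"): "Example Proxy CDN",
}


def check_domain(domain, resolve=socket.gethostbyname):
    """Resolve a domain to an IP and report which known provider owns it.

    `resolve` is injectable so the sketch can run without network access.
    """
    ip = ipaddress.ip_address(resolve(domain))
    for network, provider in GREEN_RANGES.items():
        if ip in network:
            return {"green": True, "hosted_by": provider, "ip": str(ip)}
    return {"green": False, "hosted_by": None, "ip": str(ip)}


# A proxied site resolves to the proxy's address, so the proxy is reported
# as the host regardless of where the origin server really is.
result = check_domain("proxied.example", resolve=lambda d: "203.0.113.7")
```

Because the only input is the resolved IP, nothing in this kind of check can distinguish a site fully served from a CDN from one that merely sits behind its proxy.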
There is a new project we are working on to address some of the problems around basing this on resolving an IP address from a domain name, and it's called carbon.txt. It works directly on domains, so in the case of https://solar.dri.es/, we'd do a lookup for information about the domain itself. It's still in development and we have not prioritised self-hosted solar websites like this, although it's something we would like to do in future. I've linked to a relatively long report detailing how we intend to build this over the next six months, as well as the current project website:
https://carbontxt.org
https://www.thegreenwebfoundation.org/publications/carbon-txt-energy-efficiency-briefing/
That said, I don't think we had considered sniffing for Cloudflare-specific HTTP response headers before.
I don't really know enough about them yet to know if they are a good fit for this specific use case, and to be honest, I'm a little bit wary about something so vendor specific, but I think I understand directionally what you're suggesting.
We know there are a number of large CDNs, like Amazon CloudFront, Fastly CDN, bunny.net, Google's Cloud CDN, and so on.
Do you know if there is any central repository listing these headers that might make it easier to parse them, and draw some kind of conclusion about them?
I can totally see the value in being able to flag up when a CDN is in use, because much more of the web now relies on CDNs than when this was initially built. I should tell you now though that we don't really have any plans to prioritise any work on the Green check API this year.
It would be helpful to use the thread to gather some notes for when we might revisit this because I think what you're suggesting is actually quite a helpful feature.
Is your feature request related to a problem? Please describe.
I've noticed that sites using Cloudflare's DNS proxy show that the host is Cloudflare and therefore green. If the site were using the CDN and caching, that would seem to make sense, but if the site is using only the proxy, it seems this could be misused (intentionally or not). The example site in question (https://solar.dri.es/) is hosted from a home server (on solar power) and only uses the proxy for DDoS protection. While it is indeed green, I would expect the result to be unknown. My understanding is that the checker only looks at the IP the URI resolves to.
While DNS proxy services are available from a range of service providers, Cloudflare's free offering is quite popular and perhaps one to address.
I would like to have confidence that the checker is evaluating where the data is primarily being served from, and if it can't, that it is able to give some indication of being unknown.
Describe the solution you'd like
I would like the checker to have awareness of when a proxy is being used and pay attention to other indicators as to where the data is being served from. E.g. if a header has something like this in it
cf-cache-status: DYNAMIC
then the checker might return a result like: Unknown: Host behind DNS Proxy
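As a rough illustration of the idea, here is a minimal sketch that classifies a response from its headers. The `classify_response` function and the non-"Unknown" result strings are hypothetical and not part of the checker. It also assumes that a single `cf-cache-status: DYNAMIC` response is a good enough signal, which may not hold, since a site can serve a mix of cached and uncached content.

```python
def classify_response(headers):
    """Classify a response from a mapping of header names to values.

    Hypothetical helper: only looks at Cloudflare's cf-cache-status header.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    status = normalized.get("cf-cache-status")
    if status is None:
        return "No Cloudflare cache header seen"
    if status.strip().upper() == "DYNAMIC":
        # Cloudflare did not treat the response as cacheable, so the
        # content came from the origin server behind the proxy.
        return "Unknown: Host behind DNS Proxy"
    return "Served via Cloudflare cache layer ({})".format(status.strip())


example = classify_response({"CF-Cache-Status": "DYNAMIC"})
```

A real check would probably want to sample more than one URL on the site before drawing a conclusion.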
Additional context
I recognize this is likely easier said than done, but I'd be curious to know more about the obstacles and whether a decision has been made not to include this. I presume some of the reasoning may be that the usefulness of this checker to climate action lies in its ability to highlight the green-ness of big hosting platforms, where the scale matters. That said, being able to identify where a site is really hosted does speak to the checker's value as a source of truth.