php / doc-base

Tools for the PHP documentation

Update HTTP URLs to HTTPS where appropriate #42

Open fulldecent opened 3 years ago

fulldecent commented 3 years ago

This upgrades all HTTP URLs in the repo to be HTTPS.

This is NOT a blind copy/paste. I have reviewed each one of these.

Process

  1. Get unsafe URLs:

    grep -r http:// . | pcregrep -o 'http://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?' | sort -u > urls

  2. Delete w3.org, docbook.org, */ns/* URLs; those are namespace URLs

  3. Propose replacements only if upgraded HTTPS URL works:

    cat urls | ruby ~/Desktop/tmp/replacer.rb

  4. Review each one and revert changes if touching licenses, namespaces or if it doesn't make sense
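The `replacer.rb` script is not shown in this thread, so the following is only a rough sketch of what steps 2 and 3 might look like in plain shell. It is not the author's tool. The function names (`filter_namespaces`, `to_https`, `https_works`, `propose_replacements`), the tab-separated output, and the 10-second timeout are all assumptions made for illustration, and it assumes `curl` is installed.

```shell
#!/bin/sh
# Hypothetical sketch of steps 2-3; NOT the replacer.rb used in this PR.

# Step 2: drop namespace-style URLs (w3.org, docbook.org, any /ns/ path).
filter_namespaces() {
    grep -Ev '^http://([^/]*\.)?(w3|docbook)\.org(/|$)|/ns/'
}

# Rewrite only the scheme of an http:// URL; the rest is left untouched.
to_https() {
    printf '%s\n' "$1" | sed 's|^http://|https://|'
}

# Step 3: succeed only if the HTTPS variant answers without an HTTP error.
# Redirects are followed; the timeout value is an arbitrary assumption.
https_works() {
    curl --silent --fail --location --max-time 10 --output /dev/null "$1"
}

# Read URLs on stdin and print "old<TAB>new" for each upgrade that works.
propose_replacements() {
    filter_namespaces | while IFS= read -r url; do
        candidate=$(to_https "$url")
        if https_works "$candidate"; then
            printf '%s\t%s\n' "$url" "$candidate"
        fi
    done
}
```

With the functions loaded, something like `propose_replacements < urls` would print candidate pairs. Step 4, the manual review, still has to happen afterwards, since a reachable HTTPS URL says nothing about licenses or namespaces.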

kamil-tekiela commented 3 years ago

May I ask why? Was it causing problems? Were the HTTP links not pointing to the right resource?

fulldecent commented 3 years ago

Of course. Accessing a link over HTTP protocol allows your ISP, government and anybody on your network to monitor and modify your communications. Therefore it is generally best practice to use HTTPS everywhere if possible.

Some discussion at: https://www.eff.org/https-everywhere

I can't find a "mission statement" for the PHP project, but I'll have to assume personal privacy and correctness of data accessed is compatible with the project goals and therefore can make this PR in-scope for the project.

kamil-tekiela commented 3 years ago

That's not what I meant. I know what HTTPS is for, but my question was rather simple. Do the links we have in the manual, which point towards HTTP resources, still work properly? If so, do they redirect to HTTPS variants automatically? We can consider fixing the ones that point to 404 or 403, or we might even consider fixing ones that don't redirect automatically.

> Accessing a link over HTTP protocol allows your ISP, government and anybody on your network to monitor and modify your communications. Therefore it is generally best practice to use HTTPS everywhere if possible.

That is very much true, but this should be enforced by internet browsers and individual service providers (i.e. each website provider should ensure that HTTP traffic is redirected to HTTPS with a 301 status code). It is not our task to enforce this. I don't even see any benefit to it if all the links still work.

You are asking us to change a lot of files in the PHP manual, fixing a problem that might not exist. I checked a couple of URLs from this PR and they redirect correctly to their HTTPS counterparts. So once again I am asking, why should we make this change?
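The triage asked for in this comment (links that are broken, links that redirect to HTTPS on their own, links that do not redirect) could be checked with something like the sketch below. Nobody in the thread ran this; the function names and category labels are made up for illustration, and it assumes `curl` is available. Only the first hop is inspected, so redirects are deliberately not followed.

```shell
#!/bin/sh
# Hypothetical sketch: classify an http:// link by its first response.

# Pure classification from a status code and a Location target.
classify_link() {
    status=$1
    location=$2
    case "$status" in
        000) echo unreachable ;;          # curl reports 000 when no response arrived
        403|404) echo broken ;;
        301|302|303|307|308)
            case "$location" in
                https://*) echo redirects-to-https ;;
                *) echo redirects-elsewhere ;;
            esac ;;
        *) echo no-redirect ;;
    esac
}

# Print "<status> <redirect target>" for the first hop only (no --location).
probe() {
    curl --silent --head --max-time 10 --output /dev/null \
         --write-out '%{http_code} %{redirect_url}' "$1"
}

# Probe one URL over the network and classify the result.
check_link() {
    probe "$1" | { read -r status location; classify_link "$status" "$location"; }
}
```

Running `check_link` over each line of the `urls` file would show how many links fall into each bucket, which is the data this comment is asking for.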

fulldecent commented 3 years ago

Ah, understood.

First, publishing a link with the HTTP scheme, even if it redirects, is just as bad as the entire request being delivered over HTTP. This is because the initial HTTP request can be monitored, intercepted, and forged before any redirect to HTTPS takes place. This happens very frequently and is well documented.

It can be argued whether a browser that is specifically requested to load a resource over HTTP should comply or should instead use HTTPS. But practically, this is not something that is happening with major browsers today by default.

I think I have established that HTTP links cause a privacy and data accuracy problem. But of course I have failed to say whether this actually is a problem, because we can't say whether the PHP project cares about privacy or data accuracy. (Mission statements actually are important.)

But practically, there are probably very few of these sites that are inaccessible as-is, which I believe is your main question. I have not tested those, since my initiative here is privacy, not functionality.

salathe commented 3 years ago

Please don't change XML namespace URIs. I'm also not keen on changing license/copyright blocks from third parties. Finally, I know you mean well, but please discuss this sort of thing with the documentation team before wasting time on it.

fulldecent commented 3 years ago

@salathe Thank you! I do understand that when sending a PR from the hip I take full responsibility for possibly wasting my time.

I have repeated the process with a much smaller scope. This EXCLUDES third-party licenses and is much easier to review. It is also rebased on top of master and is a single commit.

Scope creep:


@kamil-tekiela HTTP is deprecated. And HTTPS is wholly better. Every time we don't update an HTTP URL we are supporting oppressive regimes that spy on people's internet traffic.

This is something I care about and so I have created that tool and I brought forward this PR.