ocaml / infrastructure

Wiki to hold the information about the machine resources available to OCaml.org

Move ocaml.org to alternate hosting #13

Closed mtelvers closed 1 year ago

mtelvers commented 1 year ago

Equinix is closing the data centre in Amsterdam, which currently hosts www.ocaml.org and staging.ocaml.org. These websites need to be moved elsewhere.

In the short term, I propose moving these to Equinix Dallas, running on ARM hardware.

To facilitate this move, please can these DNS records be updated:

145.40.81.195 v3a.ocaml.org, v3.ocaml.org, ocaml.org, www.ocaml.org
147.75.47.79 staging.ocaml.org

Test deployments are currently visible on these domains: v3a.ocamllabs.io and staging.ocamllabs.io, which point to those IP addresses.
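
A minimal Python sketch of how the requested records could be verified once they are changed. The host names and addresses are copied from the request above; the function names (`resolve_ipv4`, `mismatches`) are illustrative and not part of any ocaml.org tooling. The resolver is passed in as a parameter so the comparison can be exercised without network access.

```python
import socket
from typing import Callable, Dict, List, Set

# Expected A records, copied from the DNS change request above.
EXPECTED: Dict[str, str] = {
    "v3a.ocaml.org": "145.40.81.195",
    "v3.ocaml.org": "145.40.81.195",
    "ocaml.org": "145.40.81.195",
    "www.ocaml.org": "145.40.81.195",
    "staging.ocaml.org": "147.75.47.79",
}


def resolve_ipv4(host: str) -> Set[str]:
    """Return the IPv4 addresses the local resolver reports for host."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0] for info in infos}


def mismatches(
    expected: Dict[str, str],
    resolver: Callable[[str], Set[str]] = resolve_ipv4,
) -> List[str]:
    """List every host whose resolved addresses differ from the expected one."""
    problems = []
    for host, want in sorted(expected.items()):
        try:
            got = resolver(host)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append(f"{host}: lookup failed ({exc})")
            continue
        if got != {want}:
            problems.append(f"{host}: expected {want}, got {sorted(got)}")
    return problems
```

Calling `mismatches(EXPECTED)` returns an empty list once every name resolves to its expected address. It reflects only the local resolver's view, so cached records may lag behind the authoritative servers.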

avsm commented 1 year ago

If I'm not mistaken, these machines are only hosting the website, and not doing the actual documentation bulk building (which takes a lot of CPU, and is run on the ocaml-ci cluster now).

If that's right, the most efficient place to put these is a VM in the Scaleway cluster (which they kindly give us free credits for, and it has been pretty reliable these past two years). Please confirm and I'll create the two VMs on Scaleway (v3b.ocaml.org and v3c.ocaml.org, which we can then migrate to the v3/ocaml.org/staging variants once they are setup and running).

mtelvers commented 1 year ago

Yes, that's right, I agree entirely, I was using the resources I had. These websites can easily be accommodated on a VM, and a free credit VM sounds ideal. I note that they have a relatively large RAM requirement. Initially ~1.5GB and then settling to around 1GB. Either x86 or ARM would be fine. If you set up the VMs, I'll deploy the Docker containers, and then you can update the DNS.

avsm commented 1 year ago

Sounds good. Given that this is the main OCaml website, I'll overprovision the RAM to deal with load spikes.

avsm commented 1 year ago

v3b.ocaml.org is set up with a high-bandwidth pipe, and v3c.ocaml.org is set up for staging (less CPU, less bandwidth, same storage). Your GitHub ed25519 key is on both, and sdb is the block device on both, intended for /var/lib/docker. Let me know when it's set up and then we can move DNS once stable.

mtelvers commented 1 year ago

The new websites, v3b.ocaml.org and v3c.ocaml.org for live and staging respectively, were brought online on Saturday and have been stable over the weekend. Some changes were needed to support IPv6 correctly. These are fully documented at http://infra.ocaml.org/www-ocaml-org. The DNS can be changed whenever you feel sufficient parallel running has happened. The sites cannot generate SSL certificates for the actual names until the DNS switchover has occurred.

avsm commented 1 year ago

Fantastic change documentation, thanks! I'll swap the DNS around tomorrow.

avsm commented 1 year ago

Old records:

www A 147.75.80.123
staging A 51.159.79.64

New records point www/@.ocaml.org to the v3b A and AAAA records, and staging to v3c.

Holding this open to test the stability of the new servers, and to garbage collect the old machines (is the old staging VM on Scaleway?).

avsm commented 1 year ago

I've rolled back the DNS changes as it appears the new machines aren't configured to accept ocaml.org connections. Do we need to coordinate a certificate renewal?

% curl -vv https://www.ocaml.org
*   Trying 51.159.83.169:443...
* Connected to www.ocaml.org (51.159.83.169) port 443 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
* error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error
* Closing connection 0
curl: (35) error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error

avsm commented 1 year ago

In fact, @mtelvers observed this:

The sites cannot generate SSL certificates for the actual names until the DNS switchover has occurred.

What needs to happen to the servers to prod them to regenerate their certificates, once the DNS switchover has happened? Do I have access to this via the deploy.ci.ocaml.org site?

mtelvers commented 1 year ago

The server is trying to generate the certificates, but because the DNS changes haven't propagated to the ACME provider's DNS servers, the provider can't complete the challenge.
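
One way to reduce the window described here is to wait until a name resolves to the new server before expecting certificate issuance to succeed. The following Python sketch polls a resolver until that happens. It is an illustration only: `wait_for_dns` and its parameters are hypothetical, the thread does not say which ACME challenge type is in use, and a local resolver's answer does not guarantee that the ACME provider's resolvers see the same record.

```python
import time
from typing import Callable, Set


def wait_for_dns(
    host: str,
    new_addr: str,
    resolver: Callable[[str], Set[str]],
    attempts: int = 30,
    delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll resolver until host resolves only to new_addr.

    Returns True as soon as the record matches, or False after the
    given number of attempts. The sleep function is injectable so the
    loop can be tested without real delays.
    """
    for attempt in range(attempts):
        try:
            if resolver(host) == {new_addr}:
                return True
        except OSError:
            pass  # treat a failed lookup like a record that has not propagated
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

For example, `wait_for_dns("www.ocaml.org", "51.159.83.169", resolver)` would return True once the name has moved from the old address (147.75.80.123) to the new one, given a `resolver` that maps a host name to its set of addresses.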

avsm commented 1 year ago

Do the v3 servers dynamically generate a cert every time they start up, or persist them on the filesystem? It would be good to just stick in the ocaml.org ones from the existing machines in their persistent store on the new ones. This avoids any downtime.

mtelvers commented 1 year ago

The certificates persist on the filesystem. I'm seeing certificates now for ocaml.org and www.ocaml.org as of 16:12.
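
A check like this can also be scripted. The Python sketch below fetches the certificate a server presents and reports which of the wanted host names it does not list. It uses only the standard `ssl` and `socket` modules; the helper names (`fetch_cert`, `dns_names`, `uncovered`) are illustrative, and the comparison is an exact match on subjectAltName entries, so wildcard certificates are not handled.

```python
import socket
import ssl
from typing import Dict, Iterable, List, Set


def fetch_cert(host: str, port: int = 443, timeout: float = 10.0) -> Dict:
    """Do a verified TLS handshake with SNI set to host; return the peer certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def dns_names(cert: Dict) -> Set[str]:
    """Extract the DNS entries from a certificate's subjectAltName field."""
    return {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }


def uncovered(cert: Dict, hosts: Iterable[str]) -> List[str]:
    """Return the hosts that are not listed (exact match) in the certificate."""
    names = dns_names(cert)
    return sorted(h for h in hosts if h.lower() not in names)
```

With these helpers, `uncovered(fetch_cert("www.ocaml.org"), ["ocaml.org", "www.ocaml.org"])` returns an empty list when both names are on the certificate. If the server has no usable certificate for the name, `fetch_cert` raises `ssl.SSLError` instead, which matches the handshake failure seen in the curl output earlier in the thread.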

avsm commented 1 year ago

Ok, I've migrated it back to the new host (but sans IPv6 for now). There was definitely a blip on the migration when I did it a second time around, and then the server started responding. I wonder if the issue is IPv6 (which I haven't activated yet the second time around).

mtelvers commented 1 year ago

staging.ocaml.org was at Scaleway previously, so that machine can now be deleted. In this final iteration of the project, we probably didn't need to move it. However, the new machine is a better size and has IPv6. The DNS and certificates seem to have worked correctly on that one.

mtelvers commented 1 year ago

I have updated https://deploy.ci.ocaml.org removing the original website deployments, docker contexts etc. I added this commit to PR#148.

mtelvers commented 1 year ago

We are still seeing HTTP requests to the original server. Please can we update the DNS for v3.ocaml.org and v3a.ocaml.org to point to the new IP address?

v3.ocaml.org: 147.75.80.123
v3a.ocaml.org: 147.75.80.123
ocaml.org: 51.159.83.169
www.ocaml.org: 51.159.83.169
v3b.ocaml.org: 51.159.83.169

avsm commented 1 year ago

Have moved v3.ocaml.org CNAME over to v3b. The v3a name wasn't a public one, so I'll remove that once the server itself is decommissioned.

mtelvers commented 1 year ago

With the change of the v3 entry, the hits against the old server have reduced to a trickle. If you are happy, we can proceed to the final stage and delete the server from Equinix.

avsm commented 1 year ago

I think v3a.ocaml.org might just have been 'sent to the glue farm' by Equinix already, as I can't find the v3a machine any more and it's not responding to ping. I'll remove the v3a.ocaml.org alias and call this done, if you can confirm you also can't see it, @mtelvers.

mtelvers commented 1 year ago

Yes, it's gone. Please remove the DNS record for v3a and then this issue can be closed.

avsm commented 1 year ago

Farewell v3a, you served us well! Now removed.