Closed stephenmcgruer closed 4 years ago
cc @Hexcles as an FYI
Argh, adding a new HTTPS port is much more complicated than I wanted. It relies on a Bocoup-owned literal fork-of-a-fork of a terraform config (https://github.com/bocoup/terraform-google-multi-port-managed-instance-group/blob/master/README.md), which I think will have to be changed to make this properly work. There's a chance that maybe we can just make wptserve start the port however, even if external traffic won't be allowed access...
Attempting to push an image with 856cd76
to wpt.live. I ran:
make wpt-server-tot
make wpt-server-submissions
make publish-wpt-server-tot
make publish-wpt-server-submissions
terraform apply
The added HTTPS port broke the Chromium WPT Importer and is being reverted in https://github.com/web-platform-tests/wpt/pull/24394
Ok, so adding a HTTPS port eventually worked. It seemed to take a really long time for the server to redeploy; I deployed around 7:30am EST, and the change didn't take hold till ~9:00am EST.
As such, we're actually only reverting the javascript part of the change, to unbreak Chromium. So this issue can be closed - right after we file follow-up issues for the various things we've learnt (not to use git pull
and to look into how long things take to deploy.)
We probably also need to set up new firewall rules to forward/load-balance the new port correctly, and maybe need to update the cert, too.
🤦 oh right, I forgot I had only fired it up on wptserve but hadn't done the other parts of the problem.
and maybe need to update the cert, too.
Probably not. I checked our cert and the common name and DNS names don't include port numbers.
Incidentally, this is another case where it seems reasonable to expect we should have some way to ensure changes to wpt like the one that caused this don’t get merged without first notifying the wpt.live maintainers — and also me, because the change broke w3c-test.org too
cc @web-platform-tests/wpt-core-team
We have a process for notifying such maintainers - it is the RFC process. This had an RFC - https://github.com/web-platform-tests/rfcs/pull/57, which went through the full 7 day waiting period after approval. I'm open to suggestions to improve this process, but I want to make it clear we do have a process.
Putting on my wpt.live hat, the tricky thing for me as a downstream maintainer was realizing the true impact of the change. I wrote the wptserve change, but despite that I still failed to realize that downstream configs (wpt.live, Chromium, w3c-test.org) would exist and be different, and thus cause an assert in wpt. I don't have good ideas for how to solve that better in the abstract - as long as a downstream consumer is using a live version of a project versus a pinned version, it seems likely that downstream consumer will be broken at some point. Of course for wpt.live (and w3c-test.org) using a live version is the exact point...
I've pushed a new terraform config with 8443 exposed. That will take a bit of time to apply, and then hopefully we should be good here.
This appears to not have fully worked; something like https://wpt.live:8443/origin-isolation/resources/send-origin-isolation-header.py?header=%3F1 doesn't respond.
I did just find a list of ports in the gce load balancing config which didn't include 8443, so I added it there and we'll see if that helps.
(Also updated the firewall rules in the VPC network section, to add 8443 again)
No luck so far.
gcloud compute firewall-rules list --project=wpt-live
shows 8443 as allowed.
SSH-ing into one of the tot instances and running netstat
, I see:
$ sudo netstat -plunt
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 798539/python
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 358/sshd
tcp 0 0 0.0.0.0:8443 0.0.0.0:* LISTEN 798542/python
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 798541/python
tcp 0 0 127.0.0.1:34683 0.0.0.0:* LISTEN 305/containerd
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN 798540/python
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 798536/python
tcp6 0 0 :::8001 :::* LISTEN 798537/python
tcp6 0 0 :::8002 :::* LISTEN 798538/python
udp 0 0 127.0.0.53:53 0.0.0.0:* 260/systemd-resolve
udp 0 0 10.127.0.24:68 0.0.0.0:* 240/systemd-network
Weirdly the docker instances don't appear to report any ports being mapped? (But likely not relevant, given that other ports work)
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a0fa4e58f675 gcr.io/wpt-live/wpt-live-wpt-server-tot "/usr/bin/supervisord" 47 hours ago Up 47 hours klt--faah
e0b5325d65c2 gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1 "/entrypoint.sh /usr…" 47 hours ago Up 47 hours stackdriver-logging-agent
$ docker port a0fa4e58f675
$ docker port e0b5325d65c2
$
cc @Hexcles - any ideas?
I took a look around and believe we forgot to add the new port to the load balancer: https://github.com/web-platform-tests/wpt.live/blob/6e13edaf30860b319b5a6a29f9bd437a49835da8/infrastructure/web-platform-tests/load-balancing.tf#L16
Ok, I pushed a new terraform update with that change and I think we're good now. That was 'fun', may we never add a new port again.
Reported on #testing on irc.w3.org. Conversation appears to have found multiple problems:
First thing to fix will be to teach wpt.live about the new HTTPS port. Then probably fixup the scripts to
git reset --hard
rather than pull.