xunholy closed 1 year ago
`nc` or `netcat` against any major cloud provider's network will report the port as open even if a firewall is blocking it, so to actually test it you'd need to call some API.
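To illustrate the difference between "the port accepts a TCP connection" and "a service actually answers", here is a minimal Python sketch. It is only an assumption about how one might check this, not something used in this thread: the function names are hypothetical, and the second probe counts an endpoint as reachable only if it completes a TLS handshake. That fits an HTTPS endpoint such as the Kubernetes API on 6443; an endpoint that insists on client certificates may fail the handshake even when the firewall is open.

```python
import socket
import ssl


def tcp_connect_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection can be established.

    This is roughly what `nc -z` checks, and it can succeed even when
    no real service is answering behind the port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def tls_handshake_ok(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True only if the peer completes a TLS handshake.

    Certificate verification is deliberately disabled: the goal is to
    see whether a real service responds, not to authenticate it.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # includes ssl.SSLError and timeouts
        return False
```

If `tcp_connect_ok` returns True while `tls_handshake_ok` returns False for the same host and port, the connection is probably being accepted by something other than the service itself.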
I'm not sure how this can be fixed, given that opening these ports should be handled by the GCP CAPI infrastructure provider; it might even be configurable in the infrastructure manifests. CACPPT is a generic provider, so it can't open ports on GCP.
@rsmitty How does this work in CI? Our management cluster is in EM IIRC so it should be a similar setup.
looks like network is specified as part of the manifest, so it should be pre-configured: https://github.com/siderolabs/cluster-api-templates/blob/main/gcp/standard/standard.yaml#L28-L29
Yep, this is exactly it. The network we use for CI has default fw rules that allow 50000 and 6443
Andrey is also right that this is something we can't handle from the CACPPT side. The default behavior of the GCP infra provider is to create its own network and whatnot, but what it creates doesn't have the firewall rules we need. That is why we "bring our own" network for it.
So is the recommendation to pre-bake my own networks with port 50000 enabled? I haven't investigated, but I wonder whether there is a way to define additional FW rules through the provider, which I know is outside the scope of Talos itself. What has been done and/or recommended to other users running Talos in GCP to date?
Our recommendation is the above - create a pre-configured GCP network and reference it in the cluster manifest. We are not aware at the moment of a better way to do that.
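As a rough sketch of what pre-configuring the network could involve, the Python snippet below builds a `gcloud compute firewall-rules create` invocation that allows inbound TCP 50000 and 6443. The rule name, network name, and source range are placeholders chosen for illustration, not values from this thread, and the helper only prints the command rather than running it.

```python
import shlex


def firewall_rule_command(rule_name, network, source_ranges, ports=(50000, 6443)):
    """Build a gcloud command that allows inbound TCP on the given ports."""
    allow = ",".join(f"tcp:{port}" for port in ports)
    return [
        "gcloud", "compute", "firewall-rules", "create", rule_name,
        f"--network={network}",
        "--direction=INGRESS",
        f"--allow={allow}",
        f"--source-ranges={','.join(source_ranges)}",
    ]


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    cmd = firewall_rule_command(
        "talos-capi-ingress",
        "my-talos-network",
        ["203.0.113.10/32"],  # e.g. the address the management cluster connects from
    )
    print(shlex.join(cmd))
```

Restricting `--source-ranges` to the management cluster's address keeps the Talos API from being exposed more widely than needed.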
Using CAPI to create Talos nodes in GCP, @andrewrynhard and I discovered several I/O timeouts in the serial logs from the nodes and, from what we could see, in the controller logs.
This led us to the GCP FW rules, and we noticed that port `50000` was not open to traffic from outside GCP. This would be perfectly fine if the management cluster were also in GCP; however, there is a chicken-and-egg situation where the management cluster will initially, at least temporarily, exist outside GCP, and in my scenario it was running on KIND locally. Hence, when attempting to bootstrap, the FW rules were blocking my connection.
I also tried `nc` and `telnet`, and these appeared to give correct responses initially, but it wasn't until we added the new FW rule that the connection began working.