openshift / hive

API driven OpenShift cluster provisioning and management

Limit number of NAT Gateways Created #2466

Open faangbait opened 1 month ago

faangbait commented 1 month ago

Not sure if this is simply missing documentation or a feature request.

We're using the following configuration to deploy SNO clusters via hive.

    platform:
      aws:
        region: us-east-1
        zones:
          - us-east-1a
          - us-east-1b

I'd expect this to result in subnets/NAT Gateways being created in just us-east-1a and us-east-1b, but the installer is creating gateways in 1c through 1f as well.

Since the OpenShift Installer defines this as an acceptable configuration parameter, I'm assuming the issue lies in Hive; but I'm happy to repost this to the installer repository if developers here can confirm that's a better place for it.

2uasimojo commented 1 month ago

Those gateways are definitely created by installer. I recall a discussion around this related to cost, where the crux was that you can save money by deploying smaller clusters into “smaller” regions (those with fewer AZs) since these gateways are always created and they’re expensive. But I don’t know if the topic was ever approached from the perspective of being able to restrict day 0 to just using the AZs in the install-config.

Definitely something the installer team would need to answer. @patrickdillon ?

patrickdillon commented 1 month ago

This logic is definitely the responsibility of the installer side--not Hive. @faangbait feel free to open a bug against the installer. I have asked @mtulio to assess as well.

Also note the config you posted is not valid. We don't have any section in the install config where region and zones are at the same level. I think the intent here was:

    platform:
      aws:
        region: us-east-1
        defaultMachinePlatform:
          zones:
            - us-east-1a
            - us-east-1b

But AFAICT correcting the config still does not fix the issue, so I suspect we do need some changes in the installer logic.

patrickdillon commented 1 month ago

But AFAICT correcting the config still does not fix the issue, so I suspect we do need some changes in the installer logic.

Oops, I double-checked and I was looking in the wrong place. The manifests are indeed generated correctly. Can you check whether fixing your config resolves the issue?

mtulio commented 1 month ago

This logic is definitely the responsibility of the installer side--not Hive. @faangbait feel free to open a bug against the installer. I have asked @mtulio to assess as well.

That's it.

@faangbait The main idea is that platform.aws.defaultMachinePlatform.zones limits the zones the installer discovers through the AWS APIs. In your example, instead of using all available zones, the installer will use your defaults (us-east-1a and us-east-1b) to build your VPC/infrastructure.

It's also worth mentioning that the compute pool definitions take precedence over your defaults. So if zones are defined in any of the compute pools (compute[.name==worker].platform.aws.zones or controlPlane.platform.aws.zones) in install-config.yaml, those zones will be used instead.
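
For illustration, a rough sketch of how that precedence plays out in install-config.yaml (the zone values below are only examples, not taken from this thread): zones set on a machine pool win over defaultMachinePlatform.

    platform:
      aws:
        region: us-east-1
        defaultMachinePlatform:
          zones:
            - us-east-1a
            - us-east-1b
    controlPlane:
      platform:
        aws:
          zones:
            - us-east-1a    # overrides defaultMachinePlatform for control-plane machines
    compute:
      - name: worker
        platform:
          aws:
            zones:
              - us-east-1b  # overrides defaultMachinePlatform for workers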

faangbait commented 1 month ago

To clarify, I tried it the other way (as defined in the manifest) first. I forgot to hit the undo button before copying and pasting over here -- but I can confirm that neither worked.

It's also worth mentioning that the compute pool definitions take precedence over your defaults.

This is a good lead. Since we're not defining worker nodes, it's probably overriding the control plane config with the empty config provided to workers.
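
If so, explicitly setting the zones on both pools seems worth a try -- a rough sketch of the relevant part of an SNO install-config.yaml (the zone values are just illustrative):

    controlPlane:
      name: master
      replicas: 1
      platform:
        aws:
          zones:
            - us-east-1a
    compute:
      - name: worker
        replicas: 0          # SNO: no workers, but zones still set so the pool isn't left empty
        platform:
          aws:
            zones:
              - us-east-1a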

If that doesn't solve it, we'll document deleting the extra AZs as a day 1 problem. Easy enough to script.