sassoftware / viya4-iac-aws

This project contains Terraform configuration files to provision infrastructure components required to deploy SAS Viya platform products on Amazon AWS.
Apache License 2.0

Incorrect s3 private endpoint used #180

Closed joshcoburn closed 1 year ago

joshcoburn commented 1 year ago

Terraform Version Details

When using viya4-iac-aws to deploy in private or offline situations, the EKS cluster is created but nodes fail to join it. AWS documentation specifies a "Gateway" type S3 private endpoint for offline/private EKS deployments (found here).

Our IaC currently deploys all private endpoints as "Interface" type.

Terraform Variable File Details

No response

Steps to Reproduce

Use the IaC in a BYO scenario where the cluster API is private and the VPC has no outbound internet access (no NAT GW).

Expected Behavior

EKS will build and nodes can join the cluster.

Actual Behavior

EKS will build but nodes fail to join the cluster.

Additional Context

No response

References

No response

joshcoburn commented 1 year ago

Upon further investigation, it appears this was not the cause of the issue. S3 Gateway and Interface endpoints should function the same; the Gateway type is simply the cheaper option.

Closing issue and cancelling associated PR.

joshcoburn commented 1 year ago

This resurfaced from another deployment I was helping with. So I decided to investigate and test again.

It appears private EKS does in fact need a Gateway type (not an interface type) S3 endpoint in order to function correctly.

Test scenarios:

  1. VPC, subnets, and security groups deployed by CloudFormation (with no endpoints). IAC 5.4.0 used with BYON variables supplied. IAC created the VPC endpoints (S3 type: interface).
  2. VPC, subnets, and security groups deployed by CloudFormation with endpoints based on the AWS-provided template for private EKS (S3 type: gateway). IAC 5.4.0 used with BYON variables supplied and modified to skip deployment of NAT, IGW, and private_endpoints (count changed to 0 for all of these resources with a Terraform override).

Test Results:

  1. Nodes failed to join the cluster.
  2. Nodes successfully joined the cluster.

While I can't find this explicitly in the AWS documentation, it is mentioned on the eksctl.io page and also specified in the AWS-provided EKS private VPC template.

An S3 Gateway endpoint also has to be associated with a route table (an Interface endpoint does not), so my workaround at the moment would be to manually add an S3 Gateway endpoint to the VPC and associate it with the route table. You could also do what I did in my successful test scenario (scenario #2).
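For anyone who prefers Terraform over adding the endpoint by hand, the workaround might look roughly like the sketch below. It uses the AWS provider's `aws_vpc_endpoint` resource; the variable names and the resource label `s3_gateway` are placeholders I made up for illustration, not names defined by viya4-iac-aws.

```hcl
# Hypothetical inputs: placeholder names, not variables from viya4-iac-aws.
variable "region" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_route_table_ids" {
  type = list(string)
}

# S3 Gateway endpoint. Unlike an Interface endpoint, it is attached to
# route tables rather than to subnets and security groups.
resource "aws_vpc_endpoint" "s3_gateway" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```

The route tables listed should be the ones used by the subnets where the EKS nodes run, otherwise node traffic to S3 will not go through the endpoint.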

This may not be an issue in accounts with centralized VPC endpoints (hub and spoke) where the central hub already has an S3 Gateway endpoint. In that case, it might be advantageous not to deploy the IAC-created endpoints at all, which would require modifying the Terraform or adding a Terraform override file.
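A rough sketch of what such an override file could look like is below. The resource address `aws_vpc_endpoint.private_endpoints` is an assumption on my part; check the module source for the actual resource type and name before using it. Terraform only merges override files (`override.tf` or `*_override.tf`) with resources declared in the same directory, so the file has to sit next to the file that declares the resource.

```hcl
# override.tf
# Assumed resource address; verify it against the module source.
# Setting count to 0 stops Terraform from creating any instances of
# this resource, as in test scenario 2.
resource "aws_vpc_endpoint" "private_endpoints" {
  count = 0
}
```

The same pattern would apply to the NAT and IGW resources, each with its own block matching the resource type and name in the module.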

joshcoburn commented 1 year ago

Moving discussion to an internal ticket and closing the issue.

joshcoburn commented 6 months ago

After further investigation and testing, we ultimately found that an S3 VPC endpoint of either Gateway type or Interface type will work in an EKS air-gapped scenario. The difference is in implementation and cost: a Gateway endpoint requires a route table association whereas an Interface endpoint does not, and Gateway endpoints are free whereas Interface endpoints have a cost associated.
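To make the implementation difference concrete, here is a minimal sketch of the Interface variant using the AWS provider's `aws_vpc_endpoint` resource. All variable names and the resource label are hypothetical placeholders. An Interface endpoint is placed in subnets and guarded by security groups; a Gateway endpoint would drop `subnet_ids` and `security_group_ids` and set `route_table_ids` instead.

```hcl
# Hypothetical inputs: placeholder names, not variables from viya4-iac-aws.
variable "region" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "endpoint_security_group_ids" {
  type = list(string)
}

# S3 Interface endpoint: creates network interfaces in the given subnets.
# No route table association is involved.
resource "aws_vpc_endpoint" "s3_interface" {
  vpc_id             = var.vpc_id
  service_name       = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type  = "Interface"
  subnet_ids         = var.private_subnet_ids
  security_group_ids = var.endpoint_security_group_ids
}
```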

See https://github.com/sassoftware/viya4-iac-aws/issues/272#issuecomment-1978945557 for further information.