nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[ENH] - AWS Deployment | R53 Switch and ACM Certificate add #2233

Open ronald50928 opened 8 months ago

ronald50928 commented 8 months ago

Feature description

When deploying Nebari on AWS, there is a step where the deployment tries to find the load balancer IP and waits for DNS resolution. Could this step become part of the automation, with an ACM certificate passed in through the configuration, provided the certificate was created before deployment?

Is there a way to automate this in Terraform, or is such a feature in the works? For example, a stage or module could look up the ACM certificate, attach it to the CLB, and make the R53 record changes required to send traffic to the CLB created during deployment.

https://www.nebari.dev/docs/how-tos/domain-registry#setting-up-a-dns
https://www.nebari.dev/docs/how-tos/domain-registry#using-other-dns-providers
https://www.nebari.dev/docs/how-tos/nebari-aws#deploying-nebari
https://www.nebari.dev/docs/how-tos/nebari-aws#deploying-nebari:~:text=During%20deployment%2C%20Nebari%20will%20require%20you%20to%20set%20a%20DNS%20record%20for%20the%20domain%20defined%20during%20initialize.%20Follow%20the%20instructions%20on%20How%20to%20set%20a%20DNS%20record%20for%20Nebari%20for%20an%20overview%20of%20the%20required%20steps.

Value and/or benefit

A cleaner, almost end-to-end private deployment on AWS. Currently, deploying Nebari on AWS within a private subnet with ACM requires additional manual steps. This feature would automate one of those steps and could also add a section to "Using other DNS providers" in the docs.

Anything else?

Proposed addition to the formal documentation under "Using other DNS providers":

During your AWS deployment, Terraform will print output similar to this:

[terraform]: Apply complete! Resources: 7 added, 0 changed, 0 destroyed.
[terraform]:
[terraform]: Outputs:
[terraform]:
[terraform]: load_balancer_address = {
[terraform]:   "hostname" = "internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com"
[terraform]:   "ip" = ""
[terraform]: }
Attempt 1 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 2 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 3 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 4 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 5 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 6 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 7 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 8 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...

At this point the user needs to use the AWS Management Console or the AWS CLI to point the R53 record (created previously or at the time of deployment) at the load balancer shown above. The record should correspond to the "domain:" value defined during nebari init --guided-init at the step: "domain will be the domain endpoint for your Nebari instance."

Once this step is completed, the deployment should pick up the DNS resolution and show the behavior below. Alternatively, the user can re-run the deployment. See below:
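The manual R53 step described above could be scripted. The following is only a minimal sketch, not something Nebari does today: the helper names `build_cname_upsert` and `apply_change` are hypothetical, the hosted zone ID and domain are placeholders, and `route53_client` is assumed to be a `boto3.client("route53")` created by the caller.

```python
def build_cname_upsert(domain: str, elb_hostname: str, ttl: int = 300) -> dict:
    """Build a Route 53 ChangeBatch that points `domain` at the load balancer.

    Hypothetical helper: it only constructs the request payload.
    """
    return {
        "Comment": "Point the Nebari domain at the deployment's load balancer",
        "Changes": [
            {
                # UPSERT creates the record if missing, or updates it in place.
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": elb_hostname}],
                },
            }
        ],
    }


def apply_change(route53_client, hosted_zone_id: str, change_batch: dict) -> dict:
    """Submit the change batch.

    `route53_client` is assumed to be boto3.client("route53"); it is passed in
    so this sketch has no hard dependency on boto3.
    """
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=change_batch
    )
```

A CNAME record cannot be placed at the zone apex, so a domain that is the root of its hosted zone would need a Route 53 alias record instead.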

[terraform]: load_balancer_address = {
[terraform]:   "hostname" = "internal-adb1a7d6e77da4153a80577574774996-12345667.us-west-2.elb.amazonaws.com"
[terraform]:   "ip" = ""
[terraform]: }
Attempt 1 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 2 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 3 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 4 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 5 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 6 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 7 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 8 failed to get IP for internal-adb1a7d6e77da4153a80577574774996-123456789.us-west-2.elb.amazonaws.com...
Attempt 9 succeeded to connect to tcp://10.12.66.90:80
Attempt 1 succeeded to connect to tcp://10.12.71.210:8786
Attempt 1 succeeded to connect to tcp://10.12.71.210:8022
Attempt 1 succeeded to connect to tcp://10.12.71.210:8023
Attempt 1 succeeded to connect to tcp://10.12.71.210:9080
Attempt 1 succeeded to connect to tcp://10.12.71.210:443
After stage=04-kubernetes-ingress kubernetes ingress available on tcp ports={80, 8786, 8022, 8023, 9080, 443}
DNS configured domain=nebari.dev-wma.chs.usgs.gov matches ingress ips=10.12.71.210
[terraform]:
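The "Attempt N failed to get IP" lines above come from a retry loop that waits for the load balancer hostname to resolve. The sketch below only approximates that behavior with the standard library and is not Nebari's actual implementation: the function name `wait_for_ip`, its defaults, and the injectable `resolve` and `sleep` parameters are assumptions made for illustration.

```python
import socket
import time
from typing import Callable, Optional


def wait_for_ip(
    hostname: str,
    attempts: int = 10,
    delay: float = 5.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Retry DNS resolution until it succeeds or the attempts run out.

    Returns the resolved IP address, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            ip = resolve(hostname)
        except OSError:
            # socket.gaierror (name not resolvable yet) is a subclass of OSError.
            print(f"Attempt {attempt} failed to get IP for {hostname}...")
            sleep(delay)
            continue
        print(f"Attempt {attempt} resolved {hostname} to {ip}")
        return ip
    return None
```

Passing `resolve` and `sleep` as parameters keeps the loop testable without real DNS lookups or waiting.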

In addition, when using ACM in a private deployment, the user or administrator deploying the product needs to go to the CLB created by the deployment, change the listener on port 443 from TCP to SSL, and add the ACM certificate to that listener. This action could also be automated.
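As a rough illustration of how the listener change could be automated, here is a minimal sketch against the Classic Load Balancer API. It is not part of Nebari: the helper names are hypothetical, and `elb_client` is assumed to be a `boto3.client("elb")` supplied by the caller. Because the protocol of an existing listener cannot be edited in place, the sketch deletes the port 443 listener and recreates it with the SSL protocol and the certificate ARN.

```python
def to_ssl_listener(listener: dict, certificate_arn: str) -> dict:
    """Return a copy of a CLB listener definition switched to SSL with the given cert."""
    updated = dict(listener)
    updated["Protocol"] = "SSL"
    updated["SSLCertificateId"] = certificate_arn
    return updated


def switch_listener_to_ssl(
    elb_client, load_balancer_name: str, certificate_arn: str, port: int = 443
) -> dict:
    """Replace the listener on `port` with an SSL listener that uses the ACM cert.

    `elb_client` is assumed to be boto3.client("elb") (Classic Load Balancer API).
    """
    description = elb_client.describe_load_balancers(
        LoadBalancerNames=[load_balancer_name]
    )["LoadBalancerDescriptions"][0]
    # Find the current listener definition for the requested front-end port.
    current = next(
        item["Listener"]
        for item in description["ListenerDescriptions"]
        if item["Listener"]["LoadBalancerPort"] == port
    )
    replacement = to_ssl_listener(current, certificate_arn)
    elb_client.delete_load_balancer_listeners(
        LoadBalancerName=load_balancer_name, LoadBalancerPorts=[port]
    )
    elb_client.create_load_balancer_listeners(
        LoadBalancerName=load_balancer_name, Listeners=[replacement]
    )
    return replacement
```

Since the CLB is created for a Kubernetes LoadBalancer service, changes made directly to it may be overwritten when the service is reconciled, which is one reason to prefer setting the equivalent service annotations where possible.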

Adam-D-Lewis commented 3 weeks ago

Sorry for the long delay in getting a response. Extending automatic DNS provision to include Route 53 would be useful and I'd support a PR adding that if it's something you're interested in working on.

You mentioned adding a new stage for AWS deployments. We don't do that for any existing stages so there are likely some issues to think through doing that. Preferably, we would add the additional steps in an existing stage and make the new features optional through the nebari config.

ronald50928 commented 3 weeks ago

Hey Adam! This is great. Yes, I am interested in moving this forward in any way possible. For our deployment within USGS, this will help facilitate internal DNS resolution within the internal domain.

Adam-D-Lewis commented 3 weeks ago

To make sure we're on the same page, you're talking about using a custom cert from ACM, right?

The existing process to do so is explained in the docs and is the following:

You're hoping to make the new process:

Is that right?

It looks like it's possible to remove the need to manually update the cert in step 2 using annotations on ~an ingress resource as mentioned here and here~ if we do TLS termination at the AWS Application Load Balancer instead, but I'm not certain. It would probably be a bit larger of an effort, and may cause other issues.

Update: The annotations would actually need to be applied to the k8s LoadBalancer service which is currently being used in Nebari as mentioned here and here

ronald50928 commented 3 weeks ago

Yes and no. The idea is to use an existing certificate rather than create one. Here are the annotations I've tried, but I never got the inner pieces of Nebari to do it automagically.

ingress:
  terraform_overrides:
    load-balancer-annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
      service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
      service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-east-1:123456789012:certificate/1234556678990"
      service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "ssl"
      service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELB SecurityPolicy-2016-08"

Adam-D-Lewis commented 3 weeks ago

I agree that what you've tried seems to be on the right track. The warning at the top of the screen seems to suggest that you might also need to set service.beta.kubernetes.io/aws-load-balancer-type.

I would also think that the web redirection to websecure setting in traefik may cause some issues with what you're trying to do at some point.

            # Redirect http -> https
            "--entrypoints.web.http.redirections.entryPoint.to=websecure",
            "--entrypoints.web.http.redirections.entryPoint.scheme=https",