nebari-dev / nebari

πŸͺ΄ Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License
279 stars 90 forks source link

[BUG] - AWS Deploy failing with `Failed to identify fetch peer certificates` #2399

Open aktech opened 6 months ago

aktech commented 6 months ago

Describe the bug

As seen by @ronald50928

He's facing issues deploying Nebari inside a private network. The instance he is deploying from is inside the VPC (connecting via a VPN).

Expected behavior

Deployment completing with no errors

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Ran the following command:

nebari deploy -c nebari-config.yml

with following configuration:

provider: aws
namespace: dev
nebari_version: 2024.3.2
project_name: <SANITIZED>
domain: <SANITIZED>
helm_extensions: []
monitoring:
  enabled: true
argo_workflows:
  enabled: true
  nebari_workflow_controller:
    enabled: true
ci_cd:
  type: none
terraform_state:
  type: remote
ingress:
  terraform_overrides:
    load-balancer-annotations:
      service.beta.kubernetes.io/aws-load-balancer-internal: "true"
security:
  keycloak:
    initial_root_password: <SANITIZED>
  authentication:
    type: password

amazon_web_services:
  kubernetes_version: '1.29'
  region: us-east-1
  permissions_boundary: arn:aws:iam::<ACCOUNT-ID>:policy/<Permissions-Boundary-POLICY-NAME>
  existing_subnet_ids: ["subnet-<SUBNET-ID-1>", "subnet-SUBNET-ID-2"]
  existing_security_group_id: sg-<SECURITY-GROUP-1>
  node_groups:
    general:
      instance: m5.2xlarge
      min_nodes: 1
      max_nodes: 1
    user:
      instance: m5.xlarge
      min_nodes: 1
      max_nodes: 100
    worker:
      instance: m5.xlarge
      min_nodes: 0
      max_nodes: 450
jhub_apps:
  enabled: true

Command output

terraform]: β•·
[terraform]: β”‚ Error: Failed to identify fetch peer certificates
[terraform]: β”‚
[terraform]: β”‚   with module.kubernetes.data.tls_certificate.this,
[terraform]: β”‚   on modules/kubernetes/main.tf line 82, in data "tls_certificate" "this":
[terraform]: β”‚   82: data "tls_certificate" "this" {
[terraform]: β”‚
[terraform]: β”‚ failed to fetch certificates from URL 'https': Get
[terraform]: β”‚ "[https://oidc.eks.us-east-1.amazonaws.com:443/id/A381A8C89FAEE2FC03AF83E334B12AEE](https://oidc.eks.us-east-1.amazonaws.com/id/A381A8C89FAEE2FC03AF83E334B12AEE)":
[terraform]: β”‚ dial tcp: lookup oidc.eks.us-east-1.amazonaws.com on 172.17.0.2:53: no such
[terraform]: β”‚ host
[terraform]: β•΅

Versions and dependencies used.

Nebari version: 2024.3.2 Kubernetes version: 1.29

Compute environment

AWS

Integrations

No response

Anything else?

No response

viniciusdc commented 6 months ago

I am not sure if it's related, but I once needed to update the default security group created by Nebari to work with the internal VPN that was already in place; on AWS, there was a certain button to include it.

viniciusdc commented 6 months ago

I would also try adding the extra certificates field, and try to include it manually:

### Certificate configuration ###
certificate:
  type: existing
  secret_name: <secret-name>
aktech commented 6 months ago

I would also try adding the extra certificates field, and try to include it manually:

This isn't related to ssl certs for the exposed load balancer, as that's not event deployed. This is related to connecting to the created k8s cluster.