nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[BUG] - Validation error when deploying to Digital Ocean #2530

Open karamba228 opened 5 months ago

karamba228 commented 5 months ago

Describe the bug

Deploying a config to Digital Ocean causes a validation error.

config file:

provider: do
namespace: dev
nebari_version: 2024.5.1
project_name: ne8ari-457
ci_cd:
  type: none
terraform_state:
  type: remote
security:
  keycloak:
    initial_root_password: r0hgsbxfdjsibmg62pu8y49jr0bbw2i9
  authentication:
    type: password
theme:
  jupyterhub:
    hub_title: Nebari - ne8ari-457
    welcome: Welcome! Learn about Nebari's features and configurations in <a href="https://www.nebari.dev/docs/welcome">the
      documentation</a>. If you have any questions or feedback, reach the team on
      <a href="https://www.nebari.dev/docs/community#getting-support">Nebari's support
      forums</a>.
    hub_subtitle: Your open source data science platform, hosted on Digital Ocean
digital_ocean:
  kubernetes_version: 1.28.10-do.0
  region: sfo3
  node_groups:
    general:
      instance: g-8vcpu-32gb
      min_nodes: 1
      max_nodes: 1
    user:
      instance: g-4vcpu-16gb
      min_nodes: 1
      max_nodes: 5
    worker:
      instance: g-4vcpu-16gb
      min_nodes: 1
      max_nodes: 5

Expected behavior

There should be no validation error.

OS and architecture in which you are running Nebari

Windows 11 with WSL

How to Reproduce the problem?

After digitalocean_spaces_bucket is created, the deploy process stops with a validation error.

Command output

ValidationError: 3 validation errors for DigitalOceanInputVars
node_groups.general
  Input should be a valid dictionary or instance of DigitalOceanNodeGroup [type=model_type, input_value=DigitalOceanNodeGroup(ins...in_nodes=1, max_nodes=1), input_type=DigitalOceanNodeGroup]
    For further information visit https://errors.pydantic.dev/2.4/v/model_type
node_groups.user
  Input should be a valid dictionary or instance of DigitalOceanNodeGroup [type=model_type, input_value=DigitalOceanNodeGroup(ins...in_nodes=1, max_nodes=5), input_type=DigitalOceanNodeGroup]
    For further information visit https://errors.pydantic.dev/2.4/v/model_type
node_groups.worker
  Input should be a valid dictionary or instance of DigitalOceanNodeGroup [type=model_type, input_value=DigitalOceanNodeGroup(ins...in_nodes=1, max_nodes=5), input_type=DigitalOceanNodeGroup]
    For further information visit https://errors.pydantic.dev/2.4/v/model_type

Versions and dependencies used.

conda 24.4.0

Client Version: v1.30.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

2024.5.1

Compute environment

Digital Ocean

Integrations

No response

Anything else?

No response

viniciusdc commented 5 months ago

Hi @karamba228, thanks for opening the issue and trying out Nebari! The issue you are seeing is most probably coming from our internal validation against the droplet types available on Digital Ocean:

It's defined here: https://github.com/nebari-dev/nebari/blob/9baab7e29e2ec9b172ef439ddc66a5e51a380066/src/_nebari/stages/infrastructure/__init__.py#L265-L271

But the actual available instance types come from here: https://github.com/nebari-dev/nebari/blob/9baab7e29e2ec9b172ef439ddc66a5e51a380066/src/_nebari/provider/cloud/digital_ocean.py#L54-L55

The issue you are facing might be a bug or a corner case we didn't anticipate. If possible, could you check whether the instance types you are passing actually show up in the API response from your DO account, just to make sure it's not a quota or permission issue? Here's a small reproduction script for the request:

import requests

url = "https://api.digitalocean.com/v2/kubernetes/options"

headers = {
    # Use your DIGITALOCEAN_TOKEN here as the bearer token
    "Authorization": "Bearer dop_v1_***",
}

# List the Kubernetes options (regions, versions, node sizes) available to this account
response = requests.get(url, headers=headers)

print(response.text)
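
To avoid scanning the raw response by eye, the check could be scripted. This is a minimal sketch, assuming the response body has the shape `{"options": {"sizes": [{"slug": ...}, ...]}}`; the helper name `missing_sizes` and the trimmed sample payload are hypothetical and only stand in for the real API output.

```python
import json
from typing import Dict, List


def missing_sizes(options_payload: Dict, wanted: List[str]) -> List[str]:
    """Return the requested size slugs that are absent from the payload."""
    # Assumed shape: {"options": {"sizes": [{"slug": "..."}, ...]}}
    sizes = options_payload.get("options", {}).get("sizes", [])
    available = {size["slug"] for size in sizes}
    return [slug for slug in wanted if slug not in available]


# Made-up, trimmed sample in the assumed response shape (not real API output)
sample = json.loads("""
{"options": {"sizes": [
  {"name": "example-large", "slug": "g-8vcpu-32gb"},
  {"name": "example-medium", "slug": "g-4vcpu-16gb"}
]}}
""")

# Instance types taken from the node_groups in the config above
print(missing_sizes(sample, ["g-8vcpu-32gb", "g-4vcpu-16gb"]))  # prints []
```

With a real response, `json.loads(response.text)` would take the place of `sample`; an empty list means every requested instance type is offered to the account.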

viniciusdc commented 5 months ago

Also, I saw that you've tried with WSL. Is it WSL1 or WSL2? There's a high chance you will run into odd issues later in the deployment if it's WSL1, so I recommend switching if possible.

karamba228 commented 5 months ago

I just tried running the deploy command using WSL1 and got the same error. I also ran the test code you suggested, and the requested instance types are in the list of available instances.

viniciusdc commented 5 months ago

Thanks for the follow-up, @karamba228; the fact that the instance types are showing up there is a good sign. I will try to reproduce the same error based on the config you shared. I wonder why it's complaining... I will keep you posted.
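
One detail in the error output is that `input_type` is reported as DigitalOceanNodeGroup, the same name as the expected class. A message like that can arise when two distinct classes share a name, so the value is an instance of one but not of the other. Whether that is what happens in Nebari is not confirmed here; the sketch below only illustrates the general mechanism with plain dataclasses, and every name in it is a hypothetical stand-in.

```python
from dataclasses import dataclass


def make_node_group_class():
    """Build a fresh class named NodeGroup on every call (hypothetical stand-in)."""

    @dataclass
    class NodeGroup:
        instance: str
        min_nodes: int
        max_nodes: int

    return NodeGroup


# Two separate class objects that happen to share the same name
NodeGroupA = make_node_group_class()
NodeGroupB = make_node_group_class()

group = NodeGroupA(instance="g-8vcpu-32gb", min_nodes=1, max_nodes=1)

same_name = type(group).__name__ == NodeGroupB.__name__
is_instance = isinstance(group, NodeGroupB)
print(same_name, is_instance)  # prints: True False
```

An instance check against the second class fails even though both names print identically, which would match the shape of the message above.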

BTW, did you attempt retrying the deploy after a few minutes?

karamba228 commented 5 months ago

Yes, I have. It succeeded at deploying a Spaces bucket, but kept failing at the instance creation step.

Adam-D-Lewis commented 5 months ago

We are planning on deprecating Digital Ocean support due to low usage. It could still be deployed on Digital Ocean via an existing cluster deployment. Please comment on the issue if you think we should keep it - https://github.com/nebari-dev/nebari/issues/2542 @karamba228