nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[BUG] - TypeError: Object of type KeyValueDict is not JSON serializable #2819

Open mcg1969 opened 1 week ago

mcg1969 commented 1 week ago

Describe the bug

I am attempting to run nebari deploy against an existing k3s cluster. I hit a separate issue with the Traefik CRDs, which I will raise separately. Once I get past that, I see this:

[terraform]: After stage=03-kubernetes-initialize kubernetes initialized successfully
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy.py:92 │
│ in deploy                                                                                        │
│                                                                                                  │
│   89 │   │   │   msg = "Digital Ocean support is currently being deprecated and will be remov    │
│   90 │   │   │   typer.confirm(msg)                                                              │
│   91 │   │                                                                                       │
│ ❱ 92 │   │   deploy_configuration(                                                               │
│   93 │   │   │   config,                                                                         │
│   94 │   │   │   stages,                                                                         │
│   95 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:53 in          │
│ deploy_configuration                                                                             │
│                                                                                                  │
│   50 │   │   with contextlib.ExitStack() as stack:                                               │
│   51 │   │   │   for stage in stages:                                                            │
│   52 │   │   │   │   s = stage(output_directory=pathlib.Path.cwd(), config=config)               │
│ ❱ 53 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   54 │   │   │   │                                                                               │
│   55 │   │   │   │   if not disable_checks:                                                      │
│   56 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context             │
│                                                                                                  │
│   523 │   │   except AttributeError:                                                             │
│   524 │   │   │   raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "          │
│   525 │   │   │   │   │   │   │   f"not support the context manager protocol") from None         │
│ ❱ 526 │   │   result = _enter(cm)                                                                │
│   527 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   528 │   │   return result                                                                      │
│   529                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:72 in     │
│ deploy                                                                                           │
│                                                                                                  │
│    69 │   │   │   deploy_config["terraform_import"] = True                                       │
│    70 │   │   │   deploy_config["state_imports"] = state_imports                                 │
│    71 │   │                                                                                      │
│ ❱  72 │   │   self.set_outputs(stage_outputs, terraform.deploy(**deploy_config))                 │
│    73 │   │   self.post_deploy(stage_outputs, disable_prompt)                                    │
│    74 │   │   yield                                                                              │
│    75                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/provider/terraform.py:59 │
│ in deploy                                                                                        │
│                                                                                                  │
│    56 │   │   mode="w", encoding="utf-8", suffix=".tfvars.json"                                  │
│    57 │   ) as f:                                                                                │
│    58 │   │   print("INPUT_VARS:", input_vars)                                                   │
│ ❱  59 │   │   json.dump(input_vars, f.file)                                                      │
│    60 │   │   f.file.flush()                                                                     │
│    61 │   │                                                                                      │
│    62 │   │   if terraform_init:                                                                 │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/__init__.py:179 in dump                   │
│                                                                                                  │
│   176 │   │   │   default=default, sort_keys=sort_keys, **kw).iterencode(obj)                    │
│   177 │   # could accelerate with writelines in some versions of Python, at                      │
│   178 │   # a debuggability cost                                                                 │
│ ❱ 179 │   for chunk in iterable:                                                                 │
│   180 │   │   fp.write(chunk)                                                                    │
│   181                                                                                            │
│   182                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:432 in _iterencode             │
│                                                                                                  │
│   429 │   │   elif isinstance(o, (list, tuple)):                                                 │
│   430 │   │   │   yield from _iterencode_list(o, _current_indent_level)                          │
│   431 │   │   elif isinstance(o, dict):                                                          │
│ ❱ 432 │   │   │   yield from _iterencode_dict(o, _current_indent_level)                          │
│   433 │   │   else:                                                                              │
│   434 │   │   │   if markers is not None:                                                        │
│   435 │   │   │   │   markerid = id(o)                                                           │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:406 in _iterencode_dict        │
│                                                                                                  │
│   403 │   │   │   │   │   chunks = _iterencode_dict(value, _current_indent_level)                │
│   404 │   │   │   │   else:                                                                      │
│   405 │   │   │   │   │   chunks = _iterencode(value, _current_indent_level)                     │
│ ❱ 406 │   │   │   │   yield from chunks                                                          │
│   407 │   │   if newline_indent is not None:                                                     │
│   408 │   │   │   _current_indent_level -= 1                                                     │
│   409 │   │   │   yield '\n' + _indent * _current_indent_level                                   │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:406 in _iterencode_dict        │
│                                                                                                  │
│   403 │   │   │   │   │   chunks = _iterencode_dict(value, _current_indent_level)                │
│   404 │   │   │   │   else:                                                                      │
│   405 │   │   │   │   │   chunks = _iterencode(value, _current_indent_level)                     │
│ ❱ 406 │   │   │   │   yield from chunks                                                          │
│   407 │   │   if newline_indent is not None:                                                     │
│   408 │   │   │   _current_indent_level -= 1                                                     │
│   409 │   │   │   yield '\n' + _indent * _current_indent_level                                   │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:439 in _iterencode             │
│                                                                                                  │
│   436 │   │   │   │   if markerid in markers:                                                    │
│   437 │   │   │   │   │   raise ValueError("Circular reference detected")                        │
│   438 │   │   │   │   markers[markerid] = o                                                      │
│ ❱ 439 │   │   │   o = _default(o)                                                                │
│   440 │   │   │   yield from _iterencode(o, _current_indent_level)                               │
│   441 │   │   │   if markers is not None:                                                        │
│   442 │   │   │   │   del markers[markerid]                                                      │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/json/encoder.py:180 in default                 │
│                                                                                                  │
│   177 │   │   │   │   return super().default(o)                                                  │
│   178 │   │                                                                                      │
│   179 │   │   """                                                                                │
│ ❱ 180 │   │   raise TypeError(f'Object of type {o.__class__.__name__} '                          │
│   181 │   │   │   │   │   │   f'is not JSON serializable')                                       │
│   182 │                                                                                          │
│   183 │   def encode(self, o):                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯

I hacked the terraform.py module to print what json.dump was struggling with; it is this dictionary, which contains KeyValueDict objects:

{'traefik-image': {'image': 'traefik', 'tag': '2.9.1'}, 'name': 'k3s', 'environment': 'nebari', 'node_groups': {'general': KeyValueDict(key='kubernetes.io/os', value='linux'), 'user': KeyValueDict(key='kubernetes.io/os', value='linux'), 'worker': KeyValueDict(key='kubernetes.io/os', value='linux')}, 'certificate-service': <CertificateEnum.selfsigned: 'self-signed'>}

Those were generated by nebari init though! Here is the existing section of the config yaml:

existing:
  kube_context: default
  node_selectors:
    general:
      key: kubernetes.io/os
      value: linux
    user:
      key: kubernetes.io/os
      value: linux
    worker:
      key: kubernetes.io/os
      value: linux
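
The failure can be reproduced outside Nebari with a minimal sketch. The `KeyValueDict` class below is a hypothetical stand-in for Nebari's Pydantic model (a plain class, so the snippet needs only the standard library), and `try_dump` is an illustrative helper, not part of Nebari. The dictionary mirrors the shape of the `node_groups` entry printed above.

```python
import json


class KeyValueDict:
    """Hypothetical stand-in for Nebari's Pydantic model of the same name."""

    def __init__(self, key: str, value: str) -> None:
        self.key = key
        self.value = value


# Same shape as the node_groups entry in the input variables shown above.
input_vars = {
    "name": "k3s",
    "node_groups": {
        "general": KeyValueDict(key="kubernetes.io/os", value="linux"),
    },
}


def try_dump(obj) -> str:
    """Return the JSON text, or the TypeError message if encoding fails."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


result = try_dump(input_vars)
print(result)
```

The standard `json` encoder only knows how to handle built-in types such as dicts, lists, strings, and numbers, so any custom object nested in the input triggers this error unless it is converted first.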

Expected behavior

It should make it through this stage without this error.

OS and architecture in which you are running Nebari

centos stream 8

How to Reproduce the problem?

I installed a stock version of k3s. To get to this stage, I had to remove some of the Traefik CRDs that k3s installs by default, because they conflict with ones that Terraform tries to install. Once I let Terraform handle those, I was able to get to this point.

Command output

nebari deploy -c nebari-config.yaml


Versions and dependencies used

conda 24.9.2
Client Version: v1.30.3+k3s1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.3+k3s1
Nebari version 2024.7.1

Compute environment

None

Integrations

No response

Anything else?

No response

mcg1969 commented 1 week ago

I hacked around that particular issue by editing _nebari/provider/terraform.py and creating a simple function to convert Pydantic objects to a dict:

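def _to_dict(sd):
    """Recursively replace Pydantic models in sd with plain dicts."""
    if isinstance(sd, dict):
        return {k: _to_dict(v) for k, v in sd.items()}
    elif isinstance(sd, (list, tuple)):
        # Tuples become lists, which is how JSON would encode them anyway.
        return [_to_dict(v) for v in sd]
    elif hasattr(sd, 'model_dump'):
        # Pydantic v2 models expose model_dump().
        return sd.model_dump()
    else:
        return sd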

Then used that to wrap the input to json.dump:

    with tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", suffix=".tfvars.json"
    ) as f:
        json.dump(_to_dict(input_vars), f.file)
        f.file.flush()
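
As an aside, the same effect might be achieved without walking the structure by hand, using the `default` hook that `json.dump` and `json.dumps` accept. This is only a sketch under assumptions: `KeyValueDict` here is a hypothetical stand-in with a Pydantic-style `model_dump()` method, `encode_model` is an illustrative name, and this is not necessarily how the problem was resolved in Nebari itself.

```python
import json


class KeyValueDict:
    """Hypothetical stand-in exposing a Pydantic-style model_dump()."""

    def __init__(self, key: str, value: str) -> None:
        self.key = key
        self.value = value

    def model_dump(self) -> dict:
        return {"key": self.key, "value": self.value}


def encode_model(obj):
    """Called by json only for objects it cannot serialize itself."""
    if hasattr(obj, "model_dump"):
        return obj.model_dump()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


input_vars = {"node_groups": {"general": KeyValueDict("kubernetes.io/os", "linux")}}
encoded = json.dumps(input_vars, default=encode_model)
print(encoded)
```

The hook is applied lazily, only where the encoder gets stuck, so built-in values pass through untouched and unknown types still raise a `TypeError`.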

That got me farther. However, it ended up failing later in the deployment process with a very similar issue:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy.py:92 │
│ in deploy                                                                                        │
│                                                                                                  │
│   89 │   │   │   msg = "Digital Ocean support is currently being deprecated and will be remov    │
│   90 │   │   │   typer.confirm(msg)                                                              │
│   91 │   │                                                                                       │
│ ❱ 92 │   │   deploy_configuration(                                                               │
│   93 │   │   │   config,                                                                         │
│   94 │   │   │   stages,                                                                         │
│   95 │   │   │   disable_prompt=disable_prompt,                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:53 in          │
│ deploy_configuration                                                                             │
│                                                                                                  │
│   50 │   │   with contextlib.ExitStack() as stack:                                               │
│   51 │   │   │   for stage in stages:                                                            │
│   52 │   │   │   │   s = stage(output_directory=pathlib.Path.cwd(), config=config)               │
│ ❱ 53 │   │   │   │   stack.enter_context(s.deploy(stage_outputs, disable_prompt))                │
│   54 │   │   │   │                                                                               │
│   55 │   │   │   │   if not disable_checks:                                                      │
│   56 │   │   │   │   │   s.check(stage_outputs, disable_prompt)                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context             │
│                                                                                                  │
│   523 │   │   except AttributeError:                                                             │
│   524 │   │   │   raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does "          │
│   525 │   │   │   │   │   │   │   f"not support the context manager protocol") from None         │
│ ❱ 526 │   │   result = _enter(cm)                                                                │
│   527 │   │   self._push_cm_exit(cm, _exit)                                                      │
│   528 │   │   return result                                                                      │
│   529                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:302 in deploy                                                                     │
│                                                                                                  │
│   299 │   def deploy(                                                                            │
│   300 │   │   self, stage_outputs: Dict[str, Dict[str, Any]], disable_prompt: bool = False       │
│   301 │   ):                                                                                     │
│ ❱ 302 │   │   with super().deploy(stage_outputs, disable_prompt):                                │
│   303 │   │   │   with keycloak_provider_context(                                                │
│   304 │   │   │   │   stage_outputs["stages/" + self.name]["keycloak_credentials"]["value"]      │
│   305 │   │   │   ):                                                                             │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__                 │
│                                                                                                  │
│   134 │   │   # they are only needed for recreation, which is not possible anymore               │
│   135 │   │   del self.args, self.kwds, self.func                                                │
│   136 │   │   try:                                                                               │
│ ❱ 137 │   │   │   return next(self.gen)                                                          │
│   138 │   │   except StopIteration:                                                              │
│   139 │   │   │   raise RuntimeError("generator didn't yield") from None                         │
│   140                                                                                            │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:65 in     │
│ deploy                                                                                           │
│                                                                                                  │
│    62 │   ):                                                                                     │
│    63 │   │   deploy_config = dict(                                                              │
│    64 │   │   │   directory=str(self.output_directory / self.stage_prefix),                      │
│ ❱  65 │   │   │   input_vars=self.input_vars(stage_outputs),                                     │
│    66 │   │   )                                                                                  │
│    67 │   │   state_imports = self.state_imports()                                               │
│    68 │   │   if state_imports:                                                                  │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:227 in input_vars                                                                 │
│                                                                                                  │
│   224 │   │   ]                                                                                  │
│   225 │                                                                                          │
│   226 │   def input_vars(self, stage_outputs: Dict[str, Dict[str, Any]]):                        │
│ ❱ 227 │   │   return InputVars(                                                                  │
│   228 │   │   │   name=self.config.project_name,                                                 │
│   229 │   │   │   environment=self.config.namespace,                                             │
│   230 │   │   │   endpoint=stage_outputs["stages/04-kubernetes-ingress"]["domain"],              │
│                                                                                                  │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/pydantic/main.py:164 in __init__ │
│                                                                                                  │
│    161 │   │   """                                                                               │
│    162 │   │   # `__tracebackhide__` tells pytest and some other tools to omit this function fr  │
│    163 │   │   __tracebackhide__ = True                                                          │
│ ❱  164 │   │   __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__p  │
│    165 │                                                                                         │
│    166 │   # The following line sets a flag that we use to determine when `__init__` gets overr  │
│    167 │   __init__.__pydantic_base_init__ = True                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 1 validation error for InputVars
node_group
  Input should be a valid dictionary [type=dict_type, input_value=KeyValueDict(key='kuberne...s.io/os', value='linux'), input_type=KeyValueDict]
    For further information visit https://errors.pydantic.dev/2.4/v/dict_type
dcmcand commented 1 week ago

Hi @mcg1969, thanks for reporting this.

For local deploys, we use Kind and test with it. Using K3s would be essentially the same as using an existing cluster, which is the least tested and documented part of Nebari.

Could you validate that the local deploy with kind does work for you? That would let us narrow this down to the existing provider.

Thanks!

mcg1969 commented 1 week ago

I was indeed using the existing approach, not the local approach. That choice is deliberate: kind is not an option for the use case being considered here. This isn't actually intended to be a local deployment.

mcg1969 commented 1 week ago

@dcmcand Confirming: I do not encounter this with the AWS target.

dcmcand commented 1 week ago

Thanks @mcg1969, that is helpful.

dcmcand commented 1 week ago

@mcg1969 I was able to reproduce this issue when deploying to k3s from 2024.7.1, but not from the current main branch. There may be other issues, but this error is not occurring.

I believe this issue is related to https://github.com/nebari-dev/nebari/issues/2767 and was likely fixed by https://github.com/nebari-dev/nebari/pull/2797. We will have a new release here within a couple of days. Once the new release is out, can you retry?

The Traefik CRDs are still an issue, but that is essentially a new feature request, whereas this is a bug.

mcg1969 commented 1 week ago

Yes, happy to test. I totally understand about the other issue.