Open mcg1969 opened 1 week ago
I hacked around that particular issue by editing _nebari/provider/terraform.py
and creating a simple function to convert Pydantic objects to a dict:
def _to_dict(sd):
    # Recursively convert Pydantic models (anything with a model_dump
    # method) into plain dicts so the structure is JSON-serializable.
    if isinstance(sd, dict):
        return {k: _to_dict(v) for k, v in sd.items()}
    elif isinstance(sd, (list, tuple)):
        return [_to_dict(v) for v in sd]
    elif hasattr(sd, "model_dump"):
        return sd.model_dump()
    else:
        return sd
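To illustrate the problem this works around, here is a minimal stand-alone sketch. The KeyValueDict below is a local stand-in, not Nebari's actual model, and it assumes pydantic v2:

import json
from pydantic import BaseModel

class KeyValueDict(BaseModel):  # stand-in for Nebari's KeyValueDict
    key: str
    value: str

input_vars = {"node_group": KeyValueDict(key="kubernetes.io/os", value="linux")}

try:
    json.dumps(input_vars)  # TypeError: KeyValueDict is not JSON serializable
except TypeError as exc:
    print(exc)

# With the helper, the model is converted to a plain dict first:
print(json.dumps(_to_dict(input_vars)))
# {"node_group": {"key": "kubernetes.io/os", "value": "linux"}}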
Then I used that to wrap the input to json.dump:
with tempfile.NamedTemporaryFile(
    mode="w", encoding="utf-8", suffix=".tfvars.json"
) as f:
    json.dump(_to_dict(input_vars), f.file)
    f.file.flush()
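An alternative that avoids the recursive helper would be to pass json.dump a default hook, so any Pydantic model it encounters is converted on the fly. This is an untested sketch, not what I actually ran:

json.dump(
    input_vars,
    f.file,
    # called only for objects json cannot serialize itself
    default=lambda o: o.model_dump() if hasattr(o, "model_dump") else str(o),
)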
That got me farther. However, it ended up failing later in the deployment process with a very similar issue:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/subcommands/deploy.py:92 │
│ in deploy │
│ │
│ 89 │ │ │ msg = "Digital Ocean support is currently being deprecated and will be remov │
│ 90 │ │ │ typer.confirm(msg) │
│ 91 │ │ │
│ ❱ 92 │ │ deploy_configuration( │
│ 93 │ │ │ config, │
│ 94 │ │ │ stages, │
│ 95 │ │ │ disable_prompt=disable_prompt, │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/deploy.py:53 in │
│ deploy_configuration │
│ │
│ 50 │ │ with contextlib.ExitStack() as stack: │
│ 51 │ │ │ for stage in stages: │
│ 52 │ │ │ │ s = stage(output_directory=pathlib.Path.cwd(), config=config) │
│ ❱ 53 │ │ │ │ stack.enter_context(s.deploy(stage_outputs, disable_prompt)) │
│ 54 │ │ │ │ │
│ 55 │ │ │ │ if not disable_checks: │
│ 56 │ │ │ │ │ s.check(stage_outputs, disable_prompt) │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:526 in enter_context │
│ │
│ 523 │ │ except AttributeError: │
│ 524 │ │ │ raise TypeError(f"'{cls.__module__}.{cls.__qualname__}' object does " │
│ 525 │ │ │ │ │ │ │ f"not support the context manager protocol") from None │
│ ❱ 526 │ │ result = _enter(cm) │
│ 527 │ │ self._push_cm_exit(cm, _exit) │
│ 528 │ │ return result │
│ 529 │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__ │
│ │
│ 134 │ │ # they are only needed for recreation, which is not possible anymore │
│ 135 │ │ del self.args, self.kwds, self.func │
│ 136 │ │ try: │
│ ❱ 137 │ │ │ return next(self.gen) │
│ 138 │ │ except StopIteration: │
│ 139 │ │ │ raise RuntimeError("generator didn't yield") from None │
│ 140 │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:302 in deploy │
│ │
│ 299 │ def deploy( │
│ 300 │ │ self, stage_outputs: Dict[str, Dict[str, Any]], disable_prompt: bool = False │
│ 301 │ ): │
│ ❱ 302 │ │ with super().deploy(stage_outputs, disable_prompt): │
│ 303 │ │ │ with keycloak_provider_context( │
│ 304 │ │ │ │ stage_outputs["stages/" + self.name]["keycloak_credentials"]["value"] │
│ 305 │ │ │ ): │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/contextlib.py:137 in __enter__ │
│ │
│ 134 │ │ # they are only needed for recreation, which is not possible anymore │
│ 135 │ │ del self.args, self.kwds, self.func │
│ 136 │ │ try: │
│ ❱ 137 │ │ │ return next(self.gen) │
│ 138 │ │ except StopIteration: │
│ 139 │ │ │ raise RuntimeError("generator didn't yield") from None │
│ 140 │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/base.py:65 in │
│ deploy │
│ │
│ 62 │ ): │
│ 63 │ │ deploy_config = dict( │
│ 64 │ │ │ directory=str(self.output_directory / self.stage_prefix), │
│ ❱ 65 │ │ │ input_vars=self.input_vars(stage_outputs), │
│ 66 │ │ ) │
│ 67 │ │ state_imports = self.state_imports() │
│ 68 │ │ if state_imports: │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/_nebari/stages/kubernetes_keyclo │
│ ak/__init__.py:227 in input_vars │
│ │
│ 224 │ │ ] │
│ 225 │ │
│ 226 │ def input_vars(self, stage_outputs: Dict[str, Dict[str, Any]]): │
│ ❱ 227 │ │ return InputVars( │
│ 228 │ │ │ name=self.config.project_name, │
│ 229 │ │ │ environment=self.config.namespace, │
│ 230 │ │ │ endpoint=stage_outputs["stages/04-kubernetes-ingress"]["domain"], │
│ │
│ /home/centos/ae5-conda/envs/nebari/lib/python3.12/site-packages/pydantic/main.py:164 in __init__ │
│ │
│ 161 │ │ """ │
│ 162 │ │ # `__tracebackhide__` tells pytest and some other tools to omit this function fr │
│ 163 │ │ __tracebackhide__ = True │
│ ❱ 164 │ │ __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__p │
│ 165 │ │
│ 166 │ # The following line sets a flag that we use to determine when `__init__` gets overr │
│ 167 │ __init__.__pydantic_base_init__ = True │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValidationError: 1 validation error for InputVars
node_group
Input should be a valid dictionary [type=dict_type, input_value=KeyValueDict(key='kuberne...s.io/os', value='linux'), input_type=KeyValueDict]
For further information visit https://errors.pydantic.dev/2.4/v/dict_type
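For context, this is the same class of error as before: the field is annotated as a plain dict, and pydantic v2 refuses to accept a model instance where a dict is expected. Here is a minimal stand-alone reproduction; the InputVars and KeyValueDict definitions are illustrative stand-ins, not Nebari's actual classes:

from typing import Dict
from pydantic import BaseModel, ValidationError

class KeyValueDict(BaseModel):
    key: str
    value: str

class InputVars(BaseModel):
    node_group: Dict[str, str]  # expects a plain dict, not a model instance

try:
    InputVars(node_group=KeyValueDict(key="kubernetes.io/os", value="linux"))
except ValidationError as exc:
    print(exc)  # Input should be a valid dictionary [type=dict_type, ...]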
Hi @mcg1969, thanks for reporting this.
For local deploys, we use kind and test with it. Using K3s would be essentially the same as using an existing cluster, which is the least tested and documented part of Nebari.
Could you validate that the local deploy with kind does work for you? That would let us narrow this down to the existing provider.
Thanks!
I was indeed using the existing approach, not the local approach. And that choice is deliberate: kind is not an option for the use case being considered here. This isn't actually intended to be a local deployment.
@dcmcand Confirming: I do not encounter this with the AWS target.
thanks @mcg1969, that is helpful.
@mcg1969 I was able to reproduce this issue when deploying to k3s from 2024.7.1, but not from the current main branch. There may be other issues, but this error is not occurring.
I believe this issue is related to https://github.com/nebari-dev/nebari/issues/2767 and was likely fixed by https://github.com/nebari-dev/nebari/pull/2797. We will have a new release here within a couple of days. Once the new release is out, can you retry?
The Traefik CRDs are still an issue, but that is essentially a new feature request, whereas this is a bug.
Yes, happy to test. I totally understand about the other issue.
Describe the bug
Attempting to do a nebari deploy on an existing k3s cluster. I had a separate issue with the Traefik CRDs that I will raise separately. But once I get past that, I see this:
I hacked the terraform.py module to see what JSON was struggling with; it is this dictionary, with the KeyValueDict objects. Those were generated by nebari init, though! Here is the existing section of the config yaml:
Expected behavior
It should make it through this stage without this error.
OS and architecture in which you are running Nebari
CentOS Stream 8
How to Reproduce the problem?
I installed a stock version of k3s. In order to get to this stage, I had to remove some of the Traefik CRDs that k3s installs by default, because they conflict with ones that Terraform tries to install. But once I let Terraform handle those, I was able to get to this point.
Command output