nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev
BSD 3-Clause "New" or "Revised" License

[BUG] - Terraform provider version inconsistency within stages #2614

Open viniciusdc opened 2 months ago

viniciusdc commented 2 months ago

Describe the bug

We must be more consistent in Terraform provider versions across different deployment stages. This discrepancy can lead to unpredictable behavior and potential issues during deployment. For example, on a recent AWS deployment, I noticed the following in deployment logs from Terraform:

Stage 01 -- Terraform State:

Stage 02:

Stage 03:

While we do set the version for the most important infrastructure resources: https://github.com/nebari-dev/nebari/blob/a65ff53df9c7cdfa4bf1b99b9099f7d5efa1240d/src/_nebari/stages/infrastructure/template/aws/versions.tf#L1-L9

The other stages use `terraform.Provider` to instantiate the providers across the deployment: https://github.com/nebari-dev/nebari/blob/a65ff53df9c7cdfa4bf1b99b9099f7d5efa1240d/src/_nebari/stages/terraform_state/__init__.py#L181-L186

We should make the versions consistent. Interestingly, after stage 3 the version is the same across all calls; I guess that is because the backend is already set up by then.

Expected behavior

At a minimum, the cloud provider versions used in every stage should match the versions pinned in their infrastructure modules.

OS and architecture in which you are running Nebari

Linux

How to Reproduce the problem?

Any cloud provider deployment might lead to the same problem.

Command output

No response

Versions and dependencies used.

No response

Compute environment

AWS

Integrations

No response

Anything else?

No response

Adam-D-Lewis commented 2 months ago

I think we could add a consistent version in tf_objects.py instead of in the terraform files directly. That would enforce the same version in all of the stages.
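
A minimal sketch of that idea, assuming one Python mapping acts as the single source of truth and each stage renders its `required_providers` block from it. The names `PINNED_PROVIDERS` and `required_providers_block`, and the version numbers, are hypothetical; they are not taken from Nebari's `tf_objects.py`.

```python
import json

# Hypothetical single source of truth. The versions are placeholders,
# not the versions Nebari actually pins.
PINNED_PROVIDERS = {
    "aws": {"source": "hashicorp/aws", "version": "5.33.0"},
    "kubernetes": {"source": "hashicorp/kubernetes", "version": "2.20.0"},
}


def required_providers_block(*names):
    """Build a Terraform JSON `required_providers` block for the named providers."""
    missing = [name for name in names if name not in PINNED_PROVIDERS]
    if missing:
        raise KeyError(f"no pinned version for provider(s): {missing}")
    return {
        "terraform": {
            "required_providers": {
                name: dict(PINNED_PROVIDERS[name]) for name in names
            }
        }
    }


# A stage could write this out as a `*.tf.json` file next to its templates.
print(json.dumps(required_providers_block("aws"), indent=2))
```

Because every stage would read from the same mapping, bumping a provider would be a one-line change, and a stage asking for an unpinned provider would fail early with a `KeyError`.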

Adam-D-Lewis commented 2 months ago

I'm also curious what issues you've seen from this. It seems like it shouldn't cause a problem to use different provider versions in different stages.

viniciusdc commented 2 months ago

> I'm also curious what issues you've seen from this. It seems like it shouldn't cause a problem to use different provider versions in different stages.

I haven't seen any issue directly, but keeping this inconsistency opens the door to bugs that would be hard to track down. For example, one version of a provider might issue certain API requests in a particular order while a newer version does not (the same goes for error messages), or the two versions might have different internal requirements such as region or zones.

Adam-D-Lewis commented 2 months ago

It seems like, because the stages are isolated from each other (isolated Terraform modules), differing provider versions should be okay. That said, I think we should try to keep the versions consistent between the stages, but I'm not sure I would support enforcing it (e.g. for plugins), at least not until we see an issue.

marcelovilla commented 2 days ago

@smokestacklightnin will be picking up this issue

marcelovilla commented 2 days ago

@smokestacklightnin, here's a bit more context to help bring you up to speed on this issue.

Nebari has several Terraform stages, which run sequentially because some require the output of others as input. Each stage has one or more `versions.tf` files where Terraform provider versions are defined. For example: https://github.com/nebari-dev/nebari/blob/9b1310b33e89c2c11c3b39128ec792ca80342486/src/_nebari/stages/infrastructure/template/aws/versions.tf

It also seems we are setting providers in other files, for example: https://github.com/nebari-dev/nebari/blob/9b1310b33e89c2c11c3b39128ec792ca80342486/src/_nebari/stages/terraform_state/template/aws/main.tf#L23-L31
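
To get an overview of where versions are declared, something like the following could help. It is a rough sketch, not an HCL parser: it assumes each provider entry is written as `name = { source = "..." version = "..." }` with `source` before `version`, and the function names are made up for illustration.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches entries such as:
#   aws = {
#     source  = "hashicorp/aws"
#     version = "5.33.0"
#   }
PROVIDER_RE = re.compile(
    r'(\w+)\s*=\s*\{\s*source\s*=\s*"([^"]+)"\s*version\s*=\s*"([^"]+)"\s*\}'
)


def parse_constraints(text):
    """Return (source, version) pairs declared in one Terraform file's text."""
    return [(source, version) for _name, source, version in PROVIDER_RE.findall(text)]


def collect_constraints(root):
    """Map provider source -> {version constraint -> [files declaring it]}."""
    found = defaultdict(lambda: defaultdict(list))
    for tf_file in sorted(Path(root).rglob("*.tf")):
        for source, version in parse_constraints(tf_file.read_text()):
            found[source][version].append(str(tf_file))
    return found


def inconsistent(found):
    """Keep only the providers declared with more than one distinct constraint."""
    return {
        source: dict(versions)
        for source, versions in found.items()
        if len(versions) > 1
    }
```

From a checkout, `inconsistent(collect_constraints("src/_nebari/stages"))` would list each provider that is declared with more than one version constraint, together with the files that declare it. Providers that are not declared in any `required_providers` block would not show up here, so the Terraform logs are still worth checking.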

We need to ensure consistency across all stages by using the same provider versions. For now, I suggest we avoid updating to the latest available versions and instead stick to the most up-to-date versions among the ones we’re currently using.
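
As a small illustration of "the most up-to-date version among the ones in use", here is a sketch that picks the highest exact version from a set of stages. The stage names and version numbers are invented for the example, and the helper only handles exact pins like `5.33.0`, not constraints such as `~> 5.0`.

```python
def version_key(version):
    """Turn an exact version such as '5.33.0' into (5, 33, 0) for comparison."""
    return tuple(int(part) for part in version.split("."))


def newest_in_use(versions_by_stage):
    """Pick the highest exact version already pinned by any stage."""
    return max(versions_by_stage.values(), key=version_key)


# Hypothetical stage -> version data, for illustration only.
aws_versions = {
    "stage_01": "5.12.0",
    "stage_02": "5.33.0",
    "stage_03": "5.9.0",
}
print(newest_in_use(aws_versions))  # prints 5.33.0
```

Comparing integer tuples rather than raw strings matters here: as strings, `"5.9.0"` sorts after `"5.33.0"`, which would select the wrong version.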