I've been having trouble upgrading from 0.3.12 on AWS (using Auth0) to the version of qhub on `main` (i.e. `export QHUB_GH_BRANCH=main`). On the deploy step, the error I keep running into is the following:

```
[terraform]: │ Error: Get "http://localhost/api/v1/namespaces/dev": dial tcp [::1]:80: connect: connection refused
```
I've seen errors like this in the past but I haven't been able to get around it. @danlester do you have any idea why this might be failing or if there are additional steps I need to take?
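For reference, a minimal sketch of the upgrade flow being attempted here (the pip install source and CLI flags are assumptions based on the documented qhub workflow, not a verbatim transcript):

```bash
# Sketch of the upgrade attempt; adjust paths/flags to your environment.
export QHUB_GH_BRANCH=main

# Install qhub from the main branch (assumed install source).
pip install git+https://github.com/Quansight/qhub.git@main

# Rewrite qhub-config.yaml for the new version, then redeploy.
qhub upgrade -c qhub-config.yaml
qhub deploy -c qhub-config.yaml
```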
@iameskild Not too sure, but we can have a call if you want to look together.
@danlester I've attempted another upgrade with the same results. I will try to perform an upgrade from 0.3.13 to main for another cloud provider and see if I get it working. I'm free to jump on a call whenever is convenient for you, thanks for your help!
I don't think there will be much difference, but I would suggest also trying 0.3.12 to main for another cloud provider, so you're changing less for comparison.
It could also be worth trying with `password` instead of `auth0` to see if that works - I have done most testing under `password`.
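Switching the auth method should only be a small change to qhub-config.yaml; a sketch assuming the `security.authentication.type` field of the 0.3.x config schema (verify against your own config):

```bash
# Switch qhub-config.yaml to password auth before redeploying.
# Field paths are assumptions based on the 0.3.x config schema; requires yq v4.
yq -i '.security.authentication.type = "password"' qhub-config.yaml

# Drop the now-unused Auth0 client block, if present (assumed field name).
yq -i 'del(.security.authentication.config)' qhub-config.yaml
```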
@danlester I was able to upgrade from 0.3.12 to 0.4.0 (`main`) running on DO using `password`. I made the following adjustments:
- installed `qhub` (bumped version to `v0.4.0`) into a `qhub-main` conda env
- updated the image tags to `v0.3.14`
Unfortunately the `hub` pod never came back up. This meant I couldn't test importing existing users or verify that the user data is still intact. `hub` pod logs:
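A sketch of how the `hub` pod state and logs can be pulled (the `dev` namespace and the `hub`/`component=hub` names are assumptions based on the default QHub/JupyterHub layout):

```bash
# Namespace, deployment name and labels are assumptions for a default QHub install.
kubectl get pods -n dev -l component=hub            # locate the hub pod
kubectl logs -n dev deployment/hub --tail=200       # recent hub logs
kubectl describe pod -n dev -l component=hub        # events, if the pod is stuck
```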
@iameskild This is the same problem that Vini faced: https://github.com/Quansight/qhub/pull/967#issuecomment-1005712132
I'm not too sure why you manually updated the image tags to `v0.3.14`. The `qhub upgrade` should have already set them to `v0.3.14` - but only if they started off as `v0.3.12` in the qhub-config.yaml file. Ultimately, when qhub (the Python module) has its internal version number at `v0.4.0`, `qhub upgrade` should end up at `v0.4.0` for the image tags instead.

But since the qhub repo doesn't yet have a `v0.4.0` tag, no corresponding images exist on Docker Hub, so you would really need to (manually) use `main` as the image tag to get the versions based on our latest code.
If you still have the broken site running, try updating the image tags in qhub-config.yaml and redeploy - it will still be a helpful test I think.
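For illustration, the kind of edit this implies in qhub-config.yaml (the `default_images` keys and image names are assumptions based on the 0.3.x schema; check your own file for the exact fields):

```bash
# Point the default images at the `main` tag; field paths and image names are
# assumptions based on the 0.3.x config schema. Requires yq v4.
yq -i '
  .default_images.jupyterhub  = "quansight/qhub-jupyterhub:main" |
  .default_images.jupyterlab  = "quansight/qhub-jupyterlab:main" |
  .default_images.dask_worker = "quansight/qhub-dask-worker:main"
' qhub-config.yaml

# Redeploy against the updated config.
qhub deploy -c qhub-config.yaml
```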
Still happy to have a call to go through all of this together.
Redeploying with image tags set to `main` resolves this issue. After importing the users and logging in, the user data remains intact :)
I still want to go back and test upgrading a QHub instance that uses Auth0.
Upgrading qhub (on AWS, using Auth0) from `v0.3.12` to `v0.4.0` failed during the deployment process. I tried the same upgrade and deploy on DO and, while it successfully deployed and I could import users, I couldn't log in due to the following:
I also noticed a few bizarre Terraform outputs:
@danlester are you available to troubleshoot together tomorrow after the QHub sync?
@danlester capturing the Terraform logs led me to:

```
Invalid provider configuration was supplied. Provider operations likely to fail: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable
```

Googling this, I found an issue on the terraform-aws-eks repo where one of the top recommendations was https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1234#issuecomment-787936210:

```
export KUBE_CONFIG_PATH=/Users/eskild/.kube/config
```
With this trick, the deployment seemed to be working but then it started deleting subnet resources and errored out, leaving the cluster in a half-deleted state.
Logs in this gist.
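For anyone reproducing this, a sketch of how the verbose Terraform logs and the kubeconfig workaround can be combined (`TF_LOG`/`TF_LOG_PATH` are standard Terraform environment variables, `KUBE_CONFIG_PATH` comes from the linked terraform-aws-eks comment, and the qhub invocation is an assumption):

```bash
# Capture verbose Terraform logs from the qhub-driven deploy; the env vars
# propagate to the terraform subprocesses qhub spawns.
export TF_LOG=DEBUG
export TF_LOG_PATH="$PWD/terraform-debug.log"

# Workaround from terraform-aws-eks#1234: give the kubernetes provider an
# explicit kubeconfig so it doesn't fall back to localhost.
export KUBE_CONFIG_PATH="$HOME/.kube/config"

# Assumed invocation.
qhub deploy -c qhub-config.yaml
```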
@iameskild I believe I've solved this particular problem (Terraform trying to access localhost cluster) in the following issue which gives more details. It has a corresponding PR - please review:
Kubeconfig state unavailable, Terraform defaults to localhost
However, (in AWS) it leads me to the problem you were seeing about subnet resources being replaced. (Some outputs below). Once it wants to replace the node groups, the apply will never finish since the nodes can't be destroyed until the cluster has its contents removed safely.
By the way, I tried the upgrade on AWS and got the same localhost error using password auth (not Auth0) - I don't think the auth type has anything to do with it, and you were just lucky if you got password upgrade to work before - or maybe something has changed since!
As discussed, the login problem you saw with Auth0 above is because the callback URL needs to be changed, and we need to advise the user in `qhub upgrade` - issue nebari-dev/nebari#991 for you.
I think it's something to do with CIDR changes:
```
[terraform]: # module.network.aws_subnet.main[0] must be replaced
[terraform]: -/+ resource "aws_subnet" "main" {
[terraform]: ~ arn = "arn:aws:ec2:eu-west-2:892486800165:subnet/subnet-0aede967b72f0907b" -> (known after apply)
[terraform]: ~ availability_zone_id = "euw2-az2" -> (known after apply)
[terraform]: ~ cidr_block = "10.10.0.0/20" -> "10.10.0.0/18" # forces replacement
[terraform]: ~ id = "subnet-0aede967b72f0907b" -> (known after apply)
[terraform]: + ipv6_cidr_block_association_id = (known after apply)
```
I would take a look at where these have been changed (e.g. `vpc_cidr_newbits` and `vpc_cidr_block` in the code), find out why, and see if they can at least be preserved for old installations.
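To make the `/20` → `/18` jump concrete: with Terraform's `cidrsubnet()`, the subnet prefix length is the VPC prefix plus `vpc_cidr_newbits`, so changing either newbits or the base `vpc_cidr_block` forces new subnet CIDRs and hence subnet replacement. A sketch, assuming a `10.10.0.0/16` VPC block with newbits going from 4 to 2 (both values are assumptions inferred from the plan output above):

```bash
# cidrsubnet() shows how newbits controls the subnet prefix length; terraform
# console can evaluate built-in functions even without a full configuration.
terraform console <<'EOF'
cidrsubnet("10.10.0.0/16", 4, 0)
cidrsubnet("10.10.0.0/16", 2, 0)
EOF
# Expected output:
#   "10.10.0.0/20"
#   "10.10.0.0/18"
```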
@iameskild just to keep in mind during tests:
- CI/CD workflows have been tested and a PR for the relevant bug fixes/modifications has been opened: nebari-dev/nebari#1086
- The Azure issues seen in the integration tests do not affect fresh local deployments
@danlester @HarshCasper Have you tested the `qhub upgrade` command for the above version migrations? Just to know if that still needs to be tested :smile:
v0.4.0 released. Closing issue 🙌
Checklist:

Validate successful `qhub deploy` and `qhub destroy` for each provider:
- [x] AWS
  - Validate the following services:
- [x] Azure
  - Validate the following services:
- [ ] DO
  - Validate the following services:
- [x] GCP
  - Validate the following services:
- [x] local/existing kubernetes cluster/minikube
  - Validate the following services:

Validate `qhub upgrade` is successful for each provider:
- [ ] AWS: `v0.3.12`/`v0.3.13`/`v0.3.14` to `v0.4.0`
- [ ] Azure: `v0.3.12`/`v0.3.13`/`v0.3.14` to `v0.4.0`
- [ ] DO: `v0.3.12`/`v0.3.13`/`v0.3.14` to `v0.4.0`
- [ ] GCP: `v0.3.12`/`v0.3.13`/`v0.3.14` to `v0.4.0`
- [ ] local/existing kubernetes deployment/minikube: `v0.3.12`/`v0.3.13`/`v0.3.14` to `v0.4.0`

Validate `qhub-ops.yaml` workflow (outdated)