nebari-dev / nebari

🪴 Nebari - your open source data science platform
https://nebari.dev

[RELEASE] 2024.11.1 - Hotfix #2807

Closed · viniciusdc closed this issue 3 days ago

viniciusdc commented 4 weeks ago

Release Checklist

Release details

Scheduled release date - 2024/11/01

Release captain responsible - @viniciusdc

Starting point - a new release is out

Looking forward - planning

Pre-release process

Cut the official release

If there were changes to the following packages, handle their releases before cutting a new release for Nebari

These steps must be actioned in the order they appear in this checklist.

viniciusdc commented 4 weeks ago

The changes made for this release are tracked in https://github.com/nebari-dev/nebari/issues/2798

viniciusdc commented 3 weeks ago

Tested locally to evaluate the fixes; waiting for user feedback.

kenafoster commented 3 weeks ago

Upgrading 2024.7.1 to 2024.11.1rc1 - the CLI took me through the 2024.9.1 prompts before the 2024.11.1 ones - was this expected? I thought it was supposed to skip intermediate versions.

(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari --version
2024.11.1rc1
(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari upgrade -c nebari-config.yaml 

---> Starting upgrade from 2024.7.1 to 2024.9.1

Setting nebari_version to 2024.9.1

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n]  (Y): y

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n]  (Y): y

Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n]  (Y): y

---> Starting upgrade from 2024.9.1 to 2024.11.1

Setting nebari_version to 2024.11.1

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n]  (Y): y

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n]  (Y): y

Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n]  (Y): y

kenafoster commented 3 weeks ago

Attempting 2024.9.1 -> 2024.11.1rc1 upgrade on an existing AWS cluster, I noticed this:

Would you like Nebari to assign the corresponding role/scopes to all of your current groups automatically? [y/N] (N): y
...
ValueError: Failed to connect to Keycloak server: 401: b'{"error":"invalid_grant","error_description":"Invalid user credentials"}'

The reason for this is that I changed security.keycloak.initial_root_password from the value it had when Nebari was first deployed, so the true password isn't stored in the config in our CI/CD repo. I believe this is a common (best?) practice, so relying on that value to connect to Keycloak for the upgrade's group-creation step won't work.

viniciusdc commented 3 weeks ago

Upgrading 2024.7.1 to 2024.11.1rc1 - the CLI took me through the 2024.9.1 prompts before the 2024.11.1 ones - was this expected? I thought it was supposed to skip intermediate versions.

This was expected. The "skipping" was merely in the sense of not performing any action on behalf of the user; due to how the upgrade process works, there is no skip mechanism we could use to jump over intermediate versions - something to enhance.
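
For illustration, the behaviour roughly matches the sketch below (this is not Nebari's actual code; the registry and function names are made up): each released version registers an upgrade step, and the upgrader walks through every intermediate version in order, even when a step has nothing to change beyond bumping nebari_version.

# Illustrative sketch only, not Nebari's implementation: a version-by-version
# upgrade chain where every intermediate step runs, even if it changes nothing.
from packaging.version import Version

# Hypothetical registry mapping each released version to its upgrade routine.
UPGRADE_STEPS = {
    "2024.9.1": lambda config: config,   # nothing to do, but the prompts still run
    "2024.11.1": lambda config: config,  # hotfix-specific changes would go here
}

def upgrade(config: dict, current: str, target: str) -> dict:
    """Apply every registered step between `current` and `target`, in order."""
    for version, step in sorted(UPGRADE_STEPS.items(), key=lambda kv: Version(kv[0])):
        if Version(current) < Version(version) <= Version(target):
            print(f"---> Starting upgrade from {current} to {version}")
            config = step(config)
            config["nebari_version"] = version
            current = version
    return config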

viniciusdc commented 3 weeks ago

The reason for this is that I changed security.keycloak.initial_root_password from the value it had when Nebari was first deployed, so the true password isn't stored in the config in our CI/CD repo. This is a common (best?) practice, so relying on that value to connect to Keycloak for the upgrade's group-creation step won't work.

Uhm... indeed, that's something we aren't considering in many of our tests, and as you commented, it is best practice. There is no easy way to acquire the new password programmatically, though, unless we prompt the user for it when the attempt with the value from the YAML fails. What do you think?

We can also point the user to instructions on how to do this manually in case of an error (that sounds like a good idea, since it will be difficult to cover all the edge cases).
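
A minimal sketch of that retry-and-prompt idea, assuming the python-keycloak client and Nebari's root Keycloak user (the helper name and overall flow are hypothetical, not Nebari's current API):

# Hypothetical helper: try the password stored in nebari-config.yaml first; if
# Keycloak rejects it, fall back to prompting the operator for the real one.
from getpass import getpass

from keycloak import KeycloakAdmin
from keycloak.exceptions import KeycloakAuthenticationError

def connect_keycloak(server_url: str, password_from_config: str) -> KeycloakAdmin:
    for attempt, password in enumerate([password_from_config, None]):
        if password is None:
            password = getpass("Keycloak root password (config value was rejected): ")
        admin = KeycloakAdmin(
            server_url=server_url,
            username="root",
            password=password,
            realm_name="master",
            verify=True,
        )
        try:
            admin.get_realms()  # forces authentication against the server
            return admin
        except KeycloakAuthenticationError:
            if attempt == 1:
                raise  # prompted password also failed; surface the error
    raise RuntimeError("unreachable")

If the prompted password also fails, the error is re-raised so the upgrade can then point the user at the manual instructions.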

dcmcand commented 3 weeks ago

@kenafoster can you open a ticket about that issue? I don't think it is a release blocker, but it is a good callout and we should talk about how to best handle that situation in a separate issue. Thanks for all the work you are doing with testing!

kenafoster commented 3 weeks ago

Yep, here it is https://github.com/nebari-dev/nebari/issues/2833

viniciusdc commented 1 week ago

I have taken care of the issues mentioned above and thoroughly tested them.

Note: while performing the check, I encountered a strange issue with my cluster where it attempted to re-create all resources. I assume it was caused by my local stages folder being out of sync with the one in storage; I was able to deploy successfully once I repeated the same action with clean stages files.

While doing this, I also took some time to think about how we want to perform hotfixes in the future and what policies we should consider in case similar situations appear.

viniciusdc commented 3 days ago

This is now completed. We did find an issue with the tagged Docker image (#2861), but that can be addressed without a re-release. I am closing this issue as completed.