viniciusdc closed this 3 days ago
For the changes made to this release, we have this tracking issue https://github.com/nebari-dev/nebari/issues/2798
Tested locally to evaluate fixes, waiting for user feedback
Upgrading 2024.7.1 to 2024.11.1rc1 - the prompt took me through the 2024.9.1 prompt before 11.1 - was this expected? I thought it was supposed to skip
(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari --version
2024.11.1rc1
(nebari-2024.11.1) (base) kfoster@Mac nebari_2024.11.1 % nebari upgrade -c nebari-config.yaml
---> Starting upgrade from 2024.7.1 to 2024.9.1
Setting nebari_version to 2024.9.1
Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n] (Y): y
Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n] (Y): y
Do you want to replace current tag 2024.7.1 with 2024.9.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n] (Y): y
---> Starting upgrade from 2024.9.1 to 2024.11.1
Setting nebari_version to 2024.11.1
Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterhub: quay.io/nebari/nebari-jupyterhub? [Y/n] (Y): y
Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.jupyterlab: quay.io/nebari/nebari-jupyterlab? [Y/n] (Y): y
Do you want to replace current tag 2024.9.1 with 2024.11.1 for:
default_images.dask_worker: quay.io/nebari/nebari-dask-worker? [Y/n] (Y): y
Attempting 2024.9.1 -> 2024.11.1rc1 upgrade on an existing AWS cluster, I noticed this:
Would you like Nebari to assign the corresponding role/scopes to all of your current groups automatically? [y/N] (N): y
...
ValueError: Failed to connect to Keycloak server: 401: b'{"error":"invalid_grant","error_description":"Invalid user credentials"}'
The reason for this is that I changed security.keycloak.initial_root_password from its value when Nebari was first deployed, so that the true password isn't stored in the config in our CI/CD repo. I believe this is a common (best?) practice, so relying on that value to connect to Keycloak for the upgrade group-creation step won't work.
Upgrading 2024.7.1 to 2024.11.1rc1 - the prompt took me through the 2024.9.1 prompt before 11.1 - was this expected? I thought it was supposed to skip
This was expected: the "skipping" was merely in the sense of not performing any action on behalf of the user. Due to how the upgrade process works, there is no skip mechanism we could use; that is something to enhance.
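The behavior above follows from how sequential upgrades work: the upgrade walks through every intermediate released version between the current and target versions, even when a hop performs no real action. A minimal sketch of that idea (the version list and function name are illustrative, not Nebari's actual code):

```python
# Illustrative subset of released versions, in order.
RELEASES = ["2024.7.1", "2024.9.1", "2024.11.1"]

def upgrade_path(current: str, target: str) -> list[tuple[str, str]]:
    """Return every hop (from_version, to_version) between current and target."""
    i, j = RELEASES.index(current), RELEASES.index(target)
    return [(RELEASES[k], RELEASES[k + 1]) for k in range(i, j)]

for src, dst in upgrade_path("2024.7.1", "2024.11.1"):
    # Mirrors the two hops seen in the transcript above.
    print(f"---> Starting upgrade from {src} to {dst}")
```

With this model, upgrading 2024.7.1 -> 2024.11.1 necessarily visits the 2024.9.1 step, which is why its prompts appear.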
The reason for this is that I have changed security.keycloak.initial_root_password from its value when Nebari was first deployed so that the true password isn't stored in config in our CICD repo. This is a common (best?) practice, so relying on that value to connect to keycloak for the upgrade group creation step won't work.
Uhm... indeed, that's something we don't account for in many of our tests, and as you commented, it is best practice. There is no easy way to acquire the new password programmatically, though, unless we prompt the user for it in case the attempt with the one in the YAML fails. What do you think?
We can also point the user to how to do this manually in case of an error (that sounds like a good idea, since it will be difficult to cover all the edge cases).
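One way to combine both suggestions, preferring an environment-variable override, falling back to the config value, and only prompting interactively when authentication fails, could be sketched like this (the env var name, function name, and lookup order are assumptions for illustration, not Nebari's implementation):

```python
import os
from getpass import getpass

def resolve_keycloak_password(config: dict, authenticate) -> str:
    """Find a working Keycloak root password.

    Order tried: KEYCLOAK_ADMIN_PASSWORD env var (hypothetical name), then
    security.keycloak.initial_root_password from the config, then an
    interactive prompt. `authenticate` is a callable that returns True
    when the given password works against the Keycloak server.
    """
    candidates = [
        os.environ.get("KEYCLOAK_ADMIN_PASSWORD"),
        config.get("security", {}).get("keycloak", {}).get("initial_root_password"),
    ]
    for password in candidates:
        if password and authenticate(password):
            return password
    # Last resort: ask the operator, since the config value may be stale.
    password = getpass("Keycloak root password: ")
    if authenticate(password):
        return password
    raise ValueError("Failed to connect to Keycloak: invalid credentials")
```

This keeps the true password out of the committed YAML while still letting CI/CD inject it at upgrade time.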
@kenafoster can you open a ticket about that issue? I don't think it is a release blocker, but it is a good callout and we should talk about how to best handle that situation in a separate issue. Thanks for all the work you are doing with testing!
Yep, here it is https://github.com/nebari-dev/nebari/issues/2833
I have taken care of the issues mentioned above and thoroughly tested them:
Keycloak API calls now run correctly during nebari upgrade; added the ability to override the Keycloak root password via environment variables.
The launch_template field has been temporarily disabled due to the issues reported in #2832; a full fix will be introduced in 2024.11.2.
Tested upgrade path from 2024.7.1 -> 2024.11.1 and 2024.9.1 -> 2024.11.1;
Tested fresh deployment and upgrade deployments
Tested GPU usage since the ami_id logic is coupled with the instance_type classes;
Tested user creation, groups, and spawning of user instances. All the other resources were left untouched; thus, the previous tests done for 2024.9.1 apply.
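The ami_id/instance_type coupling mentioned above can be illustrated with a simplified sketch (the function name and the instance-family prefixes are assumptions for illustration, not Nebari's actual logic):

```python
def guess_ami_type(instance_type: str) -> str:
    """Pick an AMI flavor from an EC2 instance type (illustrative only).

    GPU instance families (g*, p*) need a GPU-enabled AMI; everything
    else can use the default AMI. Nebari's real selection logic is more
    involved, which is why GPU usage needed explicit testing here.
    """
    family = instance_type.split(".")[0]  # e.g. "g4dn.xlarge" -> "g4dn"
    if family and family[0] in ("g", "p"):
        return "gpu"
    return "default"

print(guess_ami_type("g4dn.xlarge"))  # gpu
print(guess_ami_type("m5.large"))    # default
```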
Note: while performing the checks, I encountered a weird issue with my cluster where it attempted to re-create all resources. I am assuming my stages folder was not in sync with the stored state; I was able to deploy successfully once I repeated the same action with clean stages files.
While doing this, I also took some time to think about how we want to perform hotfixes in the future and what policies we should consider in case similar situations appear.
This is now completed. We did find an issue with the tagged docker image #2861, but that can be addressed without a re-release. I am closing this issue as completed
Release Checklist

Release details
- Scheduled release date - 2024/11/01
- Release captain responsible - @viniciusdc

Starting point - a new release is out

Looking forward - planning
- Triage bugs to determine what should be included in the release and add it to the milestone.

Pre-release process
- Are the dask versions in the nebari-dask metapackage up to date?
- Is a nebari upgrade needed for this release?
- git cherry-pick the commits that should be included.
- Update the RELEASE.md notes.

Cut the official release
If there were changes to the following packages, handle their releases before cutting a new release for Nebari:
- nebari-workflow-controller
- argo-jupyter-scheduler

These steps must be actioned in the order they appear in this checklist:
- Release the nebari-dask meta package on Conda-Forge.
- Update CURRENT_RELEASE (and any other tags) in constants.py.
- Do not prepend v to the tag.
- Update RELEASE.md.
- Release nebari on Conda-Forge.
- Merge back into main.