sassoftware / viya4-deployment

This project contains Ansible code that creates a baseline in an existing Kubernetes environment for use with the SAS Viya Platform, generates the manifest for an order, and then can also deploy that order into the Kubernetes environment specified.
Apache License 2.0

Manual update from crunchy v4 to v5 not working #425

Closed bek-afs closed 1 year ago

bek-afs commented 1 year ago

Hello,

I am attempting to upgrade from 2022.09 LTS to 2023.03 Stable. I used this viya4-deployment repo to install SAS Viya initially, and have been using it for the attempted upgrade (using Release 6.4.0). To upgrade, I ran a similar docker command, and instead supplied the new 2023.03 deployment assets, license, and updated the ansible-vars.yaml accordingly. I also updated our mirror registry with the new images. I used the "viya, install" tags.
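For reference, a run like the one described above can be sketched as follows. This is a hypothetical reconstruction based on the viya4-deployment README's docker usage; the mount paths, image tag, and file locations are assumptions, not the poster's exact command:

```shell
# Hypothetical sketch of an upgrade run with viya4-deployment (release 6.4.0).
# Paths and image name are assumptions; the new 2023.03 deployment assets,
# license, and updated ansible-vars.yaml are supplied via the mounted config.
docker run --rm \
  -v "$HOME/deployments:/data" \
  -v "$HOME/deployments/ansible-vars.yaml:/config/config" \
  -v "$HOME/.kube/config:/config/kubeconfig" \
  viya4-deployment:6.4.0 --tags "viya,install"
```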

The first time I ran the deployment to attempt the upgrade, I received the error:

TASK [vdm : postgres instance - crunchy v4 to crunchy v5 update is required] ***
fatal: [localhost]: FAILED! => changed=false
  msg: A manual update from crunchy v4 to v5 is required, follow these steps https://go.documentation.sas.com/doc/en/itopscdc/default/dplynotes/p1bpcvd3sr8au8n1w9ypcvu31taj.htm

I navigated to the link and followed the instructions provided. At a high level, based on the instructions, I deleted an old postgres transformer we were previously using, included the crunchy-storage-transformer.yaml with the specs matching the existing crunchy v4 cluster, and included the crunchy-upgrade-platform-transformer.yaml as customizations. I did not include overlays/crunchydata/postgres-operator, overlays/postgres/platform-postgres, or components/crunchydata/internal-platform-postgres, because I saw them included as part of this task.

However, when I tried running the upgrade again after making these changes, I got the same error. I found the merge request related to this upgrade and saw @dhoucgitter's comment "DAC will exit with "A manual update from crunchy v4 to v5 is required..." if an order with internal v5 crunchy configured is deployed to a cluster where internal v4 crunchy is already present."

Does this mean that I have to do a complete uninstall and then reinstall to get past this crunchy v4 to v5 upgrade? I'm not sure what else I'm supposed to do if the linked documentation doesn't seem to get me past the error. Please advise - thanks!

dhoucgitter commented 1 year ago

Hi @bek-afs, a complete uninstall and reinstall should not be required, although if you don't need to preserve your 2022.09 LTS environment, it could be faster than completing the crunchy 4->5 upgrade steps.

DAC looks for and interprets the presence of the existing v4 crunchy operator as a sign that the upgrade from Crunchy 4 to Crunchy 5 is not yet complete. If you have completed all seven steps under "Before Deployment Commands" of the upgrade process and also completed the steps under "Finish Upgrading Crunchy Data", did you also go back and perform the manual deployment steps listed here: https://go.documentation.sas.com/doc/en/itopscdc/v_033/dplyml0phy0dkr/p127f6y30iimr6n17x2xe9vlt54q.htm with a site.yaml file built after completing the Crunchy 4->5 upgrade steps? The crunchy 4->5 upgrade steps linked in the DAC message are intended to be followed during a manual upgrade, before running the deployment commands I've referred you to in the link above.

There are seven manual deployment steps listed at the link above starting with:

On the kubectl machine, create the Kubernetes manifest:

kustomize build -o site.yaml

The crunchy 4 operator should be removed automatically during application of your updated manifest, specifically during step 5:

kubectl apply -f site.yaml -l sas.com/admin=namespace --prune
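For context, the manual deployment steps reduce to a sequence roughly like the following. This is a hedged paraphrase from memory of the SAS "Deployment Using Kubernetes Commands" documentation; the exact selectors, flags, and step numbering vary by cadence version, so follow the linked documentation rather than this sketch:

```shell
# Approximate sequence of the manual deployment commands (verify against the
# SAS documentation for your cadence; flags and ordering may differ).
kustomize build -o site.yaml
kubectl apply --selector="sas.com/admin=cluster-api" --server-side --force-conflicts -f site.yaml
kubectl wait --for condition=established --timeout=60s -l "sas.com/admin=cluster-api" crd
kubectl apply --selector="sas.com/admin=cluster-wide" -f site.yaml
kubectl apply --selector="sas.com/admin=cluster-local" -f site.yaml --prune
# The pruning pass over namespace-scoped resources is what removes the
# leftover crunchy v4 operator:
kubectl apply --selector="sas.com/admin=namespace" -f site.yaml --prune
```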

bek-afs commented 1 year ago

Hi @dhoucgitter, thanks for your response! Firstly, I'm not able to access the link you provided - is this section on Deployment Using Kubernetes Commands what you were referring to?

If so, unfortunately I don't think I'll be able to complete those steps with our current setup. We've automated this deployment one step further by creating an AWS CodeBuild job that dynamically creates the ansible-vars.yaml based on the environment we're deploying into, builds the image from this repo, and performs the docker run command with that image. In this setup, the $deploy directory isn't maintained or saved, since everything runs in a temporary CodeBuild job container. My hope was that we could get away with this kind of architecture and simply supply new deployment assets on upgrades (an upgrade in this scenario being a re-run of the docker run command with the viya,install tags). In this model we're still able to provide any custom overlays, and we can run kubectl commands for troubleshooting and admin work as needed. The way we've built it also eliminates the need for a jump server and, in general, for keeping long-standing config files on a server; instead, all code is stored as IaC.

It's unfortunate that there's still a need for manual deployment steps even with the use of this repo. We might be able to export the $deploy directory from CodeBuild and copy it locally (or reintroduce a jump server) to perform administrative tasks like this. Ideally, however, we'd stick with our setup. Based on your expertise, if we were to do a full uninstall and reinstall to upgrade versions as an ongoing procedure, would there be a need to retain the $deploy directory? Do you see any major hurdles or challenges with doing full uninstalls and reinstalls, as opposed to in-place upgrades, in terms of impacting end users or the functionality of the SAS Viya application?

I appreciate your thoughts on the situation, as well as your contributions to this repo!

dhoucgitter commented 1 year ago

Sure thing, replying to your initial question, yes, the section Deployment using Kubernetes Commands is an up-to-date version of what I pointed you to. I updated the link in the post above to correct it.

bek-afs commented 1 year ago

Hi @dhoucgitter, thanks for the updated link. Are you able to provide a response on the other questions I had in my last message?

dhoucgitter commented 1 year ago
> 1. If we were to do a full uninstall and reinstall to upgrade versions as an ongoing procedure, would there be a need to retain the $deploy directory?

Retaining the $deploy directory would allow you to refer to and reuse any of the site-config customizations that you may have used in the previous deployment. The kustomization.yaml file and the entries within it are also useful for understanding the configuration used in your previous deployment.

> 2. Do you see any major hurdles or challenges we'd have with doing full uninstalls and reinstalls as opposed to in-place upgrades in terms of impacting end-users or functionality of the SAS Viya application?

If your re-installed environment maintains or adds to the features of the one you are upgrading from, I would not expect an impact to functionality. If an end user has taken time to import data and/or code into an existing deployment, an upgrade might allow them to pick up where they left off more easily, instead of needing to regain access to data and code in a newly installed deployment. Utilizing backup and restore could be a way to mitigate that.
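One way to capture state before an uninstall/reinstall is to trigger the scheduled backup job ad hoc. A minimal sketch, assuming the documented sas-scheduled-backup-job CronJob exists in the deployment; the namespace and job name here are placeholder assumptions:

```shell
# Trigger an ad hoc backup from the scheduled backup CronJob, then wait for
# it to finish. "viya4" and "sas-backup-pre-upgrade" are placeholders.
kubectl -n viya4 create job --from=cronjob/sas-scheduled-backup-job sas-backup-pre-upgrade
kubectl -n viya4 wait --for=condition=complete job/sas-backup-pre-upgrade --timeout=2h
```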
bek-afs commented 1 year ago

Hi @dhoucgitter, thanks for your response! It sounds like any sort of work an end user has done in the UI would be lost by performing an uninstall/reinstall, which makes me think that would be an undesirable result. Unfortunately, according to the docs, it appears that restores can only be applied to the same cadence and version, so I don't think we'd be able to uninstall/reinstall the new version and then apply the backup.

As a side note, I was able to get past the crunchy error by performing a full uninstall/reinstall with the new version, which was really efficient using this repo and saved me a ton of time making all the crunchy-related updates. As a future enhancement request, it would be awesome to be able to perform this uninstall/reinstall-with-new-version process and then apply a backup to preserve the previous environment.

bek-afs commented 1 year ago

> If you have completed all seven steps of the "Before Deployment Commands" of the upgrade process and also completed the steps under "Finish Upgrading Crunchy Data", did you also go back and perform the manual deployment steps listed here: https://go.documentation.sas.com/doc/en/itopscdc/v_033/dplyml0phy0dkr/p127f6y30iimr6n17x2xe9vlt54q.htm with a site.yaml file built after completing the Crunchy 4->5 upgrade steps?

Hi @dhoucgitter, sorry just one more follow-up question on this thread and then I think it can be closed out! I'm reading through SAS documentation again and see the note in the Deployment Using Kubernetes Commands section that "If you have deployed the SAS Viya Platform Deployment Operator, these commands are not necessary since the operator deploys your software for you". In my case, we are using the deployment operator, which means we'd recreate the SASDeployment Custom Resource and then deploy using the deployment operator instead of using the manual deployment steps, correct?

dhoucgitter commented 1 year ago

Hi @bek-afs, just to confirm what you are asking. You are at the point in the upgrade process where you would normally run the manual deployment steps, but in your case, you have chosen to use the Deployment Operator to manage your deployment. If so, then yes, you would follow the steps to deploy using the deployment operator instead of using the manual deployment steps since the deployment operator will execute those commands on your behalf.

bek-afs commented 1 year ago

Hi @dhoucgitter, yes you interpreted my question correctly! However, we would still have to recreate the SASDeployment Custom Resource to specify the new license, cadence version, cadence release, etc and then deploy using the deployment operator, correct? If so, I think that answers my questions and you can close this issue out. Thanks again for your help!

dhoucgitter commented 1 year ago

@bek-afs, yes, that's correct: you would generate a new SASDeployment custom resource after handling the cadence-specific items from the deployment notes, and then deploy using the deployment operator.
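As a rough sketch of that last step, assuming a hypothetical namespace and file name (when deploying with viya4-deployment, re-running the viya,install tags with the new assets regenerates and applies this resource for you):

```shell
# Inspect the currently applied SASDeployment custom resource
# ("viya4" is a placeholder namespace):
kubectl -n viya4 get sasdeployment

# After regenerating sasdeployment.yaml with the new license, cadenceName,
# cadenceVersion, and cadenceRelease, apply it and let the operator reconcile:
kubectl -n viya4 apply -f sasdeployment.yaml
kubectl -n viya4 get sasdeployment --watch
```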