stackabletech / demos

This repo contains SDP stacks and demos
https://docs.stackable.tech/home/stable/demos/
Apache License 2.0

Check demos and upgrade from 24.3 to 24.7 release #65

Closed maltesander closed 1 month ago

maltesander commented 1 month ago

Description

For each demo, test the upgrade process from the 24.3 to the 24.7 release and document any changes the upgrade requires. Also check the steps in the demo documentation to ensure the demo works properly with the 24.7 release.

PR for changes

### Tasks
- [ ] https://github.com/stackabletech/demos/pull/60
- [x] airflow-scheduled-job upgrade and functionality test @maltesander
- [x] hbase-hdfs-load-cycling-data upgrade and functionality test @maltesander
- [x] end-to-end-security upgrade and functionality test @maltesander
- [x] nifi-kafka-druid-earthquake-data upgrade and functionality test @maltesander
- [x] nifi-kafka-druid-water-level-data upgrade and functionality test @maltesander
- [x] spark-k8s-anomaly-detection-taxi-data upgrade and functionality test @maltesander
- [x] trino-iceberg upgrade and functionality test @NickLarsenNZ
- [x] trino-taxi-data upgrade and functionality test @maltesander
- [x] data-lakehouse-iceberg-trino-spark upgrade and functionality test @xeniape
- [x] jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data upgrade and functionality test @Techassi
- [x] logging upgrade and functionality test @Techassi
- [x] signal-processing upgrade and functionality test @Techassi
maltesander commented 1 month ago

:green_circle: airflow-scheduled-job

Upgrade 24.3 -> 24.7 worked.

Had to uninstall postgres and redis manually (the demo requests new chart versions).

# Clone demos repository and checkout release-24.3 branch (/tmp)
git clone git@github.com:stackabletech/demos.git
git -C demos checkout release-24.3

# Clone release and use 24.3
git clone git@github.com:stackabletech/release.git

# point local stacks file to local folder
stackablectl demo in airflow-scheduled-job --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall release
stackablectl release uninstall 24.3

# replace crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/airflow-operator/release-24.7/deploy/helm/airflow-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/release-24.7/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/release-24.7/deploy/helm/spark-k8s-operator/crds/crds.yaml

# postgres gets a new version: 6: release postgresql-airflow (13.2.18) already installed, skipping requested version 15.5.16
helm uninstall postgresql-airflow

# redis gets a new version: 6: release redis-airflow (18.1.6) already installed, skipping requested version 19.6.1
helm uninstall redis-airflow

# will use 24.7 release and demo / stack files
stackablectl demo in airflow-scheduled-job
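
The CRD URLs in this thread all follow the same pattern, so the replace step can be generated from a list of operator names. A minimal sketch, assuming the pattern holds for every operator involved; `crd_url`, `replace_crds`, and the `DRY_RUN` switch are hypothetical helpers, not part of `kubectl` or `stackablectl`:

```shell
# Build the raw GitHub URL of an operator's CRD manifest.
# $1: operator name without the "-operator" suffix (e.g. "airflow")
# $2: git ref, e.g. "release-24.7" (default) or "main"
crd_url() {
  printf 'https://raw.githubusercontent.com/stackabletech/%s-operator/%s/deploy/helm/%s-operator/crds/crds.yaml\n' \
    "$1" "${2:-release-24.7}" "$1"
}

# Replace the CRDs of every operator passed as an argument.
# DRY_RUN=1 only prints the kubectl commands (switch of this sketch only).
replace_crds() {
  for _operator in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "kubectl replace -f $(crd_url "$_operator")"
    else
      kubectl replace -f "$(crd_url "$_operator")"
    fi
  done
}
```

For the airflow demo this could be called as `DRY_RUN=1 replace_crds airflow commons listener secret spark-k8s` to review the commands before running them for real.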
maltesander commented 1 month ago

:green_circle: hbase-hdfs-load-cycling-data

# Clone demos repository and checkout release-24.3 branch (/tmp)
git clone git@github.com:stackabletech/demos.git
git -C demos checkout release-24.3

# Clone release and use 24.3
git clone git@github.com:stackabletech/release.git

# point local stacks file to local folder (and adapt files to be relative)
stackablectl demo in hbase-hdfs-load-cycling-data --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/release-24.7/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hbase-operator/release-24.7/deploy/helm/hbase-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hdfs-operator/release-24.7/deploy/helm/hdfs-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/release-24.7/deploy/helm/zookeeper-operator/crds/crds.yaml

# copy the secret `secret-provisioner-tls-ca` to the operator's namespace
kubectl get secrets secret-provisioner-tls-ca --output=yaml | \
    sed 's/namespace: .*/namespace: stackable-operators/' | \
    kubectl create --filename=-

# install 24.7
stackablectl demo in hbase-hdfs-load-cycling-data
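
The secret copy above relies on a `sed` filter that rewrites the namespace of the exported manifest. A small sketch of that filter as a reusable function; `rewrite_namespace` is a hypothetical name, and the default target namespace is the one used in this comment:

```shell
# Rewrite every "namespace: ..." field of a manifest read from stdin.
# $1: target namespace (default: stackable-operators)
# Caveat: the pattern matches any line containing "namespace: ",
# so it assumes no such text appears elsewhere in the manifest.
rewrite_namespace() {
  sed "s/namespace: .*/namespace: ${1:-stackable-operators}/"
}
```

It slots into the same pipeline: `kubectl get secrets secret-provisioner-tls-ca --output=yaml | rewrite_namespace | kubectl create --filename=-`.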

Results in errors in the operator, but works after deleting the statefulsets:

2024-07-25T09:46:58.862709Z ERROR hdfs_controller: stackable_operator::logging::controller: Failed to reconcile object controller.name="hdfsclusters.hdfs.stackable.tech" error=reconciler for object HdfsCluster.v1alpha1.hdfs.stackable.tech/hdfs.default failed error.sources=[cannot create role group stateful set "hdfs-namenode-default", failed to apply patch, unable to patch resource "hdfs-namenode-default", ApiError: StatefulSet.apps "hdfs-namenode-default" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden: Invalid (ErrorResponse { status: "Failure", message: "StatefulSet.apps \"hdfs-namenode-default\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden", reason: "Invalid", code: 422 }), StatefulSet.apps "hdfs-namenode-default" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden: Invalid]

Upgrading HDFS from 3.3.6 to 3.4.0 is not supported at the moment, so the following NameNode errors are expected:
namenode File system image contains an old layout version -66.
namenode An upgrade to version -67 is required.
namenode Please restart NameNode with the "-rollingUpgrade started" option if a rolling upgrade is already started; or restart NameNode with the "-upgrade" option to start a new upgrade.

# Reapplied jobs fail (folders existing etc.)

EDIT: Installing from main / 24.7 leads to a failing distcp job (because MapReduce was removed from the Docker image):

Error: Unable to initialize main class org.apache.hadoop.tools.DistCp
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/Job

Works with https://github.com/stackabletech/demos/pull/69

maltesander commented 1 month ago

:green_circle: end-to-end-security

No upgrade required, but the demo requires https://github.com/stackabletech/demos/pull/67 for the krb5 version (1.21.1) fix.

Works with that fix.

maltesander commented 1 month ago

:green_circle: nifi-kafka-druid-earthquake-data :green_circle: nifi-kafka-druid-water-level-data

# Clone demos repository and checkout release-24.3 branch (/tmp)
git clone git@github.com:stackabletech/demos.git
# switch branch to 24.3
git -C demos checkout release-24.3

# Clone release and use 24.3
git clone git@github.com:stackabletech/release.git

# point local stacks file to local folder (and adapt files to be relative instead of GitHub raw URLs)
stackablectl demo in nifi-kafka-druid-earthquake-data --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall operators
stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl create -f https://raw.githubusercontent.com/stackabletech/listener-operator/release-24.7/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/druid-operator/release-24.7/deploy/helm/druid-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/release-24.7/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/release-24.7/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/release-24.7/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/release-24.7/deploy/helm/zookeeper-operator/crds/crds.yaml

# 6: release postgresql-superset (13.2.18) already installed, skipping requested version 15.5.16
# 6: release postgresql-druid (13.2.18) already installed, skipping requested version 15.5.16
# 6: release postgresql-superset (13.2.18) already installed, skipping requested version 15.5.16
helm upgrade minio minio/minio --version 5.2.0
helm upgrade postgresql-druid bitnami/postgresql --version 15.5.16
helm upgrade postgresql-superset bitnami/postgresql --version 15.5.16

# replace druid.yaml in stack for fix from PR
stackablectl demo in nifi-kafka-druid-earthquake-data --stack-file stacks/stacks-v2.yaml

Requires https://github.com/stackabletech/demos/pull/67 for druid db credentials fix.
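
The "already installed, skipping requested version" messages quoted above tell which Helm releases need a manual `helm upgrade` or `helm uninstall`. A sketch that extracts them from captured installer output, assuming the message format stays exactly as quoted in this thread; `skipped_helm_releases` is a hypothetical helper name:

```shell
# Print "<release> <installed-version> <requested-version>" for every
# "already installed, skipping" message read from stdin.
# Lines that do not match the message format are ignored.
skipped_helm_releases() {
  sed -n -E 's/.*release ([^ ]+) \(([^)]+)\) already installed, skipping requested version ([^ ]+).*/\1 \2 \3/p'
}
```

Feeding it the saved output of the install command lists each affected release once per message, which makes it easy to see which charts were left at the old version.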

Techassi commented 1 month ago

🟢 signal-processing

# Clone demos repository and checkout release-24.3 branch (/tmp)
git clone git@github.com:stackabletech/demos.git
git -C demos checkout release-24.3

# Clone release and use 24.3
git clone git@github.com:stackabletech/release.git

# point local stacks file to local folder
stackablectl demo in signal-processing --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall release
stackablectl release uninstall 24.3

# replace crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/release-24.7/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/release-24.7/deploy/helm/zookeeper-operator/crds/crds.yaml

# install latest
stackablectl demo in signal-processing
Techassi commented 1 month ago

🟢 logging

# Clone demos repository and checkout release-24.3 branch (/tmp)
git clone git@github.com:stackabletech/demos.git
git -C demos checkout release-24.3

# Clone release and use 24.3
git clone git@github.com:stackabletech/release.git

# point local stacks file to local folder
stackablectl demo in logging --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall release
stackablectl release uninstall 24.3

# replace crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/release-24.7/deploy/helm/zookeeper-operator/crds/crds.yaml

# delete setup-opensearch-dashboards job
kubectl delete jobs setup-opensearch-dashboards

# uninstall helm releases
helm uninstall vector-aggregator

# install latest
stackablectl demo in logging
NickLarsenNZ commented 1 month ago

🟢 trino-iceberg

Loosely following: https://github.com/stackabletech/demos/issues/59#issuecomment-2236602426

# Clone demos repository and checkout release-24.3 branch (/tmp), then install from the local files
stackablectl demo install trino-iceberg --demo-file demos/demos-v2.yaml --stack-file stacks/stacks-v2.yaml

# FAILED: docker.stackable.tech/stackable/opa:0.66.0-stackable24.3.0 fails to pull
# Have to edit the stack/demo refs to not use main.

# upgrade postgres chart
helm upgrade postgresql-hive-iceberg bitnami/postgresql --version 15.5.16
helm upgrade minio minio/minio --version 5.2.0

# uninstall operators
stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl create -f https://raw.githubusercontent.com/stackabletech/listener-operator/release-24.7/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/release-24.7/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/release-24.7/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/release-24.7/deploy/helm/opa-operator/crds/crds.yaml

# install 24.7 version of operators
stackablectl demo install trino-iceberg # if changes are needed: --stack-file stacks/stacks-v2.yaml

# If changes are needed, checkout a branch for fixes
git checkout main
git pull
git checkout -b fix/release-24.7-trino-iceberg

stackablectl demo install trino-iceberg --demo-file=demos/demos-v2.yaml --stack-file stacks/stacks-v2.yaml

# If the fixes are needed in main too, cherry pick the relevant ones
...
Other errors

During the 24.3 install, I got errors ([slack thread](https://stackable-workspace.slack.com/archives/C07CHJLB8GG/p1721990873590709)):

```
6: ApiError: failed to create typed patch object (/opa; opa.stackable.tech/v1alpha1, Kind=OpaCluster): .spec.servers.roleGroups.default.selector: field not declared in schema: (ErrorResponse { status: "Failure", message: "failed to create typed patch object (/opa; opa.stackable.tech/v1alpha1, Kind=OpaCluster): .spec.servers.roleGroups.default.selector: field not declared in schema", reason: "", code: 500 })
7: failed to create typed patch object (/opa; opa.stackable.tech/v1alpha1, Kind=OpaCluster): .spec.servers.roleGroups.default.selector: field not declared in schema:
```

Trying with:

```diff
diff --git a/stacks/trino-iceberg/trino.yaml b/stacks/trino-iceberg/trino.yaml
index e9c3f08..f7afdac 100644
--- a/stacks/trino-iceberg/trino.yaml
+++ b/stacks/trino-iceberg/trino.yaml
@@ -103,10 +103,7 @@ spec:
     productVersion: 0.57.0
   servers:
     roleGroups:
-      default:
-        selector:
-          matchLabels:
-            kubernetes.io/os: linux
+      default: {}
 ---
 apiVersion: v1
 kind: ConfigMap
```

This looks like a leftover from the 24.3 release: See https://github.com/stackabletech/operator-rs/pull/652 (thanks @sbernauer)

tpch.sf5 doesn't appear in DBeaver. I have checked the OPA policies and see nothing that should stop that. I don't think it is worthy of halting the release though.

maltesander commented 1 month ago

:green_circle: spark-k8s-anomaly-detection-taxi-data

works with https://github.com/stackabletech/demos/pull/67


# point local stacks file to local folder (and adapt files to be relative instead of GitHub raw URLs)
stackablectl demo in spark-k8s-anomaly-detection-taxi-data --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall operators
stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/release-24.7/deploy/helm/commons-operator/crds/crds.yaml
kubectl create -f https://raw.githubusercontent.com/stackabletech/listener-operator/release-24.7/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/release-24.7/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/release-24.7/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/release-24.7/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/release-24.7/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/release-24.7/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/release-24.7/deploy/helm/trino-operator/crds/crds.yaml

# to make it work for reinstall
# helm upgrade minio minio/minio --version 5.2.0
# helm upgrade postgresql-hive bitnami/postgresql --version 15.5.16
# helm upgrade postgresql-hive-iceberg bitnami/postgresql --version 15.5.16
# helm upgrade postgresql-superset bitnami/postgresql --version 15.5.16

kubectl delete jobs create-spark-anomaly-detection-job
kubectl delete jobs setup-superset
kubectl delete sparkapplications spark-ad

# use local files in stacks for fixes
stackablectl demo in spark-k8s-anomaly-detection-taxi-data --stack-file stacks/stacks-v2.yaml
Techassi commented 1 month ago

:green_circle: jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data

Gets stuck on starting the hdfs-datanode-default-0 and hdfs-namenode-default-0 pods. This seems to be a local issue.

Edit (Malte): Could not reproduce; works as expected (there was a change from hadoop 3.3.6 to 3.3.4, which may have made the difference).

maltesander commented 1 month ago

🟢 trino-taxi-data

works with https://github.com/stackabletech/demos/pull/67

# point local stacks file to local folder (and adapt files to be relative instead of GitHub raw URLs)
stackablectl demo in trino-taxi-data --demo-file /tmp/demos/demos/demos-v2.yaml --stack-file /tmp/demos/stacks/stacks-v2.yaml --release-file /tmp/release/releases.yaml

# uninstall operators
stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/main/deploy/helm/commons-operator/crds/crds.yaml
kubectl create -f https://raw.githubusercontent.com/stackabletech/listener-operator/main/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/main/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/main/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/main/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/main/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/main/deploy/helm/opa-operator/crds/crds.yaml

# to make it work for reinstall (or patch product versions)
# helm upgrade minio minio/minio --version 5.2.0
# helm upgrade postgresql-hive bitnami/postgresql --version 15.5.16
# helm upgrade postgresql-superset bitnami/postgresql --version 15.5.16

kubectl delete jobs create-ny-taxi-data-table-in-trino
kubectl delete jobs setup-superset
kubectl delete jobs load-ny-taxi-data

# use local files in stacks for fixes
stackablectl demo in trino-taxi-data --stack-file stacks/stacks-v2.yaml
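
The finished setup Jobs are deleted above before the demo is installed again, presumably so that the reinstall can recreate them. A small sketch of that step as a helper; `delete_jobs` and the `DRY_RUN` switch are hypothetical, while `--ignore-not-found` is a standard `kubectl delete` flag:

```shell
# Delete the given Jobs so a later "stackablectl demo in ..." run can recreate them.
# --ignore-not-found keeps the command from failing if a Job is already gone.
# DRY_RUN=1 only prints the command (switch of this sketch only).
delete_jobs() {
  if [ "$#" -eq 0 ]; then
    echo "usage: delete_jobs <job-name>..." >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "kubectl delete jobs --ignore-not-found $*"
  else
    kubectl delete jobs --ignore-not-found "$@"
  fi
}
```

For this demo the call would be `delete_jobs create-ny-taxi-data-table-in-trino setup-superset load-ny-taxi-data`, using the Job names from the commands above.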
xeniape commented 1 month ago

🟢 data-lakehouse-iceberg-trino-spark

During install:

During upgrade:

After upgrade:

# checkout repos temporarily in tmp folder
git clone git@github.com:stackabletech/demos.git
git -C demos checkout release-24.3
git clone git@github.com:stackabletech/release.git

# adapt files to be relative and point to local folders instead of using github raw
# point to those local files during install of demo
# add listener-operator to data-lakehouse stack
stackablectl demo in data-lakehouse-iceberg-trino-spark --demo-file demos/demos/demos-v2.yaml --stack-file demos/stacks/stacks-v2.yaml --release-file release/releases.yaml

# for the list of errors during installation, see above

# uninstall operators
stackablectl release uninstall 24.3

# update crds
kubectl replace -f https://raw.githubusercontent.com/stackabletech/commons-operator/main/deploy/helm/commons-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/listener-operator/main/deploy/helm/listener-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/secret-operator/main/deploy/helm/secret-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/hive-operator/main/deploy/helm/hive-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/kafka-operator/main/deploy/helm/kafka-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/nifi-operator/main/deploy/helm/nifi-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/opa-operator/main/deploy/helm/opa-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/spark-k8s-operator/main/deploy/helm/spark-k8s-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/superset-operator/main/deploy/helm/superset-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/trino-operator/main/deploy/helm/trino-operator/crds/crds.yaml
kubectl replace -f https://raw.githubusercontent.com/stackabletech/zookeeper-operator/main/deploy/helm/zookeeper-operator/crds/crds.yaml

# to make it work for reinstall (or patch product versions)
helm upgrade minio minio/minio --version 5.2.0
helm upgrade postgresql-hive bitnami/postgresql --version 15.5.16
helm upgrade postgresql-superset bitnami/postgresql --version 15.5.16
helm upgrade postgresql-hive-iceberg bitnami/postgresql --version 15.5.16

# install latest
stackablectl demo in data-lakehouse-iceberg-trino-spark

# for the list of errors during the upgrade, see above