Closed by NickLarsenNZ 6 days ago
🟢 airflow-scheduled-job
Anomalies during upgrade process:
- `sparkapp_dag`: after the upgrade, the `spark_pi_monitor` task is marked as failed, but I didn't see any errors in the container logs, and the SparkApplications in the cluster all had the status `Succeeded`.
~Anomalies during clean installation of nightly version:
- `sparkapp_dag` also fails here on the `spark_pi_monitor` task~

The `sparkapp_dag` issue was fixed by https://github.com/stackabletech/demos/pull/125.
Anomalies during upgrade process: ... after increasing the memory resources, everything loaded
~@xeniape, is there a PR for the resource increases? I have seen the OOM problem before, and I believe @sbernauer resolved it with more resources.~ I see that the clean nightly deployment didn't have this problem, so I guess there is no PR required. But perhaps we need something in the release notes about needing to bump resources?
> I have seen the OOM problem before, and I believe @sbernauer resolved it with more resources
Yes, but I cannot find any commit for this anymore... I would be in favor of bumping the memory, maybe even the default resources of airflow-operator. I have seen so many customer requests because of OOM; I would like to give the best experience especially to new users trying out demos (and playing around with them).
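For reference, a memory bump on the Airflow stacklet could look roughly like the fragment below. This is a sketch only: the role names and the `3Gi` value are assumptions to illustrate the `resources` override, not a tested fix.

```yaml
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  webservers:
    config:
      resources:
        memory:
          limit: 3Gi   # assumed value; the OOM reports above suggest the default is too low
  celeryExecutors:
    config:
      resources:
        memory:
          limit: 3Gi   # assumed value
```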
Findings during initial installation:
Findings after upgrade:
```
IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "_hyper_2_2_chunk_idx_scores_sr"
DETAIL: Key ("time")=(2024-11-12 08:02:01.188596+00) already exists.
```
It seems there is a conflict with data from the first run (before the upgrade). According to @adwk67 this is expected behaviour; the notebook keeps running and produces data.
🟢 data-lakehouse-iceberg-trino-spark
I believe the demo to be a little flaky:

- Attempt 3: `stackablectl install demo` ran fine.
- Bumped `org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0` to 1.6.1 and 1.7.0. This worked for @adwk67, but not for me, though Spark came right after I restarted pods and/or stopped and started the NiFi task.
- Went back to `org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0` and deployed to verify which step above resolved the problem, but this time Spark ran fine and the shared bikes dashboard showed data in Superset.

🟢 nifi-kafka-druid-earthquake-data
Cluster environment: k3s v1.31.0-k3s1 (via k3d)
Upgraded to:
Notes:
```
kubectl exec -it kafka-broker-default-0 -c kcat-prober -- /bin/bash -c "/stackable/kcat -b localhost:9093 -X security.protocol=SSL -X ssl.key.location=/stackable/tls-kcat/tls.key -X ssl.certificate.location=/stackable/tls-kcat/tls.crt -X ssl.ca.location=/stackable/tls-kcat/ca.crt -L"
```
(and similar for the other kcat commands)
```
"message": "org.apache.kafka.common.errors.TimeoutException: Timeout of 60000ms expired before the position for partition earthquakes-7 could be determined",
```
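To narrow down a TimeoutException like this, one option is to point kcat at the affected partition directly. A minimal sketch, assuming the same TLS material as the prober command above; the topic and partition values are taken from the error message:

```shell
#!/bin/sh
# Build a kcat invocation that tries to read a single record from the
# partition named in the error, to see whether its position can be resolved.
# TLS paths mirror the kcat-prober container above (assumption).
topic="earthquakes"
partition=7
cmd="/stackable/kcat -b localhost:9093 \
 -X security.protocol=SSL \
 -X ssl.key.location=/stackable/tls-kcat/tls.key \
 -X ssl.certificate.location=/stackable/tls-kcat/tls.crt \
 -X ssl.ca.location=/stackable/tls-kcat/ca.crt \
 -C -t ${topic} -p ${partition} -o beginning -c 1 -e"
echo "$cmd"
# Then run it inside the broker pod:
#   kubectl exec -it kafka-broker-default-0 -c kcat-prober -- /bin/bash -c "$cmd"
```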
Notes:
🟢 jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data
🟢 spark-k8s-anomaly-detection-taxi-data
🟢 trino-iceberg
🟢 nifi-kafka-druid-water-level-data
Cluster environment: k3s v1.31.0-k3s1 (via k3d)
Upgraded to:
Notes:
- `stackablectl stacklet list` used the old format for Kafka
- `kubectl exec -it kafka-broker-default-0 -c kcat-prober -- /bin/bash -c "/stackable/kcat -b localhost:9093 -X security.protocol=SSL -X ssl.key.location=/stackable/tls-kcat/tls.key -X ssl.certificate.location=/stackable/tls-kcat/tls.crt -X ssl.ca.location=/stackable/tls-kcat/ca.crt -L"` (and similar for the other kcat commands)
- `stations` row size is 0B, but if I view the data it still looks correct
Notes:
🟢 hbase-hdfs-load-cycling-data
- `docker.stackable.tech/stackable/hbase:2.4.18-stackable0.0.0-dev` locally
- `docker.stackable.tech/stackable/hadoop:3.3.6-stackable24.3.0` (due to https://github.com/stackabletech/docker-images/issues/793)
- `distcp-cycling-data-x6zmf`
- `create-hfile-and-import-to-hbase-cg7wg`
🟡 end-to-end-security
Anomalies during upgrade process:
- `kinit` was needed after the pod restart (`create-spark-report.yaml`)

Anomalies during clean installation of nightly version:
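Re-obtaining the ticket after a restart can be sketched as below. The pod name, keytab path, and principal are placeholders (assumptions), not values from this demo:

```shell
#!/bin/sh
# After a pod restart the Kerberos ticket cache is empty, so kinit from a
# keytab again. All three values below are hypothetical placeholders.
pod="superset-node-default-0"
keytab="/stackable/kerberos/keytab"
principal="testuser@CLUSTER.LOCAL"

kinit_cmd="kinit -kt ${keytab} ${principal}"
echo "kubectl exec -it ${pod} -- ${kinit_cmd}"
# Verify the ticket afterwards with:
#   kubectl exec -it ${pod} -- klist
```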
Thanks everyone for the help in getting the demos tested. This issue is now resolved. 🚀
Pre-Release Demo Testing on Nightly
Part of https://github.com/stackabletech/issues/issues/647
This is testing:
Replace the items in the task lists below with the applicable Pull Requests (if any).
Instructions
These instructions are for deploying the nightly demo, as well as upgrading the operators and CRDs.
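The overall flow can be sketched as a command plan. The demo name and operator list below are hypothetical examples (assumptions), and the actual instructions in this issue take precedence:

```shell
#!/bin/sh
# Sketch of the nightly test flow: install a demo, then re-install its
# operators at the 0.0.0-dev (nightly) version to simulate the upgrade.
# Demo name and operator list are hypothetical examples.
demo="airflow-scheduled-job"
operators="commons secret listener airflow"

plan="stackablectl demo install ${demo}"
for op in ${operators}; do
  plan="${plan}
stackablectl operator install ${op}=0.0.0-dev"
done
echo "$plan"
```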