zeebe-io / benchmark-helm

Contains a helm chart to execute zeebe benchmarks
https://zeebe-io.github.io/benchmark-helm/
Apache License 2.0

Upgrade camunda platform helm chart #138

Closed by Zelldon 9 months ago

Zelldon commented 9 months ago

Description

Our dependency on the Camunda Platform Helm chart has been outdated for quite a while. This is problematic because we are no longer testing the features we have built, nor the setup of the current Camunda Platform Helm chart.

Changes

Several things have changed: ES 8 is now supported, Elasticsearch was migrated to a different Helm chart (Bitnami), the curator was replaced by ILM, labels have changed, and Camunda Platform no longer uses sub-charts (only sub-folders in the templates as of now).

The PR includes several adjustments to the values files to cover these changes.

Closes https://github.com/zeebe-io/benchmark-helm/issues/127
Closes https://github.com/zeebe-io/benchmark-helm/issues/126
Closes https://github.com/zeebe-io/benchmark-helm/issues/125

Security context:

Elasticsearch:

Furthermore, due to the sub-chart to sub-folder migration in the Camunda Platform charts, several paths have changed, which we had to adjust in our golden tests.

Benchmark

Right now we have a benchmark running to verify that everything works, especially ILM and the Elasticsearch setup.

https://grafana.dev.zeebe.io/d/zeebe-dashboard/zeebe?orgId=1&var-DS_PROMETHEUS=prometheus&var-cluster=All&var-namespace=ck-test-helm&var-pod=All&var-partition=All&from=1705065208717&to=1705067183123

2024-01-12_14-47

2024-01-12_14-47_1

Elasticsearch metrics can be found here https://grafana.dev.zeebe.io/d/elasticsearch/elasticsearch?orgId=1&var-datasource=prometheus&var-cluster=All&var-namespace=ck-test-helm&var-index=All

2024-01-12_14-46

Zelldon commented 9 months ago

I investigated the ILM deletion further. At first I thought it did not work with the ILM settings, which is why I made it possible to create hourly indices: https://github.com/camunda/zeebe/pull/15953

It turned out I was wrong. I have a benchmark running without any change to the indices and with the smaller disk size (16 GB), and it is running stably.

ck-helm-defaults-es-disk

Based on the logs, we can see that an index needs to go through several phases before it is deleted (I selected just one index for simplicity):

# Creation
[2024-01-16T19:59:00,325][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [zeebe-record-retention-policy]
[2024-01-16T19:59:00,783][INFO ][o.e.c.r.a.AllocationService] [zeebe-benchmark-test-elasticsearch-master-1] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[zeebe-record_variable_8.5.0-snapshot_2024-01-16][0]]])." previous.health="YELLOW" reason="shards started [[zeebe-record_variable_8.5.0-snapshot_2024-01-16][0]]"

# start of deletion
[2024-01-16T20:18:59,809][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] in policy [zeebe-record-retention-policy]

[2024-01-16T20:28:59,772][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] to [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] in policy [zeebe-record-retention-policy]

# Now it gets deleted 30 min after creation
[2024-01-16T20:28:59,797][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] to [{"phase":"delete","action":"delete","name":"delete"}] in policy [zeebe-record-retention-policy]
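
As a rough cross-check of the timing in the log excerpt above, here is a minimal Python sketch (not part of this PR) that extracts the ILM transitions and computes how many minutes after the first transition each step happened. The function name `ilm_timeline` and the regular expression are illustrative assumptions; the sketch only assumes the log line format exactly as quoted above.

```python
import re
from datetime import datetime

# Matches the timestamp and the *target* phase/step of an
# IndexLifecycleTransition log line in the format quoted above.
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\]\[INFO \]\[o\.e\.x\.i\.IndexLifecycleTransition\].*'
    r'\] to \[\{"phase":"(?P<phase>[^"]+)","action":"[^"]+","name":"(?P<name>[^"]+)"\}\]'
)


def ilm_timeline(log_lines):
    """Return (phase/step, minutes since the first transition) per ILM log line."""
    events = []
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
            events.append((ts, f'{match.group("phase")}/{match.group("name")}'))
    if not events:
        return []
    start = events[0][0]
    return [(step, round((ts - start).total_seconds() / 60, 1)) for ts, step in events]


# The four transition lines from the excerpt above, copied verbatim.
SAMPLE = [
    r'[2024-01-16T19:59:00,325][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [null] to [{"phase":"new","action":"complete","name":"complete"}] in policy [zeebe-record-retention-policy]',
    r'[2024-01-16T20:18:59,809][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"new","action":"complete","name":"complete"}] to [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] in policy [zeebe-record-retention-policy]',
    r'[2024-01-16T20:28:59,772][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"delete","action":"delete","name":"wait-for-shard-history-leases"}] to [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] in policy [zeebe-record-retention-policy]',
    r'[2024-01-16T20:28:59,797][INFO ][o.e.x.i.IndexLifecycleTransition] [zeebe-benchmark-test-elasticsearch-master-1] moving index [zeebe-record_variable_8.5.0-snapshot_2024-01-16] from [{"phase":"delete","action":"delete","name":"cleanup-snapshot"}] to [{"phase":"delete","action":"delete","name":"delete"}] in policy [zeebe-record-retention-policy]',
]

if __name__ == "__main__":
    for step, minutes in ilm_timeline(SAMPLE):
        print(f"{minutes:5.1f} min  {step}")
```

For the lines above, the index enters the delete phase roughly 20 minutes after creation and reaches the final `delete` step roughly 30 minutes after creation.
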
Zelldon commented 9 months ago

@npepinpe I think we can go ahead and merge this change. Tomorrow I will test how the chart behaves when we upgrade from an older version of the chart to a newer one. Based on the results, we might want to pin the release charts to the older version for now.

Zelldon commented 9 months ago

I have a benchmark with the mixed setup (incl. Operate running) here

How to set up:

helm install zeebe-benchmark charts/zeebe-benchmark \
        --set starter.rate=5 \
        --set worker.replicas=1 \
        --set timer.replicas=1 \
        --set timer.rate=5 \
        --set publisher.replicas=1 \
        --set publisher.rate=5 \
        --set camunda-platform.operate.enabled=true \
        --set camunda-platform.operate.image.repository=camunda/operate \
        --set camunda-platform.operate.image.tag=SNAPSHOT \
        --set camunda-platform.elasticsearch.master.persistence.size=128Gi \
        --set camunda-platform.zeebe.retention.minimumAge=1d \
        --set camunda-platform.operate.retention.minimumAge=30m
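
To make it easier to see where these overrides land in the chart values, here is a small Python sketch (an illustration, not part of the PR) that expands dotted `--set` keys into the nested structure a values file would use. The helper name `set_flags_to_values` is hypothetical, and the sketch deliberately simplifies what Helm does: it keeps every value as a string and does not handle escaped dots, lists, or comma-separated assignments.

```python
def set_flags_to_values(flags):
    """Expand 'a.b.c=value' strings into a nested dict (simplified --set semantics)."""
    values = {}
    for flag in flags:
        key, _, raw = flag.partition("=")
        *parents, leaf = key.split(".")
        node = values
        for part in parents:
            # Create intermediate maps on demand, reusing existing ones.
            node = node.setdefault(part, {})
        node[leaf] = raw  # kept as a string; Helm itself applies type coercion
    return values


# The overrides from the helm install command above.
FLAGS = [
    "starter.rate=5",
    "worker.replicas=1",
    "timer.replicas=1",
    "timer.rate=5",
    "publisher.replicas=1",
    "publisher.rate=5",
    "camunda-platform.operate.enabled=true",
    "camunda-platform.operate.image.repository=camunda/operate",
    "camunda-platform.operate.image.tag=SNAPSHOT",
    "camunda-platform.elasticsearch.master.persistence.size=128Gi",
    "camunda-platform.zeebe.retention.minimumAge=1d",
    "camunda-platform.operate.retention.minimumAge=30m",
]

if __name__ == "__main__":
    import json

    print(json.dumps(set_flags_to_values(FLAGS), indent=2))
```

The printed structure shows that the Operate and Zeebe retention settings, as well as the Elasticsearch disk size, all sit under the `camunda-platform` key of the benchmark chart's values.
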