qameta / allure-testops-deployment

Helm Charts for Allure TestOps Deployment

Frequent restarts of the report service: JDBC connection timeouts caused by too many threads #102

Open polarnik opened 1 week ago

polarnik commented 1 week ago

Hello!

The default thread pool settings are not well balanced:

report:
  replicaCount: 1
  cache:
    enabled: false
    ttlByDefault: 1m
    ttlWidgetsAutomationTrend: 1h
    ttlWidgetsProjectMetricTrend: 1h
    ttlWidgetsLaunchDurationHistogram: 1h
    ttlWidgetsAnalyticPieChart: 1h
    ttlWidgetsTrComplexTrend: 1h
    ttlWidgetsTrStatisticTrend: 1h
    ttlWidgetsTcLastResult: 1h
  image: allure-report
  maxDBConn: 10
  maxConcurrency: 5
  maxS3Concurrency: 200
  taskExecutorCorePoolSize: 200

The source: https://github.com/qameta/allure-testops-deployment/blob/master/charts/allure-testops/values.yaml

With the defaults, the 200 threads of rabbitConnectionFactorySharedExecutor (taskExecutorCorePoolSize: 200) compete for only 10 JDBC connections (maxDBConn: 10). The Allure health check then cannot get a free connection before it times out, so Kubernetes treats the service as unhealthy and restarts it.
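
A minimal sketch of the starvation described above, not the actual Allure TestOps code. It assumes every executor thread holds one connection while it works, and it models the JDBC pool with a plain semaphore. The function name `health_check_succeeds` and the 0.2 s timeout are hypothetical.

```python
import threading


def health_check_succeeds(worker_threads: int, max_db_conn: int,
                          timeout_s: float = 0.2) -> bool:
    """Return True if a health check gets a connection while workers are busy."""
    pool = threading.BoundedSemaphore(max_db_conn)  # stand-in for the JDBC pool
    release = threading.Event()                     # tells workers to finish
    holding = threading.Semaphore(0)                # counts workers holding a connection

    def worker() -> None:
        pool.acquire()      # block until a connection is free
        holding.release()   # signal that this worker now holds a connection
        release.wait()      # keep the connection until told to stop
        pool.release()

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(worker_threads)]
    for t in threads:
        t.start()

    # Wait until the workers have taken as many connections as they can.
    for _ in range(min(worker_threads, max_db_conn)):
        holding.acquire()

    # The health check needs one connection and gives up after timeout_s.
    ok = pool.acquire(timeout=timeout_s)
    if ok:
        pool.release()

    release.set()           # let all workers finish
    for t in threads:
        t.join()
    return ok


print(health_check_succeeds(worker_threads=200, max_db_conn=10))  # False
print(health_check_succeeds(worker_threads=10, max_db_conn=50))   # True
```

With 200 workers and 10 connections the pool is always empty, so the health check times out. With 10 workers and 50 connections there are always spare connections.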

In addition, 200 rabbitConnectionFactorySharedExecutor threads can consume all CPU resources in PostgreSQL.

Settings that work better:

  maxDBConn: 50
  maxConcurrency: 5
  taskExecutorCorePoolSize: 10

polarnik commented 1 week ago

For example, we ran with a large taskExecutorCorePoolSize (the rabbitConnectionFactorySharedExecutor pool):

image

We saw 100% CPU usage in PostgreSQL and repeated service restarts:

image

Take a look at 08/15.

The config was updated on 08/21:

  maxDBConn: 50
  maxConcurrency: 5
  taskExecutorCorePoolSize: 10

Since then the service has been working well, with no restarts and no 100% CPU usage in PostgreSQL.