st-tech / gatling-operator

Automating distributed Gatling load testing using Kubernetes operator
MIT License

Modify Gatling Runner Pod’s Multi-Containers Structure to make gatling-runner run as main container #65

Closed: yokawasa closed this 1 year ago

yokawasa commented 1 year ago

Description

Summary

The PR includes the following 3 updates:

About the Gatling Runner Pod multi-container structure update

With this PR, gatling-waiter runs as an init container, and both gatling-runner and gatling-result-transferer run as main containers in a Gatling Runner Pod.

Currently, as shown in the architecture diagram (see Gatling Operator Architecture and Design), the container layout depends on whether an aggregated Gatling result report is generated. When the report is generated, gatling-waiter and gatling-runner run as init containers and gatling-result-transferer runs as the main container. When the report is not generated, gatling-runner runs as the main container.

For easier comparison, here are before and after images of the Pod architecture:

Before

After

Testing

Test1 - verify that the Gatling benchmark runs and the report is generated as expected

  1. Prepare a test Gatling CR with reporting enabled

The base Gatling CR is shown below; here I enabled generateReport:

config/samples/gatling-operator_v1alpha1_gatling01.yaml

apiVersion: gatling-operator.tech.zozo.com/v1alpha1
kind: Gatling
metadata:
  name: gatling-sample01
spec:
  generateReport: true
  generateLocalReport: false
  notifyReport: false
  cleanupAfterJobDone: false

Apply the manifest and check the result:


# apply the manifest
kustomize build config/samples  | kubectl apply -f -

# check the gatling cr status
kubectl get gatling gatling-sample01 -o jsonpath='{@.status}' | jq

Result (the status includes the report URL in reportUrl):

{
  "reportCompleted": true,
  "reportUrl": "xxxx",
  "reporterJobName": "gatling-sample01-reporter",
  "reporterStartTime": 1668405766,
  "runnerCompleted": true,
  "runnerJobName": "gatling-sample01-runner",
  "runnerStartTime": 1668405746,
  "succeeded": 3
}

Then open reportUrl in a browser. The report page was generated successfully.

Screen Shot 2022-11-13 at 21 39 06

Test2 - verify that the gatling-runner container runs as a main container

Here is the dumped Pod spec. As you can see, gatling-waiter runs as an init container, and both gatling-runner and gatling-result-transferer run as main containers.

Init container

  initContainers:
  - args:
    - |
      PARALLELISM=1
      NAMESPACE=default
      JOB_NAME=gatling-sample02
      POD_NAME=$(cat /etc/pod-info/name)

      kubectl label pods -n $NAMESPACE $POD_NAME gatling-waiter=initialized

      while true; do
        READY_PODS=$(kubectl get pods -n $NAMESPACE --selector=job-name=$JOB_NAME-runner,gatling-waiter=initialized --no-headers | grep -c ".*");
        echo "$READY_PODS/$PARALLELISM pods are ready";
        if  [ $READY_PODS -eq $PARALLELISM ]; then
          break;
        fi;
        sleep 1;
      done
    command:
    - /bin/sh
    - -c
    image: bitnami/kubectl:1.21.8
    ...omit...
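The gatling-waiter init container acts as a barrier: each runner Pod labels itself gatling-waiter=initialized and then polls until the number of labeled Pods in the runner Job equals PARALLELISM, so no runner Pod proceeds to its main containers until all of them are ready. A minimal local sketch of that loop, assuming the hypothetical count_ready_pods function stands in for the kubectl get pods ... | grep -c call and simulates one more Pod becoming ready on each poll, so it runs without a cluster:

```shell
#!/bin/sh
# Local sketch of the gatling-waiter barrier loop.
# Assumption: count_ready_pods is a hypothetical stand-in for
#   kubectl get pods -n "$NAMESPACE" \
#     --selector=job-name="$JOB_NAME"-runner,gatling-waiter=initialized \
#     --no-headers | grep -c ".*"
PARALLELISM=3
STATE_FILE=$(mktemp)   # simulated count of labeled runner Pods
echo 0 > "$STATE_FILE"

count_ready_pods() {
  # Runs inside $(...), i.e. a subshell, so state is kept in a file.
  # Each call simulates one more Pod labeling itself, up to PARALLELISM.
  n=$(cat "$STATE_FILE")
  if [ "$n" -lt "$PARALLELISM" ]; then
    n=$((n + 1))
    echo "$n" > "$STATE_FILE"
  fi
  echo "$n"
}

ITERATIONS=0
while true; do
  READY_PODS=$(count_ready_pods)
  ITERATIONS=$((ITERATIONS + 1))
  echo "$READY_PODS/$PARALLELISM pods are ready"
  if [ "$READY_PODS" -eq "$PARALLELISM" ]; then
    break
  fi
  # The real waiter sleeps 1 second between polls; omitted here.
done
rm -f "$STATE_FILE"
```

Running it prints 1/3, 2/3 and 3/3 and then leaves the loop, which corresponds to the point where the real init container exits and the main containers start.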

Main containers

  containers:
  - args:  ############## gatling-runner
    - |
      SIMULATIONS_DIR_PATH=/opt/gatling/user-files/simulations
      TEMP_SIMULATIONS_DIR_PATH=/opt/gatling/user-files/simulations-temp
      RESOURCES_DIR_PATH=/opt/gatling/user-files/resources
      RESULTS_DIR_PATH=/opt/gatling/results
      START_TIME=""
      if [ -z "${START_TIME}" ]; then
        START_TIME=$(date +"%Y-%m-%d %H:%M:%S" --utc)
      fi
      start_time_stamp=$(date -d "${START_TIME}" +"%s")
      current_time_stamp=$(date +"%s")
      echo "Wait until ${START_TIME}"
      until [ ${current_time_stamp} -ge ${start_time_stamp} ];
      do
        current_time_stamp=$(date +"%s")
        echo "it's ${current_time_stamp} now and waiting until ${start_time_stamp} ..."
        sleep 1;
      done
      if [ ! -d ${SIMULATIONS_DIR_PATH} ]; then
        mkdir -p ${SIMULATIONS_DIR_PATH}
      fi
      if [ -d ${TEMP_SIMULATIONS_DIR_PATH} ]; then
        cp -p ${TEMP_SIMULATIONS_DIR_PATH}/*.scala ${SIMULATIONS_DIR_PATH}
      fi
      if [ ! -d ${RESOURCES_DIR_PATH} ]; then
        mkdir -p ${RESOURCES_DIR_PATH}
      fi
      if [ ! -d ${RESULTS_DIR_PATH} ]; then
        mkdir -p ${RESULTS_DIR_PATH}
      fi
      gatling.sh -sf ${SIMULATIONS_DIR_PATH} -s MyBasicSimulation -rsf ${RESOURCES_DIR_PATH} -rf ${RESULTS_DIR_PATH} -nr

      if [ $? -eq 0 ]; then
        touch ${RESULTS_DIR_PATH}/COMPLETED
      fi
    command:
    - /bin/sh
    - -c
    image: ghcr.io/st-tech/gatling:latest
    ...omit...
  - args: ############## gatling-result-transferer
    - |
      RESULTS_DIR_PATH=/opt/gatling/results
      rclone config create s3 s3 env_auth=true region ap-northeast-1
      while true; do
        if [ -f "${RESULTS_DIR_PATH}/COMPLETED" ]; then
          for source in $(find ${RESULTS_DIR_PATH} -type f -name *.log)
          do
            rclone copyto ${source} --s3-no-check-bucket --s3-env-auth s3:my-gatling-reports-0001/gatling-sample02/2391946291/${HOSTNAME}.log
          done
          break
        fi
        sleep 1;
      done
    command:
    - /bin/sh
    - -c
    image: rclone/rclone
    ...omit...
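Because gatling-runner and gatling-result-transferer now run side by side as main containers, they coordinate through the results directory (presumably a volume shared by both containers; the mounts are omitted in the dump above): the runner touches ${RESULTS_DIR_PATH}/COMPLETED only when gatling.sh exits with 0, and the transferer polls for that marker before uploading the .log files. A minimal local sketch of this handshake, assuming a temporary directory stands in for the shared volume, a fake simulation.log stands in for Gatling output, and cp into a hypothetical UPLOAD_DIR replaces rclone copyto:

```shell
#!/bin/sh
# Local sketch of the COMPLETED-marker handshake between the two main containers.
# Assumptions: mktemp dirs replace the shared volume and the S3 bucket,
# and cp replaces rclone copyto.
RESULTS_DIR_PATH=$(mktemp -d)
UPLOAD_DIR=$(mktemp -d)                      # hypothetical upload destination
HOSTNAME=${HOSTNAME:-gatling-runner-pod}     # Pod name inside Kubernetes

# "gatling-runner": write a result log, then create the marker on success
(
  mkdir -p "$RESULTS_DIR_PATH/run1"
  echo "simulated gatling log" > "$RESULTS_DIR_PATH/run1/simulation.log"
  true   # stand-in for a successful gatling.sh run
  if [ $? -eq 0 ]; then
    touch "$RESULTS_DIR_PATH/COMPLETED"
  fi
) &

# "gatling-result-transferer": wait for the marker, then copy every .log file
while true; do
  if [ -f "$RESULTS_DIR_PATH/COMPLETED" ]; then
    # Quote the pattern so the shell does not expand *.log itself
    for source in $(find "$RESULTS_DIR_PATH" -type f -name '*.log'); do
      cp "$source" "$UPLOAD_DIR/$HOSTNAME.log"
    done
    break
  fi
  sleep 1
done
wait   # reap the background "runner"
```

Note that the sketch quotes the -name '*.log' pattern; in the dumped script it is unquoted, so the shell could expand it if the container's working directory happened to contain .log files.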

Test3 - verify that the gatling-result-transferer container exits as soon as the gatling-runner container fails

  1. Run the Gatling runner Pod in the same way as in Test1

  2. Make the gatling-runner container fail while it is running

You can force the gatling-runner container to fail with the following script:

GATLING_NAME="gatling-sample01"
NAMESPACE="default"
# Look up the runner Job created for this Gatling CR
JOB_NAME=$(kubectl get gatling ${GATLING_NAME} -n ${NAMESPACE} -o jsonpath='{@.status.runnerJobName}')
if [[ -n ${JOB_NAME} ]]; then
  # Collect the names of the runner Pods belonging to that Job
  PODS=$(kubectl get po -n ${NAMESPACE} | grep "${JOB_NAME}" | awk '{print $1}')
  for pod in ${PODS}
  do
    # Kill gatling.sh inside the gatling-runner container to simulate a failure
    kubectl exec -it ${pod} -c gatling-runner -n ${NAMESPACE} -- \
      /bin/sh -c "kill -9 \$(ps awx | grep \"/opt/gatling/bin/gatling.sh\" | grep -v grep | awk '{print \$1}')"
  done
fi

As a result, the gatling-result-transferer container was observed to exit as soon as the gatling-runner container failed.

Checklist

Relevant issue

yokawasa commented 1 year ago

@ksudate Thanks a lot for the review!

I'll go ahead and merge this if there are no more review comments by the end of today.