Fix duplicate Slack message issue

Description

Fixes issue #9.

I've made the following changes:

1. Run the job clean-up only after at least one more loop has passed and after all Gatling jobs have completed: https://github.com/st-tech/gatling-operator/commit/2d81b9b1408205f4782d24f50bcfc031b2ef01a6
2. Add a single extra loop (requeue) after a reconciliation loop completes successfully: https://github.com/st-tech/gatling-operator/commit/8eb0f1a85255986986d1efb7cd70b8577b560724

Change 2 above doesn't directly fix issue #9. It only ensures that one loop runs before moving on to the next stage, to avoid possible timing issues.
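To illustrate the idea of advancing one stage per loop, here is a minimal, self-contained sketch. It does not use the operator's real types: `status`, `result`, `reconcile`, and `run` are hypothetical names made up for this example, and `result{requeue: true}` only mirrors the idea behind controller-runtime's `ctrl.Result{Requeue: true}`. The actual stage handling in the operator may differ.

```go
package main

import "fmt"

// status is a hypothetical, reduced stand-in for the Gatling status flags.
type status struct {
	jobCompleted          bool
	notificationCompleted bool
	cleanedUp             bool
}

// result mirrors the idea of controller-runtime's ctrl.Result{Requeue: true}.
type result struct{ requeue bool }

// reconcile advances at most one stage per pass and then asks for a requeue,
// so the following stage starts in a separate pass.
func reconcile(st *status, trace *[]string) result {
	switch {
	case !st.jobCompleted:
		st.jobCompleted = true
		*trace = append(*trace, "job")
	case !st.notificationCompleted:
		st.notificationCompleted = true
		*trace = append(*trace, "notify")
	case !st.cleanedUp:
		// Clean-up gets its own pass, after the notification flag was recorded.
		st.cleanedUp = true
		*trace = append(*trace, "cleanup")
	default:
		// Nothing left to do: stop requeueing.
		return result{requeue: false}
	}
	return result{requeue: true}
}

// run drives reconcile until it stops requeueing and returns
// the order of executed stages and the number of passes.
func run() ([]string, int) {
	st := &status{}
	var trace []string
	passes := 0
	for {
		passes++
		if !reconcile(st, &trace).requeue {
			return trace, passes
		}
	}
}

func main() {
	trace, passes := run()
	fmt.Println(trace, passes)
}
```

In this sketch the clean-up can never share a pass with the notification step, which is the property both changes aim for.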

Why did I make change 1 to fix the issue?

Whenever the duplicate message issue occurs, I see the following Gatling CR update error:

2021-11-10T12:23:24.509Z  ERROR controller-runtime.manager.controller.gatling.gatling.Reconcile Failed to update gatling status, and requeue  {"reconciler group": "gatling-operator.tech.zozo.com", "reconciler kind": "Gatling", "name": "zozo-aggregation-api", "namespace": "default", "error": "Operation cannot be fulfilled on gatlings.gatling-operator.tech.zozo.com \"zozo-aggregation-api\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/go-logr/zapr.(*zapLogger).Error
  /go/pkg/mod/github.com/go-logr/zapr@v0.2.0/zapr.go:132
github.com/st-tech/gatling-operator/controllers.(*GatlingReconciler).gatlingNotificationReconcile
  /workspace/controllers/gatling_controller.go:392
github.com/st-tech/gatling-operator/controllers.(*GatlingReconciler).Reconcile
  /workspace/controllers/gatling_controller.go:113
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
  /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
  /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.2
  /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:216
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
  /go/pkg/mod/k8s.io/apimachinery@v0.20.2/pkg/util/wait/wait.go:99

The relevant part of the operator source code is this:

https://github.com/st-tech/gatling-operator/blob/2be50da0642f21d66f9e4d766216e5a8d55c8bca/controllers/gatling_controller.go#L386-L389

// Implementation of reconciler logic for the notification
func (r *GatlingReconciler) gatlingNotificationReconcile(ctx context.Context, req ctrl.Request, gatling *gatlingv1alpha1.Gatling, log logr.Logger) (bool, error) {
    var reportURL = "none"
    // Get cloud storage info only if gatling.spec.generateReport is true
    if gatling.Spec.GenerateReport {
        _, url, err := r.getCloudStorageInfo(ctx, gatling)
        if err != nil {
            log.Error(err, "Failed to get gatling storage info, and requeue")
            return true, err
        }
        reportURL = url
    }
    if err := r.sendNotification(ctx, gatling, reportURL); err != nil {
        log.Error(err, "Failed to sendNotification, but and requeue")
        return true, err
    }
    // Update gatling status on notification
/////////////////////////////////////////////////////////////////////////////////////////////
    gatling.Status.NotificationCompleted = true
    if err := r.updateGatlingStatus(ctx, gatling); err != nil {
        log.Error(err, "Failed to update gatling status, and requeue")
        return true, err
    }
/////////////////////////////////////////////////////////////////////////////////////////////
    log.Info("Notification has successfully been sent!")
    return false, nil
}

Right after this part, the Gatling operator cleans up the Gatling job resources. The relevant part: https://github.com/st-tech/gatling-operator/blob/2be50da0642f21d66f9e4d766216e5a8d55c8bca/controllers/gatling_controller.go#L118-L130

I moved the job clean-up so that it runs only after at least one more loop and after all Gatling jobs have completed. This is based on the following assumptions:

Cleaning up the Gatling job resources right after the notification message has been sent may cause a timing issue, which in turn causes the Gatling status update to fail.
Because the notification message has already been sent but the Gatling status update (gatling.Status.NotificationCompleted = true) has not completed, the next loop sends another notification message (= the duplicate message issue).
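The assumed failure sequence can be reproduced with a small, self-contained simulation. This is only a sketch under stated assumptions: `object`, `store`, `notifyOnce`, and `simulate` are hypothetical names, the toy `store` imitates the API server's optimistic concurrency check on the resource version, and the concurrent writer is modeled as a generic hook rather than the real clean-up code.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's "object has been modified" error.
var errConflict = errors.New("conflict: the object has been modified")

// object is a hypothetical, stripped-down stand-in for the Gatling CR.
type object struct {
	resourceVersion       int
	notificationCompleted bool
}

// store is a toy API server that uses optimistic concurrency.
type store struct{ current object }

func (s *store) get() object { return s.current }

// update accepts a write only if it is based on the latest resourceVersion.
func (s *store) update(o object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.current = o
	return nil
}

// notifyOnce mimics one pass of the notification step: send, then record the flag.
// concurrentWrite simulates another writer touching the object mid-pass.
func notifyOnce(s *store, sent *int, concurrentWrite func()) (bool, error) {
	obj := s.get()
	if obj.notificationCompleted {
		return false, nil
	}
	*sent++ // the message goes out before the status is persisted
	if concurrentWrite != nil {
		concurrentWrite()
	}
	obj.notificationCompleted = true
	if err := s.update(obj); err != nil {
		return true, err // stale copy: the flag is lost, so requeue
	}
	return false, nil
}

// simulate loops until no requeue is requested and returns the number of messages sent.
func simulate() int {
	s := &store{}
	sent := 0
	// First pass: another writer bumps the resourceVersion before our status update.
	bump := func() { _ = s.update(s.get()) }
	requeue, _ := notifyOnce(s, &sent, bump)
	for requeue {
		requeue, _ = notifyOnce(s, &sent, nil)
	}
	return sent
}

func main() {
	fmt.Println("messages sent:", simulate())
}
```

In this model the first pass sends a message but fails to persist the flag, so the requeued pass sends a second one. Keeping other writes away from the object until the status update has landed removes the conflict in the simulation.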

Test

I actually made the same change to the operator on Nov 11th and deployed it to a testing environment. Since then, I haven't seen the issue in that environment. I'm not 100% sure, but after several days of observation in the testing environment, the issue appears to be fixed by this update.
