I moved the cleanup of the Gatling job resources so that it runs only after at least one loop has passed and after all the Gatling jobs have completed. This is based on the following assumptions:
Cleaning up the Gatling job resources just after the notification message has been sent may cause a timing issue, which in turn causes the Gatling status update to fail.
Just after the notification message has been sent, and before the Gatling status update (gatling.Status.NotificationCompleted = true) has completed, the next loop starts, so another notification message is sent (= the duplicate message issue).
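The ordering described above can be sketched roughly as follows. This is a simplified simulation, not the operator's actual code: `fakeCluster`, `reconcile`, `runUntilDone`, and the `failNextUpdate` switch are hypothetical names introduced for illustration, and only `NotificationCompleted` comes from the real status. It does not reproduce the real timing issue. It only shows cleanup being deferred to a later loop, and why a failed status update would lead to a second notification.

```go
package main

import "errors"

// Status holds only the flag this sketch needs; the real Gatling CR status has more fields.
type Status struct {
	NotificationCompleted bool
}

// fakeCluster is a hypothetical stand-in for the API server, the job
// resources, and the notification service.
type fakeCluster struct {
	persisted         Status // status as stored on the CR
	jobsCompleted     bool   // all Gatling jobs have finished
	jobsExist         bool   // job resources have not been cleaned up yet
	notificationsSent int
	failNextUpdate    bool // assumption: simulates one failed status update
}

var errUpdateFailed = errors.New("status update failed")

// updateStatus persists the status unless a failure has been injected.
func (c *fakeCluster) updateStatus(s Status) error {
	if c.failNextUpdate {
		c.failNextUpdate = false
		return errUpdateFailed
	}
	c.persisted = s
	return nil
}

// reconcile runs one loop and reports whether another loop is needed.
func reconcile(c *fakeCluster) (bool, error) {
	if !c.jobsCompleted {
		return true, nil // wait until all jobs have completed
	}
	status := c.persisted // every loop starts from the persisted state
	if !status.NotificationCompleted {
		c.notificationsSent++
		status.NotificationCompleted = true
		// No cleanup in this loop: requeue and let a later loop clean up.
		return true, c.updateStatus(status)
	}
	// The flag was persisted by an earlier loop, so cleanup is safe now.
	c.jobsExist = false
	return false, nil
}

// runUntilDone calls reconcile until no requeue is requested and
// returns the number of loops it took (at most maxLoops).
func runUntilDone(c *fakeCluster, maxLoops int) int {
	for loop := 1; loop <= maxLoops; loop++ {
		if requeue, _ := reconcile(c); !requeue {
			return loop
		}
	}
	return maxLoops
}
```

In this sketch, a run without an injected failure sends one notification and cleans up in the second loop. With `failNextUpdate` set, the first loop's update is lost, so the second loop starts from a status that still has `NotificationCompleted` unset and sends the message again, which matches the duplicate message symptom.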
Test
I made the same change to the operator on Nov 11th and deployed it to a testing environment. Since then, I haven't seen the issue in that environment.
I'm not 100% sure, but after several days of observation in the testing environment, the issue appears to be fixed by this update.
Description
Fix for issue #9.
The changes I've made are the following
Change No. 2 above isn't directly a fix for issue #9. It just ensures that a single loop runs before moving to the next stage, to avoid timing issues.
Why did I make change No. 1 to fix the issue?
Whenever the duplicate message issue occurs, I see the following Gatling CR update error.
The relevant part of the operator source code is this:
https://github.com/st-tech/gatling-operator/blob/2be50da0642f21d66f9e4d766216e5a8d55c8bca/controllers/gatling_controller.go#L386-L389
Just after this part, the Gatling operator cleans up the Gatling job resources. The relevant part: https://github.com/st-tech/gatling-operator/blob/2be50da0642f21d66f9e4d766216e5a8d55c8bca/controllers/gatling_controller.go#L118-L130