mercedes-benz / sechub

SecHub provides a central API to test software with different security tools.
https://mercedes-benz.github.io/sechub/
MIT License

Prevent massive job execution in the cluster at the same time #107

Closed: de-jcup closed this issue 4 years ago

de-jcup commented 4 years ago

Problem

With #100 we are already trying to mitigate problems that appear when multiple pods start the same operations on projects.

But this is a general problem. When we want clustering and a scalable application, where we simply start another server/pod, we multiply the problem: at startup, e.g., 3 servers send requests to a security product at nearly the same time. If the product is not stable enough, or is not designed to handle similar requests arriving simultaneously, this can cause problems. When we scale out, e.g. to 6 servers, the problem just occurs more often.

SecHub1---->Job1 execute->ProductA
SecHub2---->Job2 execute->ProductA
SecHub3---->Job3 execute->ProductA
+--------------^-

Solution

SecHub1------------->Job1 execute->ProductA
SecHub2-------------------->Job2 execute->ProductA
SecHub3---->Job3 execute->ProductA
+--------------^------^-----^

Potential solution 1

We simply start the servers with a random delay.
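A minimal sketch of potential solution 1, assuming a bash wrapper script used as the container entry point and GNU coreutils (`shuf`, and `sleep` with fractional seconds). The function names, the `MAX_START_DELAY_MS` variable and the start command are hypothetical, not part of SecHub.

```shell
# Hypothetical helper: print a random delay in milliseconds, 0..max_ms inclusive
random_delay_ms() {
    local max_ms=$1
    shuf -i 0-"$max_ms" -n 1
}

# Hypothetical wrapper: wait a random time, then replace this shell with the server process
start_with_random_delay() {
    local max_ms=$1
    shift
    local delay_ms
    delay_ms=$(random_delay_ms "$max_ms")
    echo "Delaying server start by ${delay_ms} ms"
    # GNU sleep accepts fractional seconds, e.g. "2.345"
    sleep "$(printf '%d.%03d' $(( delay_ms / 1000 )) $(( delay_ms % 1000 )))"
    exec "$@"
}

# Usage (MAX_START_DELAY_MS and the command are assumptions):
# start_with_random_delay "${MAX_START_DELAY_MS:-10000}" java -jar server.jar
```

This only shifts the start of each pod; anything scheduled relative to application start moves with it, while cron-based schedules stay aligned to the wall clock.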

Potential solution 2

We avoid schedulers starting at the same time across the cluster by configuring Spring scheduling differently: instead of a cron expression, we use initialDelay (with a random value) and fixedDelay.
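To illustrate why this spreads the load: a cron expression fires at the same wall-clock instants on every pod, whereas initialDelay + fixedDelay counts from each application's own start. The sketch below approximates the start times of the first runs for three simultaneously started pods. It assumes each run finishes instantly; in Spring, fixedDelay is measured from the end of the previous run, so real start times drift later by the run durations. The function name is hypothetical.

```shell
# Approximate start times (ms after application start) of the first N runs
# of a task scheduled with initialDelay + fixedDelay, assuming zero run time
trigger_times_ms() {
    local initial_delay_ms=$1 fixed_delay_ms=$2 runs=$3
    local i
    for (( i = 0; i < runs; i++ )); do
        echo $(( initial_delay_ms + i * fixed_delay_ms ))
    done
}

# Three pods started at the same moment, each with a random initialDelay in 0..fixedDelay-1
fixed_delay_ms=10000
for pod in SecHub1 SecHub2 SecHub3; do
    initial_delay_ms=$(shuf -i 0-$(( fixed_delay_ms - 1 )) -n 1)
    echo "$pod: $(trigger_times_ms "$initial_delay_ms" "$fixed_delay_ms" 3 | paste -sd ' ' -)"
done
```

Each pod keeps its own offset for every subsequent run, which matches the shifted timeline in the "Solution" diagram.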

Potential solution 3

We avoid schedulers starting at the same time across the cluster by configuring Spring scheduling differently: instead of a cron expression, we use initialDelay (with a value from the database) and fixedDelay.
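One way a database-provided value could be turned into an initialDelay, sketched under the assumption that each pod obtains a distinct slot number (for example by claiming a row in a table). How the slot is obtained is not shown, and the function name and parameters are hypothetical.

```shell
# Map a per-pod slot number to a deterministic initialDelay below fixedDelay
initial_delay_for_slot_ms() {
    local slot=$1 step_ms=$2 fixed_delay_ms=$3
    if (( step_ms < 1 || step_ms > fixed_delay_ms )); then
        echo "step_ms must be between 1 and fixed_delay_ms" >&2
        return 1
    fi
    local slots_per_period=$(( fixed_delay_ms / step_ms ))
    # Wrap around so the offset always stays below fixedDelay
    echo $(( (slot % slots_per_period) * step_ms ))
}

# Usage (POD_SLOT is a hypothetical value read from the database):
# export SECHUB_CONFIG_TRIGGER_NEXTJOB_INITIALDELAY=$(initial_delay_for_slot_ms "$POD_SLOT" 303 10000)
```

Unlike a random value, distinct slots always give distinct offsets as long as the number of pods does not exceed fixedDelay / step (33 pods for 10000 ms and a 303 ms step); beyond that, slots wrap around and share offsets.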

de-jcup commented 4 years ago

Here is an example snippet showing how server starts can be customized, more precisely how SECHUB_CONFIG_TRIGGER_NEXTJOB_INITIALDELAY can be set to a random value with a minimum time shift of 303 milliseconds:

# Fall back to 10000 ms (the default) if the variable is unset or empty
SECHUB_CONFIG_TRIGGER_NEXTJOB_DELAY=${SECHUB_CONFIG_TRIGGER_NEXTJOB_DELAY:-10000}

amountOf303=$(( SECHUB_CONFIG_TRIGGER_NEXTJOB_DELAY / 303 ))
# shuf needs a non-empty range, so use at least one 303 ms step
if [ "$amountOf303" -lt 1 ]; then
    amountOf303=1
fi
echo "SECHUB_CONFIG_TRIGGER_NEXTJOB_DELAY / 303 = $amountOf303"

# Randomize the initial delay used for "trigger next job", so that there is a
# time shift between different pods. We use 303 milliseconds as the step,
# i.e. the smallest non-zero shift between two pods.
#
# With SECHUB_CONFIG_TRIGGER_NEXTJOB_DELAY at 10000 (the default), amountOf303 is 33,
# which means min 303 ms and max 33 * 303 = 9999 ms.
export SECHUB_CONFIG_TRIGGER_NEXTJOB_INITIALDELAY=$(( $(shuf -i 1-"$amountOf303" -n 1) * 303 ))
echo "SECHUB_CONFIG_TRIGGER_NEXTJOB_INITIALDELAY = $SECHUB_CONFIG_TRIGGER_NEXTJOB_INITIALDELAY"
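To check the bounds stated in the comments (303 to 9999 ms for a 10000 ms delay), the same arithmetic can be wrapped in a hypothetical function and sampled. This is a sketch for local verification only; the function name is not part of SecHub.

```shell
# Same computation as the snippet: a random multiple of 303 ms, at least one step
random_initial_delay_ms() {
    local delay_ms=${1:-10000} step_ms=303
    local steps=$(( delay_ms / step_ms ))
    if (( steps < 1 )); then
        steps=1
    fi
    echo $(( $(shuf -i 1-"$steps" -n 1) * step_ms ))
}

# Sample the helper and report the observed range
min=; max=
for _ in $(seq 1 200); do
    v=$(random_initial_delay_ms 10000)
    if [ -z "$min" ] || (( v < min )); then min=$v; fi
    if [ -z "$max" ] || (( v > max )); then max=$v; fi
done
echo "observed min=$min max=$max (bounds 303..9999)"
```

Because the values are drawn independently, two pods can still get the same multiple of 303; the randomization makes such collisions less likely but does not rule them out.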