Closed sadsfae closed 2 weeks ago
Given we only run this out of cron on Monday through Saturday:
-=>>crontab -l | grep foreman
0 */3 * * 1-6 ( echo "=== Foreman RBAC == @ " $(date) ; flock -n /tmp/foremanrbac.lock -c "quads --foreman-rbac" ) 1>/dev/null 2>&1
and new schedules go out on Sundays, maybe we want to also run the cron on Sundays as well? Also, for the hour, */3
means it runs at 0, 3, 6, 9, 12, 15, 18, and 21. The schedules typically start at 22 hours, so the next check won't run until 2 hours after the schedules start, and our grace period for starting the check is 30 minutes after the schedule starts. So with the validation check running at:
*/20 * * * * ( echo "=== Validating == @ " $(date) ; flock -n /tmp/validateenv.lock -c "quads --validate-env" ) 1>>/var/log/validate-env.log 2>&1
it means the next validation run will start at 22:40 and be after the end of the grace period. This is still well before the next rbac run. Depending on the cost of running quads --foreman-rbac
, I think we can fix this by setting the rbac cron to run more frequently with:
5 */2 * * * ( echo "=== Foreman RBAC == @ " $(date) ; flock -n /tmp/foremanrbac.lock -c "quads --foreman-rbac" ) 1>/dev/null 2>&1
which should have the rbac run to trigger at 22:05, well before the next validation run.
@kambiz-aghaiepour running more frequently wouldn't be a bad thing but I think we should also run a pre-emptive quads --foreman-rbac
too somewhere that makes sense. Let's just go with the cron adjustments though for now.
I'd want to discuss with @grafuls on how to handle this, but likely we'd want an assignment table field in the database to ensure it's run only once and then just run it out of move_and_rebuild
. Like the notification labels this can also be reset when the assignment is reclaimed.
Currently this message gets generated here in the notify_failure
method: https://github.com/redhat-performance/quads/blob/latest/src/quads/tools/validate_env.py#L43 and it will always false-positive unless we're exactly spot on with cron runs (which seems like a bandaid).
If we had an one-time-per-release --foreman-rbac
call in M&R before VE gets processed it will avoid getting the failure message entirely rather than increase the frequency from cron.
At that point we may be able to just completely remove the cron.
Lastly, we want a check that we're not in grace period before notify_failure
gets processed anyway: https://github.com/redhat-performance/quads/blob/latest/src/quads/tools/validate_env.py#L43
e.g.
if time_delta.total_seconds() // 60 > Config["validation_grace_period"]:
https://github.com/redhat-performance/quads/blob/latest/conf/quads.yml#L154
pending patchset: https://review.gerrithub.io/c/redhat-performance/quads/+/1203774
This is a much older and minor issue but the validation check notification email is always sent.
Validation check failed for cloud61 / wfoster / 3885
We should not be sending any notification emails from
validate_environment
phases while in the configurable grace window.Fix
I think that we should run an pre-emptive
quads --foreman-rbac
here as in testing this makes sure the message is actually relevant (just tested this with another env)