taldcroft closed 8 years ago
Good catch. This makes sense and is a clean fix. Was the interval set to 2 days with unconscious awareness of this? As in, is the 2-day limit still good, or should it be 1 day with the fix?
A 2-day limit is still fine from the operational perspective of the DSN file being sufficiently fresh. So we'll allow for 1 day of something failing before we get alerted.
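For reference, a file-freshness check of this kind can be sketched as below. This is only an illustration under stated assumptions, not the actual watch code: the function name `is_stale` and its parameters are hypothetical. With a job that runs daily, a 2-day limit tolerates one missed run before the file is flagged.

```python
import os
import time

SECONDS_PER_DAY = 86400.0


def is_stale(path, max_age_days=2.0, now=None):
    """Hypothetical helper: True if path is missing or older than max_age_days."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)  # last modification time, seconds since epoch
    except OSError:
        return True  # a missing or unreadable file counts as stale
    age_days = (now - mtime) / SECONDS_PER_DAY
    return age_days > max_age_days
```

Passing `now` explicitly makes the check easy to exercise without waiting for files to age.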
:+1:
Now that I've merged and installed this PR I think maybe the logwatch for this needs to look in daily.0 instead of in the top level of logs? I've forgotten the task schedule details, but do we not get the master log until everything is moved into daily.0 in the day-rotation of these persistent jobs?
fido: ls -l *log
-rw-rw-r-- 1 aca aspect 0 Aug 12 06:00 dsn_summary.log
-rw-rw-r-- 1 aca aspect 0 Aug 12 06:00 enable_alerts.log
fido: ls -l daily.0/
total 0
-rw-rw-r-- 1 aca aspect 0 Aug 11 06:00 dsn_summary.log
-rw-r--r-- 1 aca aspect 0 Aug 12 06:00 dsn_summary_master.log
-rw-rw-r-- 1 aca aspect 0 Aug 11 06:00 enable_alerts.log
Ah. I was confused. The problem introduced by this PR is not that it looks for a file that doesn't exist yet. The problem is that this PR changed logtask to "dsn_summary_master.log", but the filename is constructed as
filename='/proj/sot/ska/data/{task}/' \
'{logdir}/daily.0/{logtask}.log'):
So it is now looking for dsn_summary_master.log.log. Fixed in 844df86.
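The doubled extension can be reproduced with plain `str.format`. This is a minimal sketch: the template string is copied from the snippet above, while the helper name `build_log_path` and the `task` and `logdir` values are assumptions made for the example, not taken from the actual watch script.

```python
# Template copied from the snippet quoted above; note it already appends ".log".
FILENAME_TEMPLATE = ('/proj/sot/ska/data/{task}/'
                     '{logdir}/daily.0/{logtask}.log')


def build_log_path(task, logtask, logdir='logs'):
    """Hypothetical helper: fill in the template the way the check would."""
    return FILENAME_TEMPLATE.format(task=task, logdir=logdir, logtask=logtask)


# Passing the full file name as logtask doubles the extension.
bad = build_log_path('dsn_summary', 'dsn_summary_master.log')

# Passing the bare log name gives the intended file.
good = build_log_path('dsn_summary', 'dsn_summary_master')

print(bad)
print(good)
```

Since the template owns the extension, `logtask` should carry only the base name.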
Because dsn_summary normally creates no logging output, the usual file time check for dsn_summary.log is the day before. It looks like dsn_summary_master.log is more reflective of actual run time.
cc: @jeanconn