sot / jobwatch

Watch files, database tables, and log files to ensure valid cron processing

Watch dsn_summary_master.log because dsn_summary.log does not reflect run time #14

Closed taldcroft closed 8 years ago

taldcroft commented 8 years ago

Because dsn_summary normally creates no logging output, the modification time of dsn_summary.log in daily.0 is from the day before, so the usual file time check on it does not reflect the latest run. It looks like dsn_summary_master.log is more reflective of the actual run time.

kadi$ lst daily.0/
total 0
-rw-rw-r-- 1 aca aspect 0 May 15 06:00 dsn_summary.log
-rw-rw-r-- 1 aca aspect 0 May 15 06:00 enable_alerts.log
-rw-r--r-- 1 aca aspect 0 May 16 06:00 dsn_summary_master.log

cc: @jeanconn

jeanconn commented 8 years ago

Good catch. This makes sense and is a clean fix. Was the interval set to 2 days with unconscious awareness of this? That is, is the 2-day limit still good, or should it be 1 day with the fix?

taldcroft commented 8 years ago

A 2-day limit is still fine from the operational perspective of the DSN file being sufficiently fresh. So we'll allow for 1 day of something failing before we get alerted.
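
For illustration, the freshness check discussed here can be sketched roughly as follows. This is only a minimal sketch and not jobwatch's actual implementation, which is not shown in this thread: the helper names `file_age_days` and `is_fresh` are hypothetical, and the check assumes that "fresh" means the file's modification time is within the allowed number of days.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400.0


def file_age_days(path, now=None):
    """Return the age of `path` in days, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / SECONDS_PER_DAY


def is_fresh(path, max_age_days=2.0, now=None):
    """Hypothetical check: True if the file exists and is at most max_age_days old."""
    if not os.path.exists(path):
        return False
    return file_age_days(path, now=now) <= max_age_days


# Demonstrate with a throwaway file whose modification time is set by hand.
with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, 'dsn_summary_master.log')
    open(log, 'w').close()
    now = time.time()

    # One day old: within the 2-day limit, so no alert.
    os.utime(log, (now - 1 * SECONDS_PER_DAY, now - 1 * SECONDS_PER_DAY))
    one_day_ok = is_fresh(log, now=now)

    # Three days old: past the 2-day limit, so this would trigger an alert.
    os.utime(log, (now - 3 * SECONDS_PER_DAY, now - 3 * SECONDS_PER_DAY))
    three_day_ok = is_fresh(log, now=now)
```

With a 2-day limit, a single missed daily run leaves the file about two days old at most and does not yet trip the check, which matches the "1 day of something failing" allowance described above.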

jeanconn commented 8 years ago

:+1:

jeanconn commented 8 years ago

Now that I've merged and installed this PR, I think the logwatch for this may need to look in daily.0 instead of in the top level of logs. I've forgotten the task schedule details, but do we not get the master log until everything is moved into daily.0 during the daily rotation of these persistent jobs?

fido: ls -l *log
-rw-rw-r-- 1 aca aspect 0 Aug 12 06:00 dsn_summary.log
-rw-rw-r-- 1 aca aspect 0 Aug 12 06:00 enable_alerts.log

fido: ls -l daily.0/
total 0
-rw-rw-r-- 1 aca aspect 0 Aug 11 06:00 dsn_summary.log
-rw-r--r-- 1 aca aspect 0 Aug 12 06:00 dsn_summary_master.log
-rw-rw-r-- 1 aca aspect 0 Aug 11 06:00 enable_alerts.log

jeanconn commented 8 years ago

Ah. I was confused. The problem introduced by this PR is not that it looks for a file that doesn't exist yet. The problem is that this PR changed logtask to "dsn_summary_master.log", but the filename is constructed as

                 filename='/proj/sot/ska/data/{task}/' \
                         '{logdir}/daily.0/{logtask}.log'):

So it is now looking for dsn_summary_master.log.log. Fixed in 844df86
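
To make the double extension concrete, here is a minimal sketch that fills in the template quoted above. The helper `build_log_path` is hypothetical, and the `task` value `'dsn_summary'` and `logdir` value `'logs'` are assumptions for illustration only; the thread does not state the actual values jobwatch uses.

```python
# Template as quoted in the comment above; note that '.log' is already part of it.
TEMPLATE = '/proj/sot/ska/data/{task}/{logdir}/daily.0/{logtask}.log'


def build_log_path(task, logtask, logdir='logs'):
    """Hypothetical helper: fill in the template (the logdir default is assumed)."""
    return TEMPLATE.format(task=task, logdir=logdir, logtask=logtask)


# Passing the full file name as logtask repeats the extension.
wrong = build_log_path('dsn_summary', 'dsn_summary_master.log')

# Passing the bare log name yields the intended file.
right = build_log_path('dsn_summary', 'dsn_summary_master')

print(wrong)
print(right)
```

Because the template appends `.log` itself, `logtask` has to be the bare name without the extension.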