sot / jobwatch

Watch files, database tables, and log files to ensure valid cron processing

Parse chandra.snapshot to confirm it is current #20

Open jeanconn opened 6 years ago

taldcroft commented 6 years ago

The trick is that you need to be aware of not only the comm schedule but also the comm status. So snapshot should be current within a few minutes if and only if we are actually receiving realtime telemetry. Not infrequently a planned comm has a problem, and in that case you don't want to send an alert that MTA is not alive.

For an hourly job watcher, could probably use a combination of kadi (to check DSN plans) and MAUDE (to see if realtime telemetry arrived during that time). If this passes, then check the most recent snapshot date. E.g.:

This is basically a backstop strategy that prevents snapshot being down for more than one comm. Organizationally we really want something that is running every minute to ensure that an alert goes out within the affected comm.
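A minimal sketch of the decision logic described above, assuming the planned track intervals (from kadi) and the realtime telemetry arrival times (from MAUDE) have already been fetched and converted to `datetime` objects. The function name `snapshot_is_stale`, its arguments, and the 5-minute `max_lag` are hypothetical and only for illustration; the kadi and MAUDE queries themselves are not shown.

```python
from datetime import datetime, timedelta


def snapshot_is_stale(tracks, telemetry_times, snapshot_time,
                      max_lag=timedelta(minutes=5)):
    """Return True only when an alert seems warranted.

    tracks          : list of (bot, eot) datetime pairs for planned comms
                      (assumed to come from kadi DSN comm events)
    telemetry_times : datetimes at which realtime telemetry arrived
                      (assumed to come from MAUDE)
    snapshot_time   : datetime of the most recent snapshot
    max_lag         : hypothetical tolerance for "current within a few minutes"
    """
    # Keep only telemetry that arrived during a planned track.
    in_comm = [
        t for t in telemetry_times
        if any(bot <= t <= eot for bot, eot in tracks)
    ]
    if not in_comm:
        # No planned comm, or the comm had a problem: do not alert.
        return False
    # Telemetry was flowing, so the snapshot should be close to it in time.
    return max(in_comm) - snapshot_time > max_lag
```

Returning `False` when no telemetry arrived during a planned track is what keeps a failed comm from producing a false "MTA is not alive" alert.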

jeanconn commented 6 years ago

Which kadi events are right for this? dsn_comms or pass_plans? I don't have much experience with these data. I assume the dsn_comms also need to be filtered. Also, which times make the most sense? For the dsn_comms event, eot makes sense to me, but it isn't really a time.

taldcroft commented 6 years ago

Use the DSN comm passes event and don't blame me for the difficult format... :smile: E.g.: http://kadi.cfa.harvard.edu/kadi/events/dsn_comm/581635/?filter=&sort=-start&index=5

You need to use bot and eot to get track times, and start to get the day and year. So basically just turn each of bot and eot into a time string, e.g. tm = f'{bot[:2]}:{bot[2:]}:00' for bot and the same with eot in place of bot, and make a new date string with start[:9] + tm. Then if that date is before start you have to add one day.
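The recipe above can be sketched with the standard library alone. This assumes `start` is a `YYYY:DOY:HH:MM:SS.sss` date string and that `bot` and `eot` are four-character `HHMM` strings; the helper name `track_time` is hypothetical, and in practice one might prefer the date classes from the Ska/Chandra tools instead of `datetime`.

```python
from datetime import datetime, timedelta

# Year, day-of-year, and time of day, without fractional seconds.
DATE_FMT = "%Y:%j:%H:%M:%S"


def track_time(start: str, hhmm: str) -> str:
    """Build a full date string for a bot or eot value.

    start : event start, e.g. '2016:157:22:15:00.000' (assumed format)
    hhmm  : bot or eot as 'HHMM', e.g. '2345' (assumed format)
    """
    tm = f"{hhmm[:2]}:{hhmm[2:]}:00"
    # start[:9] is 'YYYY:DOY:', which supplies the year and day.
    candidate = datetime.strptime(start[:9] + tm, DATE_FMT)
    # start[:17] drops the fractional seconds so the format matches.
    start_dt = datetime.strptime(start[:17], DATE_FMT)
    if candidate < start_dt:
        # The track time falls after midnight, i.e. on the next day.
        candidate += timedelta(days=1)
    return candidate.strftime(DATE_FMT)
```

For example, `track_time('2016:157:22:15:00.000', '0130')` rolls over to day 158, while `'2345'` stays on day 157. Doing the day increment with `timedelta` rather than string arithmetic also handles the end-of-year rollover.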