Closed caseyjlaw closed 1 month ago
It's not clear to me how this relates to the data recorders.
After thinking about it, I think it doesn't. But do you think the data recorders could provide input to such a service? That is, are there monitor points or clients that we can use to assess whether bad data is being written? I think not, since it either writes or it doesn't. Is that fair?
Sure, there are the `statistics/*` and `diagnostics/*` monitoring points that are currently being written that might be useful. Maybe the recorder-provided values make the most sense for the beamformer outputs, where you cannot go back and redo the beamforming if there are problems.
Further discussion on a different issue: https://github.com/ovro-lwa/lwa-issues/issues/106.
Feature request
Intro
We need to include a priori information from subsystems (especially the f-engine) that can identify bad data. Currently, we rely on post-processing to identify bad data (e.g., a calibration solution fails for a bad ant-pol). This is unreliable and biases the analysis of good data.
Feature
Set up a daemon that polls all SNAPs with the f-engine python client. Use the equivalent of `print_status_all(ignore_ok=True)` to get a summary of bad inputs. Map these bad inputs to ant-pols and save the time and state of all inputs. The saved information should be parsable into a flag table in a CASA MS. The real-time flagger (in development) could read from etcd to get an instantaneous set of bad inputs to be flagged.

One possibility is to create a new etcd key that gets ingested into influx. E.g., `/mon/health` may hold a dict with keys "LWA-nnna" and boolean values, where nnna is the ant-pol. The time history of each key would then be available with an influx query (i.e., via a python client or in grafana).

Other use cases should be considered. Please add ideas to this issue!
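As a sketch of how the pieces above could fit together, the snippet below turns a set of bad inputs into the proposed `/mon/health` dict. The `get_bad_inputs` function and the `input_to_antpol` mapping (64 inputs per SNAP, two polarizations per antenna) are illustrative stand-ins, not the real f-engine client API or the actual cabling map:

```python
import json
import time

def get_bad_inputs():
    """Stand-in for the f-engine client's print_status_all(ignore_ok=True).

    Returns a list of (snap_id, input_number) pairs currently flagged bad.
    """
    return [(1, 3), (2, 60)]

def input_to_antpol(snap_id, input_number):
    """Map a (SNAP, input) pair to an ant-pol label like 'LWA-002b'.

    Assumes 64 inputs per SNAP and two polarizations per antenna
    (illustrative only; the real mapping lives in the array config).
    """
    global_input = (snap_id - 1) * 64 + input_number
    ant = global_input // 2 + 1
    pol = "ab"[global_input % 2]
    return f"LWA-{ant:03d}{pol}"

def build_health_dict():
    """Build the dict proposed for the /mon/health etcd key.

    Keys are ant-pol names; values are booleans (False = bad input).
    In practice every input would be enumerated; here only bad ones appear.
    """
    bad = {input_to_antpol(s, i) for s, i in get_bad_inputs()}
    return {antpol: False for antpol in sorted(bad)}

if __name__ == "__main__":
    health = build_health_dict()
    # A real daemon would publish this to etcd for ingestion into influx,
    # e.g. client.put('/mon/health', json.dumps(health)).
    print(json.dumps({"time": time.time(), "flags": health}))
```

With such a key in place, the real-time flagger could read `/mon/health` once per integration and apply the boolean mask directly, while the influx history supports building a time-dependent flag table after the fact.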
Example of on-SNAP logic