Who, what, where, when
[TODO: request comments regarding replacement of AuditStore with enhanced logging engine]
Auditing will be done via a dedicated API but will utilize the logging engine for storage.
Logging engine enhancements will be made to facilitate the audit goals and to improve logging in general. Logging will become an asynchronous operation, to the extent possible: data to be logged is queued quickly, then formatted and stored by background goroutines owned by the logging engine.
Logging asynchronously presents some caveats. For example, when logging a struct asynchronously, care must be taken to ensure the contents of the data being logged do not change while the log record is being created and formatted. To achieve this, one of two methods will be employed (TBD after benchmarking):
1. Types of arguments will be inspected and copies made of mutable types before enqueuing
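Copying mutable arguments before enqueuing (method 1 above) could be sketched as follows; `snapshotArg` is a hypothetical helper, not part of the existing mlog API, and a real implementation would also recurse into structs:

```go
package main

import (
	"fmt"
	"reflect"
)

// snapshotArg returns a copy of mutable argument types (maps, slices,
// pointers) so the queued log record cannot change after enqueue.
// Immutable values (strings, numbers, bools) are returned as-is.
func snapshotArg(v interface{}) interface{} {
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Map:
		cp := reflect.MakeMapWithSize(rv.Type(), rv.Len())
		for _, k := range rv.MapKeys() {
			cp.SetMapIndex(k, rv.MapIndex(k))
		}
		return cp.Interface()
	case reflect.Slice:
		cp := reflect.MakeSlice(rv.Type(), rv.Len(), rv.Len())
		reflect.Copy(cp, rv)
		return cp.Interface()
	case reflect.Ptr:
		if rv.IsNil() {
			return v
		}
		cp := reflect.New(rv.Elem().Type())
		cp.Elem().Set(rv.Elem())
		return cp.Interface()
	default:
		return v
	}
}

func main() {
	m := map[string]int{"a": 1}
	snap := snapshotArg(m).(map[string]int)
	m["a"] = 99 // caller mutates after "enqueue"
	fmt.Println(snap["a"]) // prints 1: the snapshot is unaffected
}
```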
Another issue with asynchronous logging is deciding what to do when the queue is full. To keep things simple, the caller will be blocked for up to a configurable timeout waiting for the enqueue to succeed. If the enqueue succeeds before the timeout, a counter representing logging contention is incremented and its total emitted periodically. If the enqueue times out, an alert is emitted.
[TODO: determine queue approach: buffered channel, ring buffer, ...]
Queuing will be done via a buffered channel.
(phase 2) Queue(s) will be monitored for percent full and metrics provided via Prometheus.
Asynchronous logging also requires an explicit call to the logging engine's shutdown method to ensure queues are flushed. All application exit paths must be covered.
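One way to sketch the shutdown/flush requirement, assuming a single background goroutine draining a channel (names are hypothetical):

```go
package main

import (
	"fmt"
	"sync"
)

type Logger struct {
	queue chan string
	done  sync.WaitGroup
}

func NewLogger(size int) *Logger {
	l := &Logger{queue: make(chan string, size)}
	l.done.Add(1)
	go func() {
		defer l.done.Done()
		for rec := range l.queue { // exits when queue is closed and drained
			fmt.Println("stored:", rec)
		}
	}()
	return l
}

// Shutdown must be called on every application exit path.
func (l *Logger) Shutdown() {
	close(l.queue)
	l.done.Wait() // block until all queued records are flushed
}

func main() {
	l := NewLogger(8)
	l.queue <- "audit: user login"
	l.Shutdown() // prints "stored: audit: user login" before returning
}
```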
[TODO: determine if audit API needs to extend deeper into app pkg or 100% of auditable events will come through server REST API]
[TODO: research which auditable events currently bypass server REST API]
Auditing at the REST API and CLI layers will provide coverage.
(phase 2) The logging engine will be enhanced to support discrete logging "levels". Currently an application-wide logging level is configured, and any log record matching that level or lower is emitted. These logging levels will remain, but support for zero or more discrete "levels" will be added, meaning only records matching the current log level or one of the discrete levels are emitted. Within the logging engine, any level below 10 (trace through critical/fatal, plus reserved) will behave as it does currently, but any level above 10 will be considered discrete. Audit records will have a level above 10.
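The proposed filtering rule could be sketched like this; the threshold constant and level values are assumptions drawn from the description above, not actual mlog constants:

```go
package main

import "fmt"

// Levels at or below this threshold honor the application-wide level
// as today; levels above it are "discrete" and enabled individually.
const discreteThreshold = 10

const LevelAuditRest = 11 // audit records use a level above 10

type LevelFilter struct {
	appLevel int          // records at this level or lower are emitted
	discrete map[int]bool // explicitly enabled discrete levels
}

func (f *LevelFilter) Enabled(level int) bool {
	if level <= discreteThreshold {
		return level <= f.appLevel
	}
	return f.discrete[level]
}

func main() {
	f := &LevelFilter{appLevel: 4, discrete: map[int]bool{LevelAuditRest: true}}
	fmt.Println(f.Enabled(3))              // true: ordinary level within appLevel
	fmt.Println(f.Enabled(7))              // false: ordinary level above appLevel
	fmt.Println(f.Enabled(LevelAuditRest)) // true: discrete audit level is enabled
}
```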
(phase 2) Logging engine enhancements will be implemented via the mlog package and will be compatible with existing usage.
(phase 2) The logging engine will allow routing standard levels and discrete levels to different targets (files, databases, etc.) via configuration. The Zap logger will continue to be used internally.
A single audit record is emitted for each event (add, delete, login, ...). Multiple auditable events may be emitted for a single API call.
name | type | description
---|---|---
ID | string | audit record id
CreateAt | int64 | timestamp of record creation (UTC)
Level | string | e.g. audit-rest, audit-app, audit-model
APIPath | string | REST endpoint
Event | string | e.g. add, delete, login, ...
Status | string | e.g. attempt, success, fail, ...
UserId | string | id of the user calling the API
SessionId | string | id of the session used to call the API
Client | string | e.g. webapp, mmctl, user-agent
IPAddress | string | IP address of the client
Meta | map[string]interface{} | API-specific info; e.g. id of the user being deleted
type AuditRecord struct {
	ID        string
	//CreateAt int64 -- added by logger
	Level     string
	APIPath   string
	Event     string
	Status    string
	UserId    string
	SessionId string
	Client    string
	IPAddress string
	Meta      map[string]interface{}
}
From web.Context, the audit API will be replaced with a defer-based API where a partially populated audit record is generated and additional information is added by the REST API handler. All calls will need to be reviewed to ensure required fields are added. Conversion of name/value strings to the Meta map will happen asynchronously.
Context will continue to be used to auto-populate some of the audit record fields.
[TODO: request comments regarding security/reliability of data within Context]
Example:
func sampleRestAPI(c *Context, w http.ResponseWriter, r *http.Request) {
	// check inputs ...

	// auditRec is pre-populated with data from Context.
	// Fail by default.
	auditRec := c.MakeAuditRecord("sampleAPI", audit.Fail)
	// Log the audit record regardless of how the method exits,
	// including panic.
	defer c.LogAuditRec(auditRec)

	// ... do whatever this REST API is supposed to do
	created_id, err := c.App.CreateSomething(...)
	if err != nil {
		c.Err = err
		return
	}

	// add any additional fields specific to this REST API
	auditRec.AddMeta("created_id", created_id)

	// mark audit record as successful
	auditRec.Success()
}
New APIs will be added in the app layer to capture all auditable events. This is necessary because not all auditable events are triggered via the REST API. This means certain code paths will emit multiple audit records for the same event. To reduce noise, the APILayer field will be used to filter records such that only the record emitted closest to the caller is kept. For example, a caller using the REST API will have any app layer records filtered out, leaving only the REST layer record. [TODO: is filtering really necessary?]
[TODO: api examples, code snippets]
After further investigation, the app layer is not an ideal place for generating audit records. Too many of the fields needed for useful audit records are missing at this layer and would need to be passed in or looked up. Instead, auditing will be done at the REST API layer and the CLI layer, where all the needed calling context exists.
(phase 2) Storage options will be administrator configurable via logging engine configuration. When storing to file, typically the audit records will go to a separate file from general logging.
If an error occurs while writing to the target storage, several strategies can be employed. The strategies chosen are specific to the target type (file, database, email, ...).
(phase 2) To ensure audit logs cannot be unknowingly tampered with or corrupted, it will be possible to configure the logging engine to sign log files for specific targets. When an audit store cannot be made secure, audit logs should be stored in multiple places (e.g. file and database) so they can be reconciled if needed.
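Signing could, for example, use an HMAC over each record; this is one possible approach, not the chosen design:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns an HMAC-SHA256 over a log line so tampering is detectable.
func sign(key, line []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(line)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the HMAC and compares in constant time.
func verify(key, line []byte, sig string) bool {
	expected, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(line)
	return hmac.Equal(mac.Sum(nil), expected)
}

func main() {
	key := []byte("audit-signing-key")
	line := []byte(`{"event":"login","status":"success"}`)
	sig := sign(key, line)
	fmt.Println(verify(key, line, sig))                        // true
	fmt.Println(verify(key, []byte(`{"tampered":true}`), sig)) // false
}
```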
(phase 2) Alerting will be achieved via a plugin logger target, and configured using a discrete log level. Destination(s) for alerts can be email, database, Mattermost channel post, or other.
[TODO]
Phase 2
[TODO]