Proposed Best Practice: Metrics

mdelaurentis commented 7 years ago

As a best practice, a Tap or Target should emit structured logging to a file, which can be used for monitoring or other analysis.

Both a Tap and a Target should accept an optional "metrics_path" key in its configuration file. If that key is present, the Tap or Target should write out a JSON map every time it makes an external request, for example to an API or database. If a Tap or Target that supports metrics is run without a "metrics_path" key, it should not fail, it should just not write metrics.

It is expected that whatever program runs the Tap or Target will provide the name of a file to write the metrics to, and that some program will consume the metrics from that location and store them somewhere. We could use either a named pipe or a regular file. If a named pipe is used, the wrapper program must ensure that something reads from the pipe continuously so as not to block the Tap. If a regular file is used, the wrapper program must ensure that it does not cause the disk to fill up.

The metric structure is designed to specifically capture information relevant to the types of external calls that a Tap or Target would typically make. All fields are optional. If some of these fields are hard for a particular Tap to track, we would rather have it report something than nothing.

success - Whether the call succeeded
duration - How long the request or call took, in milliseconds
num_bytes - Size of request or call payload in bytes
num_records - Size of request or call payload in terms of number of records
tags - (optional) Mapping from string keys to string values providing additional description of the request

Examples

Successful API call from a Tap:

{"success": true,
 "duration": 432,
 "num_bytes": 12345
 "num_records": 1234
 "tags": {
     "endpoint": "/users/{}/orders",
     "http_method": "GET",
     "http_response_code": 200
 }
}

Successful SELECT statement from a Tap:

{"success": true,
 "duration": 432,
 "num_bytes": 12345
 "num_records": 1234
 "tags": {
     "table": "users",
      "sql_command": "SELECT",
 }
}

Questions

Would it be simpler to get rid of the "tags" key and instead say that the metrics can include any arbitrary properties, which will be treated as tags?
This metrics format is specific to timing and counting requests to external systems. It also bundles several values (count, duration, success, bytes, records) into one data structure. Would it be better to emit a separate metric object for each of those values?

karstendick commented 7 years ago

I'd rename duration to duration_ms to be explicit.

For SaaS integrations, it may be helpful to specify both an endpoint tag and a table tag, as some endpoints provide data for multiple tables.

Some endpoints provide no data, only metadata. In that case, would num_records be 0? And would num_bytes be the size of the http response or still 0?

I'd ditch the tags key and just make the interface be a bag of properties to keep things simple and flexible. I'd rather keep each metric record together rather than emitting a separate object for each key-value pair. It's meaningful that all the values refer to the same event (e.g. one http query or one SQL query), and it's worthwhile to keep that in one data point.

mdelaurentis commented 7 years ago

Thanks for the feedback @karstendick . I agree with most of your suggestion.

rename duration to duration_ms

yes

For SaaS integrations, it may be helpful to specify both an endpoint tag and a table tag, as some endpoints provide data for multiple tables.

That's a good point. If we ditch the tags key, then I suppose we would have both "endpoint" and "table" keys, and Taps could decide whether to use one, both, or neither. Alternatively, should we use a more generic name for the source of the data? Maybe just "source". That could be a URL or a table name.

Some endpoints provide no data, only metadata. In that case, would num_records be 0? And would num_bytes be the size of the http response or still 0?

I suppose either 0 or just leave it null. That way we can count that we made a request, but not suggest that that request would impact the record count.

I'd ditch the tags key and just make the interface be a bag of properties to keep things simple and flexible. I'd rather keep each metric record together rather than emitting a separate object for each key-value pair. It's meaningful that all the values refer to the same event (e.g. one http query or one SQL query), and it's worthwhile to keep that in one data point.

Yeah, I agree. Thanks.

singer-io / getting-started

Proposed Best Practice: Metrics #16

Examples

Questions