opensafely-core / research-action

A GitHub action for verifying that OpenSAFELY research repos can run correctly

Log stats about test runs somewhere #42

Closed evansd closed 2 years ago

evansd commented 2 years ago

This is still a bit vague and handwavy, but I'm noting it down following a discussion with Seb about identifying and debugging projects which run into, or are likely to run into, performance issues.

The idea is that this action can log stats about project test runs to some central location where we can do something with them.

The sort of things that could be logged are:

Possibly we can push these straight into Honeycomb and avoid having to build any backend infrastructure for this. If we can't do this directly, maybe we can have an endpoint on job-server which acts as a proxy so that we still don't need to worry about storing or analysing the data.

sebbacon commented 2 years ago

Quick clarification: when Dave says "this action" above, he's referring to GitHub Actions. The thought was that, although we'd ideally be running this in production too, it would be very instructive to push stats generated in CI to Honeycomb in the first instance. The same process could record whether the tests finished with success or failure, and help us detect patterns in users who are not using CI as intended (for example).

Thinking aloud, the first thing would be a wrapper script that parses log files for interesting info (see https://github.com/opensafely-core/cohort-extractor/issues/777), and turns those into structured data, which includes the job id, info from the manifest, etc.
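A minimal sketch of what such a wrapper might look like, to make the idea concrete. The log line format here (`<date> <time> [level] key=value ...`) is an assumption made purely for illustration, not the actual cohort-extractor output, and `parse_log`, `job_id` and `manifest` are hypothetical names.

```python
import re

# ASSUMPTION: lines look like "2021-06-01 10:00:00 [info] variable=age elapsed_seconds=1.5".
# The real cohort-extractor log format may differ and would need its own pattern.
LINE_RE = re.compile(r"^(?P<timestamp>\S+ \S+) \[(?P<level>\w+)\] (?P<message>.*)$")
PAIR_RE = re.compile(r"(\w+)=(\S+)")


def _coerce(value):
    """Turn numeric-looking strings into floats; leave everything else as text."""
    try:
        return float(value)
    except ValueError:
        return value


def parse_log(lines, job_id, manifest=None):
    """Convert raw log lines into a list of structured event dicts.

    Lines that do not match the assumed format, or that carry no
    key=value pairs, are skipped.
    """
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        fields = {key: _coerce(value) for key, value in PAIR_RE.findall(match["message"])}
        if not fields:
            continue
        event = {
            "job_id": job_id,
            "timestamp": match["timestamp"],
            "level": match["level"],
            **fields,
        }
        if manifest is not None:
            # Attach whatever manifest info the caller chose to pass in.
            event["manifest"] = manifest
        events.append(event)
    return events
```

Keeping the parsing separate from wherever the data ends up means the same function could serve both a by-hand production run and the CI run.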

This could then be run by hand in production, to get data about the most expensive (in terms of time, at least) variables, for example.
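As a rough illustration of the "most expensive variables" report, assuming each structured event is a dict with hypothetical `variable` and `elapsed_seconds` keys (these names are not taken from the real logs):

```python
from collections import defaultdict


def slowest_variables(events, top_n=5):
    """Return (variable, total_seconds) pairs, slowest first.

    ASSUMPTION: events are dicts with "variable" and "elapsed_seconds"
    keys; events missing either key are ignored.
    """
    totals = defaultdict(float)
    for event in events:
        if "variable" in event and "elapsed_seconds" in event:
            totals[event["variable"]] += float(event["elapsed_seconds"])
    # Sort by total time, largest first, and keep only the top entries.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```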

The same script would be run as part of our CI run, and its output posted somewhere else.

Honeycomb seems like a good starting point, but we need to find out how to use its API and obtain the correct security tokens (@madwort will be able to help here)
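For reference, a sketch of what posting one event might involve. It assumes Honeycomb's Events API accepts a JSON body at `/1/events/<dataset>` with the API key in an `X-Honeycomb-Team` header, which should be checked against the current Honeycomb docs. The dataset name and event fields are hypothetical. The function only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# ASSUMPTION: base URL and path follow Honeycomb's Events API; verify before use.
HONEYCOMB_API = "https://api.honeycomb.io"


def build_event_request(event, dataset, api_key):
    """Build (but do not send) a POST request carrying one event as JSON."""
    url = f"{HONEYCOMB_API}/1/events/{dataset}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Honeycomb-Team": api_key,  # the security token mentioned above
            "Content-Type": "application/json",
        },
    )
```

In CI the key would presumably come from a repository or organisation secret exposed as an environment variable, rather than being hard-coded.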

rebkwok commented 2 years ago

Honeycomb is currently not receiving any logs from test runs. This is because the structure of the logs that cohort-extractor outputs has changed, and the opensafely-cli script that extracts them needs to be updated.

Note that the script with the same name in stats-logs-notebooks extracts logs with the latest structure. It's designed to work with the collated logs on the server rather than the logs generated by a job, but otherwise does the same log parsing.

madwort commented 2 years ago

I think something was implemented, which has subsequently failed. IMHO we can close this ticket in favour of #93 and/or fresh tickets about future work/goals.