opensafely-core / codespaces-initiative

Improving the use of OpenSAFELY in Codespaces
MIT License

Understand how we might count people using Codespaces #8

Closed StevenMaude closed 5 months ago

StevenMaude commented 7 months ago

We might want to count how long people use Codespaces for, and when (or if) they migrate to local development. Can we associate a commit with a dev environment?

Ideas

lucyb commented 6 months ago

For the purposes of testing what's available via the GitHub API, the following command may be useful: gh codespace list --org opensafely.

I believe this is equivalent to:

gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /orgs/opensafely/codespaces

Jongmassey commented 6 months ago

GitHub adds some environment variables when running in Codespaces; these might be useful.

These could be useful if we add telemetry to opensafely-cli. There are bits for this in the vendored job runner, but AFAICT they're not being used currently. This could also provide us with a "local run" denominator, but it raises possible issues around user consent to this telemetry and around gracefully handling the case where a user is not connected to the internet.
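To make the environment-variable idea concrete, here is a minimal sketch of how a CLI could tell a Codespaces run from a local run. It assumes the `CODESPACES` and `CODESPACE_NAME` variables that GitHub documents as defaults inside a codespace; the function name `detect_environment` is hypothetical and nothing like it is confirmed to exist in opensafely-cli.

```python
import os


def detect_environment(env=None):
    """Return a small dict describing where the code is running.

    Assumption: GitHub sets CODESPACES=true and CODESPACE_NAME inside a
    codespace. Anything else is treated as a local run.
    """
    env = os.environ if env is None else env
    in_codespace = env.get("CODESPACES", "").lower() == "true"
    return {
        "environment": "codespaces" if in_codespace else "local",
        # Only meaningful inside a codespace; None otherwise.
        "codespace_name": env.get("CODESPACE_NAME") if in_codespace else None,
    }


if __name__ == "__main__":
    print(detect_environment())
```

Passing the environment in as a parameter keeps the check easy to test, and counting the "local" results is one way to get the "local run" denominator mentioned above. The consent and offline-handling questions are untouched by this sketch.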

For the purposes of testing what's available via the GitHub API, the following command may be useful: gh codespace list --org opensafely.

This appears to list the currently running codespaces; from the docs, I can't see anything about historic codespace usage. We could poll this on a semi-regular basis to track usage over time.

Should we just do a Slack poll instead?

This is my instinctive preference, but it might suffer from low uptake.

iaindillingham commented 6 months ago

If we wish to count how many Codespaces were started, then we could use one of the lifecycle scripts to communicate with an endpoint. If we sent both start and stop timestamps, then we could determine each Codespace's duration.
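As a rough sketch of the duration calculation, suppose the endpoint stored each report as a `(codespace_name, kind, iso_timestamp)` tuple, where `kind` is `"start"` or `"stop"`. That record shape and the function name are assumptions for illustration only. The sketch also assumes a stop report may never arrive, so unmatched starts are simply left out of the totals.

```python
from datetime import datetime


def codespace_durations(events):
    """Pair each 'start' with the next 'stop' for the same codespace.

    events: iterable of (codespace_name, kind, iso_timestamp) tuples,
    where kind is "start" or "stop" (a hypothetical record shape).
    Returns {codespace_name: total_seconds} for completed sessions only.
    """
    open_starts = {}
    totals = {}
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e[2]))
    for name, kind, stamp in ordered:
        when = datetime.fromisoformat(stamp)
        if kind == "start":
            # Keep the earliest unmatched start; a repeated start is ignored.
            open_starts.setdefault(name, when)
        elif kind == "stop" and name in open_starts:
            started = open_starts.pop(name)
            elapsed = (when - started).total_seconds()
            totals[name] = totals.get(name, 0.0) + elapsed
    return totals


if __name__ == "__main__":
    sample = [
        ("cs-a", "start", "2024-01-01T09:00:00+00:00"),
        ("cs-a", "stop", "2024-01-01T10:30:00+00:00"),
        ("cs-b", "start", "2024-01-01T09:15:00+00:00"),  # never stopped
    ]
    print(codespace_durations(sample))
```

Because sessions without a stop report are dropped, totals computed this way would be a lower bound on actual usage.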

The issue title mentions counting people. We could pass values of one or more default environment variables to an endpoint, too. Doing so doesn't seem dramatically different to capturing other telemetry data.

Jongmassey commented 6 months ago

I've been poking at this for a couple of days and here are some of the harder edges I've come up against:

lucyb commented 5 months ago

For reference, there's a Slack thread with some additional discussion.

I'm going to make a decision here and say that we poll the GitHub API every hour. This might not give us particularly detailed information, but it might tell us what additional information we do need and will be quick to do. Also, it will tell us if people are using Codespaces at all... if they're not, then there's no point in spending extra time gathering additional metrics.
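A minimal sketch of what one hourly poll could record, reusing the `gh api` call from earlier in the thread. The response fields used here (`codespaces`, `state`, `owner.login`) are assumptions based on my reading of the GitHub REST API and should be checked against a real response; the function names are hypothetical.

```python
import json
import subprocess
from collections import Counter
from datetime import datetime, timezone


def fetch_codespaces(org="opensafely"):
    """Call the org codespaces endpoint via the gh CLI (needs gh auth)."""
    result = subprocess.run(
        [
            "gh", "api",
            "-H", "Accept: application/vnd.github+json",
            "-H", "X-GitHub-Api-Version: 2022-11-28",
            f"/orgs/{org}/codespaces",
        ],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def summarise(response, taken_at=None):
    """Reduce one API response to a single row for an append-only log."""
    taken_at = taken_at or datetime.now(timezone.utc)
    # Assumed response shape: {"codespaces": [{"state": ..., "owner": {...}}]}
    codespaces = response.get("codespaces", [])
    states = Counter(cs.get("state", "unknown") for cs in codespaces)
    owners = {
        cs["owner"].get("login") for cs in codespaces if cs.get("owner")
    }
    return {
        "taken_at": taken_at.isoformat(),
        "total": len(codespaces),
        "distinct_owners": len(owners),
        "by_state": dict(states),
    }


if __name__ == "__main__":
    print(json.dumps(summarise(fetch_codespaces())))
```

Run from any hourly scheduler, each invocation would emit one JSON line. Storing only counts, rather than owner logins, is one way to answer "are people using Codespaces at all" while keeping the collected data small.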