opensafely-core / codespaces-initiative

Improving the use of OpenSAFELY in Codespaces
MIT License

Add metrics for Codespaces usage #42

Open lucyb opened 2 months ago

lucyb commented 2 months ago

Based on the discovery in #8.

We want to know how many Codespaces there are in the OpenSAFELY GitHub organisation.

This ticket is to:

If we can easily record additional information about the Codespaces, like Owner/Repo/State, that might be worth considering too.

The API call needed is something like:

gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  /orgs/opensafely/codespaces
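As a rough sketch of what recording Owner/Repo/State could look like, the snippet below parses a response from that endpoint and tallies Codespaces by state. The payload is a hypothetical, trimmed-down stand-in for the example response in the GitHub docs, and the field names (`total_count`, `codespaces`, `owner.login`, `repository.full_name`, `state`) are assumed from that documented shape; check them against a real response before relying on them. `summarise` is an illustrative name, not part of any existing tool.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down payload shaped like the documented example response.
SAMPLE_RESPONSE = """
{
  "total_count": 2,
  "codespaces": [
    {"name": "codespace-a", "owner": {"login": "user-one"},
     "repository": {"full_name": "opensafely/study-one"}, "state": "Available"},
    {"name": "codespace-b", "owner": {"login": "user-two"},
     "repository": {"full_name": "opensafely/study-two"}, "state": "Shutdown"}
  ]
}
"""


def summarise(payload):
    """Return the total count and one owner/repo/state record per Codespace."""
    records = [
        {
            "owner": cs["owner"]["login"],
            "repo": cs["repository"]["full_name"],
            "state": cs["state"],
        }
        for cs in payload["codespaces"]
    ]
    return payload["total_count"], records


total, records = summarise(json.loads(SAMPLE_RESPONSE))
states = Counter(record["state"] for record in records)
print(total, dict(states))
```

In practice the JSON would come from the `gh api` call above (or an equivalent authenticated request) rather than from a hard-coded string.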

Some Slack discussion in this 🧵 thread.

iaindillingham commented 2 months ago

I'd like to read my interpretation of the issue back to you, @lucyb. I think we want to count the number of active Codespaces in the opensafely organization, every hour. I think we want to use this information to answer the question: "Are Codespaces being used within the opensafely organization?" Consequently, we'd be scanning a timeseries to determine whether the count was zero or whether the count was greater than zero. That is, how much greater than zero doesn't matter.

We don't want to know, for example, when each Codespace was created, suspended, and deleted, and hence know how long each Codespace was active. We don't want to be able, for example, to associate each Codespace with a repo and a range of commits.

iaindillingham commented 2 months ago

The token needs the admin:org scope to use the endpoint in the API call. However, there's an example response in the docs.

iaindillingham commented 2 months ago

@lucyb and I had a chat about this issue last week. We agreed that for each Codespace, Metrics should record:

Metrics should record these data on the current daily schedule. We appreciate that doing so will mean that Metrics will miss data for codespaces that are created and deleted within a day.

With these data, we will derive the number of users that are developing their study code in Codespaces, over time. We hope this number is non-zero (someone is using a Codespace 🤞🏻) and is similar to the rate at which new studies are approved, albeit with a lag. For example, if a new study is approved every week for four weeks, then we hope that the number of users that are developing their study code in Codespaces will (eventually) increase to four. Knowing the user and repo will help us in the observation stage of the initiative: We will know who to ask about what, when we want to know about the experience of developing study code in Codespaces.
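To make the "number of users over time" derivation concrete, here is a minimal sketch that counts distinct Codespace owners per daily snapshot. The snapshot tuples, logins, and repo names are made up for illustration, and `users_per_day` is a hypothetical helper; how Metrics would actually store the snapshots is not decided here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily snapshots: (snapshot date, owner login, repo).
SNAPSHOTS = [
    (date(2024, 1, 1), "user-one", "opensafely/study-one"),
    (date(2024, 1, 2), "user-one", "opensafely/study-one"),
    (date(2024, 1, 2), "user-two", "opensafely/study-two"),
    (date(2024, 1, 2), "user-two", "opensafely/study-three"),
]


def users_per_day(snapshots):
    """Count distinct owners per snapshot date, in date order."""
    owners = defaultdict(set)
    for day, owner, _repo in snapshots:
        owners[day].add(owner)
    return {day: len(logins) for day, logins in sorted(owners.items())}


counts = users_per_day(SNAPSHOTS)
for day, count in counts.items():
    print(day.isoformat(), count)
```

A user with several Codespaces on the same day is counted once, which matches the intent of counting users rather than Codespaces.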

It would be useful to derive the distribution of time deltas between when a Codespace was created and when it was last used, as the distribution could help us calibrate our usage policy. For example, if the peak of the distribution was consistently low, then we could infer an ephemeral pattern of use. The current maximum retention period of 14 days would then be appropriate. However, if the peak of the distribution approached 14 days, then we should reevaluate the current maximum retention period, or at least our communication of it, to prevent users from losing their work.
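A sketch of that derivation, assuming each record carries ISO 8601 `created_at` and `last_used_at` timestamps (the names used in the documented example response). The timestamps below are invented, and the "within 10% of the limit" cut-off is an arbitrary illustration rather than an agreed threshold.

```python
from datetime import datetime, timedelta
from statistics import median

RETENTION = timedelta(days=14)  # current maximum retention period

# Hypothetical (created_at, last_used_at) pairs, with explicit UTC offsets.
CODESPACES = [
    ("2024-01-01T09:00:00+00:00", "2024-01-01T17:00:00+00:00"),
    ("2024-01-02T09:00:00+00:00", "2024-01-05T09:00:00+00:00"),
    ("2024-01-03T09:00:00+00:00", "2024-01-16T09:00:00+00:00"),
]


def lifetimes(codespaces):
    """Time between creation and last use, one timedelta per Codespace."""
    return [
        datetime.fromisoformat(last_used) - datetime.fromisoformat(created)
        for created, last_used in codespaces
    ]


deltas = lifetimes(CODESPACES)
# Arbitrary illustrative cut-off: deltas within 10% of the retention period.
near_limit = [delta for delta in deltas if delta >= 0.9 * RETENTION]
print("median:", median(deltas))
print("near the retention limit:", len(near_limit))
```

Timestamps ending in `Z` would need normalising to `+00:00` on Python versions before 3.11, where `datetime.fromisoformat` does not accept that suffix.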

It would be useful to derive the distribution of time deltas between when a repo was created and when the associated Codespace was last used. We think this distribution will have positive skew -- that is, a large number of small deltas -- as this would demonstrate that new study code is being edited in Codespaces. However, we're very interested in repos to the right of the distribution, as these would demonstrate that old study code is being edited in Codespaces. These studies may be larger, more complex, and depend on older versions of our tools, and may help us address any challenges associated with developing study code in Codespaces sooner rather than later.
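One way to check the expected positive skew, and to pick out the repos in the right tail, is sketched below. The repo names and day counts are made up, `OLD_REPO_DAYS` is a hypothetical threshold, and the skewness measure is the standard population (Fisher-Pearson) coefficient, chosen here only for illustration.

```python
from statistics import fmean

OLD_REPO_DAYS = 90  # hypothetical threshold for "old study code"

# Hypothetical deltas, in days, between repo creation and last Codespace use.
REPO_DELTA_DAYS = {
    "opensafely/study-one": 1,
    "opensafely/study-two": 2,
    "opensafely/study-three": 2,
    "opensafely/study-four": 3,
    "opensafely/old-study": 400,
}


def skewness(values):
    """Population (Fisher-Pearson) coefficient of skewness."""
    mean = fmean(values)
    m2 = fmean([(value - mean) ** 2 for value in values])
    m3 = fmean([(value - mean) ** 3 for value in values])
    return m3 / m2 ** 1.5


skew = skewness(list(REPO_DELTA_DAYS.values()))
old_repos = [repo for repo, days in REPO_DELTA_DAYS.items() if days > OLD_REPO_DAYS]
print("positively skewed:", skew > 0)
print("repos in the right tail:", old_repos)
```

A positive coefficient is consistent with many small deltas and a few large ones; the right-tail list is where the older, possibly more complex studies would show up.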

Jongmassey commented 1 month ago

Just to note that one of our pilot users is using the template for a repo in their own GitHub account, not the opensafely org, so they will be missing from these stats.

Until the service is fully opened back up we might find more instances of researchers trying to get a head start on projects that are still in the approvals process.

Jongmassey commented 1 month ago