opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0
73 stars, 16 forks

feat(dbt): add remaining PLN github-based metrics #2484

Closed: ccerv1 closed this 15 hours ago

ccerv1 commented 1 day ago

This PR implements the following metrics as dbt and SQLMesh models:

It also includes a new event model for threading comments on issues, pull requests, etc.
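Comment threading can be illustrated with a short example. What follows is a minimal Python sketch, not the PR's dbt/SQLMesh model. It assumes each comment event carries a repository name and the number of the issue or pull request it belongs to. The `CommentEvent` type, its fields, and `build_threads` are hypothetical names. The sketch groups events into threads keyed by `(repository, issue_number)` and orders each thread by event time.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CommentEvent:
    # Hypothetical shape of one comment event; field names are illustrative only.
    event_id: str
    repository: str
    issue_number: int  # number of the issue or pull request being commented on
    actor: str
    time: datetime


def build_threads(events):
    """Group comment events into threads keyed by (repository, issue_number),
    with each thread sorted by event time."""
    threads = defaultdict(list)
    for event in events:
        threads[(event.repository, event.issue_number)].append(event)
    return {key: sorted(items, key=lambda e: e.time) for key, items in threads.items()}


# Usage with made-up sample events.
events = [
    CommentEvent("e2", "org/repo", 7, "bob", datetime(2024, 1, 1, 12)),
    CommentEvent("e1", "org/repo", 7, "alice", datetime(2024, 1, 1, 9)),
    CommentEvent("e3", "org/repo", 8, "carol", datetime(2024, 1, 2, 8)),
]
threads = build_threads(events)
```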

vercel[bot] commented 1 day ago

The latest updates on your projects.

Name | Status | Updated (UTC)
kariba-network | 🛑 Canceled | Nov 21, 2024 3:02pm
oso-www | ✅ Ready | Nov 21, 2024 3:02pm

oso-prs[bot] commented 1 day ago

Test deployment for PR #2484 successfully deployed to oso-pull-requests.pr_2484.

ravenac95 commented 1 day ago

Just had a chat with @ryscheng, and the changes here make sense to me (as do the changes you tried to make in SQLMesh and linked me to on Discord). Essentially, the problem with computing "time to close" for issues is that each event needs an additional dimensional component that we can join against. I was stuck thinking we'd need to add that to the events table, but as @ryscheng suggested, an auxiliary table makes a lot of sense and ensures strong typing here.

With that, I see a couple of things that will help us implement this across SQLMesh and dbt.
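A small join can illustrate the auxiliary-table approach. This is a hedged Python sketch, not the actual models. `Event`, `IssueEventDetail`, the `ISSUE_OPENED`/`ISSUE_CLOSED` event types, and `time_to_close` are all hypothetical names. The generic event row stays narrow. A peer table keyed by `event_id` carries the strongly typed, issue-specific columns. Joining the two on `event_id` gives each event the issue dimension needed to compute time to close.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # Simplified stand-in for a row of the generic events table (hypothetical fields).
    event_id: str
    event_type: str  # illustrative values: "ISSUE_OPENED", "ISSUE_CLOSED"
    time: datetime


@dataclass(frozen=True)
class IssueEventDetail:
    # Hypothetical auxiliary row: typed, issue-specific columns keyed by event_id.
    event_id: str
    repository: str
    issue_number: int


def time_to_close(events, details):
    """Join events to the auxiliary table on event_id and return, per issue,
    the time from its earliest open event to its latest close event."""
    detail_by_id = {d.event_id: d for d in details}
    opened, closed = {}, {}
    for event in events:
        detail = detail_by_id.get(event.event_id)
        if detail is None:
            continue  # no issue-level detail for this event (inner-join semantics)
        key = (detail.repository, detail.issue_number)
        if event.event_type == "ISSUE_OPENED":
            opened[key] = min(opened.get(key, event.time), event.time)
        elif event.event_type == "ISSUE_CLOSED":
            closed[key] = max(closed.get(key, event.time), event.time)
    # Only issues that have both an open and a close event get a duration.
    return {key: closed[key] - opened[key] for key in opened.keys() & closed.keys()}


# Usage with made-up sample rows: issue 1 closes after two days, issue 2 stays open.
events = [
    Event("a", "ISSUE_OPENED", datetime(2024, 1, 1)),
    Event("b", "ISSUE_CLOSED", datetime(2024, 1, 3)),
    Event("c", "ISSUE_OPENED", datetime(2024, 1, 2)),
]
details = [
    IssueEventDetail("a", "org/repo", 1),
    IssueEventDetail("b", "org/repo", 1),
    IssueEventDetail("c", "org/repo", 2),
]
result = time_to_close(events, details)
```

For reopened issues, this sketch measures from the earliest open to the latest close. Measuring to the first close would be an equally defensible choice.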

ccerv1 commented 15 hours ago

> Just had a chat with @ryscheng, and the changes here make sense to me (as do the changes you tried to make in SQLMesh and linked me to on Discord). Essentially, the problem with computing "time to close" for issues is that each event needs an additional dimensional component that we can join against. I was stuck thinking we'd need to add that to the events table, but as @ryscheng suggested, an auxiliary table makes a lot of sense and ensures strong typing here.
>
> With that, I see a couple of things that will help us implement this across SQLMesh and dbt:
>
>   • SQLMesh should be used for everything downstream of events.
>
>     • Right now, warehouse/dbt/models/intermediate/analyses/int_github_pr_issue_threads.sql circumvents the events table. I think we should instead make this a peer to int_events and have it serve as the auxiliary table described above. We might need to add some additional events to the events table as well, but you can think of the auxiliary table as a way to attach a strongly typed JSON field to each event, specific to its event type. Then we just expose this auxiliary event table as a mart, and I can bring it into SQLMesh without any problem.
>   • The "events related to artifact" table might not be necessary once the bullet above is done.
>
>     • If we have the auxiliary table, we can join against it inside the metrics models. It should then be possible to take things like issue comments and compare their timestamps to the issue's open timestamp to get the "time to first response" metric.

As discussed just now, I will remove this from the PR, and we can implement it separately. See also #1992.
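The same approach can illustrate the "time to first response" idea from the quoted comment. This minimal Python sketch assumes comments have already been joined to their issue through the auxiliary table. `IssueOpened`, `IssueComment`, and `time_to_first_response` are hypothetical names, not part of the OSO codebase. For each issue, the sketch finds the earliest comment at or after the open timestamp and reports the delay.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class IssueOpened:
    # Hypothetical row: when an issue was opened (events joined to auxiliary detail).
    repository: str
    issue_number: int
    opened_at: datetime


@dataclass(frozen=True)
class IssueComment:
    # Hypothetical row: a comment event joined to its issue via the auxiliary table.
    repository: str
    issue_number: int
    commented_at: datetime


def time_to_first_response(issues, comments):
    """Return, per issue, the delay from opening to the earliest comment at or
    after the open timestamp. Issues without such a comment are omitted."""
    opened = {(i.repository, i.issue_number): i.opened_at for i in issues}
    result = {}
    for c in comments:
        key = (c.repository, c.issue_number)
        if key not in opened or c.commented_at < opened[key]:
            continue  # comment on an unknown issue, or timestamped before opening
        delay = c.commented_at - opened[key]
        if key not in result or delay < result[key]:
            result[key] = delay
    return result


# Usage with made-up sample rows.
issues = [
    IssueOpened("org/repo", 1, datetime(2024, 1, 1, 9)),
    IssueOpened("org/repo", 2, datetime(2024, 1, 1, 9)),
]
comments = [
    IssueComment("org/repo", 1, datetime(2024, 1, 1, 15)),
    IssueComment("org/repo", 1, datetime(2024, 1, 1, 11)),
    IssueComment("org/other", 5, datetime(2024, 1, 1, 10)),
]
result = time_to_first_response(issues, comments)
```

A production metric may also want to ignore comments from the issue author or from bots. This sketch applies no such filter.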