Closed ccerv1 closed 1 month ago
Created an initial list from this query:
WITH
WeeklyData AS (
SELECT
p.project_id,
p.project_name,
e.bucket_week,
SUM(CASE
WHEN e.event_type = 'ISSUE_CLOSED' THEN e.amount
ELSE 0
END
) AS issues_closed,
SUM(CASE
WHEN e.event_type = 'PULL_REQUEST_MERGED' THEN e.amount
ELSE 0
END
) AS prs_merged,
SUM(CASE
WHEN e.event_type = 'COMMIT_CODE' THEN e.amount
ELSE 0
END
) AS commits
FROM
`oso.events_weekly_to_project` AS e
JOIN
`oso.projects_v1` AS p
ON
e.project_id = p.project_id
WHERE
DATE(e.bucket_week) >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY
p.project_id,
p.project_name,
e.bucket_week ),
RankedWeeks AS (
SELECT
*,
NTILE(2) OVER (PARTITION BY project_id ORDER BY bucket_week) AS period
FROM
WeeklyData ),
AggregatedPeriods AS (
SELECT
project_id,
project_name,
SUM(CASE
WHEN period = 1 THEN issues_closed
ELSE 0
END
) AS issues_closed_first_half,
SUM(CASE
WHEN period = 1 THEN prs_merged
ELSE 0
END
) AS prs_merged_first_half,
SUM(CASE
WHEN period = 1 THEN commits
ELSE 0
END
) AS commits_first_half,
SUM(CASE
WHEN period = 2 THEN issues_closed
ELSE 0
END
) AS issues_closed_second_half,
SUM(CASE
WHEN period = 2 THEN prs_merged
ELSE 0
END
) AS prs_merged_second_half,
SUM(CASE
WHEN period = 2 THEN commits
ELSE 0
END
) AS commits_second_half
FROM
RankedWeeks
GROUP BY
project_id,
project_name ),
VelocityRatios AS (
SELECT
project_id,
project_name,
CASE
WHEN issues_closed_first_half = 0 THEN 1
ELSE issues_closed_second_half / issues_closed_first_half
END
AS velocity_ratio_issues,
CASE
WHEN prs_merged_first_half = 0 THEN 1
ELSE prs_merged_second_half / prs_merged_first_half
END
AS velocity_ratio_prs,
CASE
WHEN commits_first_half = 0 THEN 1
ELSE commits_second_half / commits_first_half
END
AS velocity_ratio_commits,
issues_closed_second_half,
prs_merged_second_half,
commits_second_half
FROM
AggregatedPeriods ),
VelocityStats AS (
SELECT
project_id,
project_name,
issues_closed_second_half,
prs_merged_second_half,
commits_second_half,
velocity_ratio_issues,
velocity_ratio_prs,
velocity_ratio_commits,
(velocity_ratio_issues + velocity_ratio_prs + velocity_ratio_commits) / 3 AS velocity_ratio
FROM
VelocityRatios
ORDER BY
velocity_ratio DESC )
SELECT
p.project_id,
p.project_name,
p.first_commit_date,
p.last_commit_date,
p.fork_count,
p.star_count,
p.contributor_count_6_months,
p.fulltime_developer_average_6_months,
v.velocity_ratio,
ARRAY_AGG(pbc.collection_name) AS collections
FROM
`oso.code_metrics_by_project_v1` AS p
JOIN
VelocityStats AS v
ON
p.project_id = v.project_id
INNER JOIN
`oso.projects_by_collection_v1` pbc
ON
p.project_id = pbc.project_id
WHERE
last_commit_date >= '2024-08-01'
AND contributor_count_6_months BETWEEN 2
AND 200
AND star_count BETWEEN 5
AND 5000
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9
@rohitmalekar let me know if you want to help with this! (See above for context, my initial query and a long list of projects)
@rohitmalekar let me know if you want to help with this! (See above for context, my initial query and a long list of projects)
This is great, @ccerv1. There are a couple of different directions from here, and I could use your suggestion or if you have any specific exploration in mind. I can overlap onchain data on top of the velocity to offer another dimension for review. Another lens to apply is to see if any "alumni" from prior GG rounds can potentially return.
PS: I have a question about forcing the velocity ratio to 1 when the first-period metric is zero. This could make a few projects look good when they have stagnated (when the second-period metric is zero). Or, it could mask the growth of a project with healthier second-half metrics after an uneventful first half.
How about we force the ratio to 0 when there is no activity in either period? If there is activity in the second period with the first period as zero, we could get a non-null result for comparative reasons by setting the denominator to 1.
Thanks for the feedback @rohitmalekar ! I updated the query and sent the sheet over to Matilda.
Here's the latest
WITH
WeeklyData AS (
SELECT
p.project_id,
p.project_name,
e.bucket_week,
SUM(CASE
WHEN e.event_type = 'ISSUE_CLOSED' THEN e.amount
ELSE 0
END
) AS issues_closed,
SUM(CASE
WHEN e.event_type = 'PULL_REQUEST_MERGED' THEN e.amount
ELSE 0
END
) AS prs_merged,
SUM(CASE
WHEN e.event_type = 'COMMIT_CODE' THEN e.amount
ELSE 0
END
) AS commits
FROM
`oso.events_weekly_to_project` AS e
JOIN
`oso.projects_v1` AS p
ON
e.project_id = p.project_id
WHERE
DATE(e.bucket_week) >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
GROUP BY
p.project_id,
p.project_name,
e.bucket_week ),
RankedWeeks AS (
SELECT
*,
NTILE(2) OVER (PARTITION BY project_id ORDER BY bucket_week) AS period
FROM
WeeklyData ),
AggregatedPeriods AS (
SELECT
project_id,
project_name,
SUM(CASE
WHEN period = 1 THEN issues_closed
ELSE 0
END
) AS issues_closed_first_half,
SUM(CASE
WHEN period = 1 THEN prs_merged
ELSE 0
END
) AS prs_merged_first_half,
SUM(CASE
WHEN period = 1 THEN commits
ELSE 0
END
) AS commits_first_half,
SUM(CASE
WHEN period = 2 THEN issues_closed
ELSE 0
END
) AS issues_closed_second_half,
SUM(CASE
WHEN period = 2 THEN prs_merged
ELSE 0
END
) AS prs_merged_second_half,
SUM(CASE
WHEN period = 2 THEN commits
ELSE 0
END
) AS commits_second_half
FROM
RankedWeeks
GROUP BY
project_id,
project_name ),
VelocityRatios AS (
SELECT
project_id,
project_name,
CASE
WHEN issues_closed_first_half = 0 THEN 1
ELSE issues_closed_second_half / issues_closed_first_half
END
AS velocity_ratio_issues,
CASE
WHEN prs_merged_first_half = 0 THEN 1
ELSE prs_merged_second_half / prs_merged_first_half
END
AS velocity_ratio_prs,
CASE
WHEN commits_first_half = 0 THEN 1
ELSE commits_second_half / commits_first_half
END
AS velocity_ratio_commits,
issues_closed_second_half,
prs_merged_second_half,
commits_second_half
FROM
AggregatedPeriods ),
VelocityStats AS (
SELECT
project_id,
project_name,
issues_closed_second_half,
prs_merged_second_half,
commits_second_half,
velocity_ratio_issues,
velocity_ratio_prs,
velocity_ratio_commits,
(velocity_ratio_issues + velocity_ratio_prs + velocity_ratio_commits) / 3 AS velocity_ratio
FROM
VelocityRatios
WHERE
commits_second_half > 0
ORDER BY
velocity_ratio DESC ),
OnchainMetrics AS (
SELECT
project_id,
SUM(active_contract_count_90_days) AS num_active_contracts_3M,
SUM(transaction_count_6_months) AS num_onchain_txns_6M
FROM
`oso.onchain_metrics_by_project_v1`
GROUP BY
1 ),
AllProjects AS (
SELECT
p.project_name,
p.first_commit_date,
p.last_commit_date,
p.fork_count,
p.star_count,
p.contributor_count_6_months,
p.fulltime_developer_average_6_months,
v.velocity_ratio,
COALESCE(om.num_active_contracts_3M, 0) AS num_active_contracts_3M,
COALESCE(om.num_onchain_txns_6M, 0) AS num_onchain_txns_6M,
ARRAY_AGG(pbc.collection_name) AS collections
FROM
`oso.code_metrics_by_project_v1` AS p
JOIN
VelocityStats AS v
ON
p.project_id = v.project_id
LEFT OUTER JOIN
OnchainMetrics AS om
ON
p.project_id = om.project_id
INNER JOIN
`oso.projects_by_collection_v1` pbc
ON
p.project_id = pbc.project_id
GROUP BY
1,
2,
3,
4,
5,
6,
7,
8,
9,
10 )
SELECT
ossd.display_name,
(
SELECT
url
FROM
UNNEST(ossd.websites)
LIMIT
1) AS project_url,
(
SELECT
url
FROM
UNNEST(ossd.social.twitter)
LIMIT
1) AS twitter_url,
(
SELECT
url
FROM
UNNEST(ossd.github)
LIMIT
1) AS github_url,
AllProjects.*,
FROM
AllProjects
JOIN
`ossd.projects` AS ossd
ON
AllProjects.project_name = ossd.name
WHERE
last_commit_date >= '2024-08-01'
AND contributor_count_6_months BETWEEN 2
AND 200
AND star_count BETWEEN 5
AND 5000
AND num_onchain_txns_6M < 1000000
ORDER BY
1
What is it?
As we did in GG20, provide a list of active OSS projects to help Gitcoin with outreach to builders