opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0
73 stars 16 forks source link

Updated list of active OSS projects for GG22 #2161

Closed ccerv1 closed 1 month ago

ccerv1 commented 2 months ago

What is it?

As we did in GG20, provide a list of active OSS projects to help Gitcoin with outreach to builders

ccerv1 commented 1 month ago

Created an initial list from this query:

WITH
  WeeklyData AS (
  SELECT
    p.project_id,
    p.project_name,
    e.bucket_week,
    SUM(CASE
        WHEN e.event_type = 'ISSUE_CLOSED' THEN e.amount
        ELSE 0
    END
      ) AS issues_closed,
    SUM(CASE
        WHEN e.event_type = 'PULL_REQUEST_MERGED' THEN e.amount
        ELSE 0
    END
      ) AS prs_merged,
    SUM(CASE
        WHEN e.event_type = 'COMMIT_CODE' THEN e.amount
        ELSE 0
    END
      ) AS commits
  FROM
    `oso.events_weekly_to_project` AS e
  JOIN
    `oso.projects_v1` AS p
  ON
    e.project_id = p.project_id
  WHERE
    DATE(e.bucket_week) >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
  GROUP BY
    p.project_id,
    p.project_name,
    e.bucket_week ),
  RankedWeeks AS (
  SELECT
    *,
    NTILE(2) OVER (PARTITION BY project_id ORDER BY bucket_week) AS period
  FROM
    WeeklyData ),
  AggregatedPeriods AS (
  SELECT
    project_id,
    project_name,
    SUM(CASE
        WHEN period = 1 THEN issues_closed
        ELSE 0
    END
      ) AS issues_closed_first_half,
    SUM(CASE
        WHEN period = 1 THEN prs_merged
        ELSE 0
    END
      ) AS prs_merged_first_half,
    SUM(CASE
        WHEN period = 1 THEN commits
        ELSE 0
    END
      ) AS commits_first_half,
    SUM(CASE
        WHEN period = 2 THEN issues_closed
        ELSE 0
    END
      ) AS issues_closed_second_half,
    SUM(CASE
        WHEN period = 2 THEN prs_merged
        ELSE 0
    END
      ) AS prs_merged_second_half,
    SUM(CASE
        WHEN period = 2 THEN commits
        ELSE 0
    END
      ) AS commits_second_half
  FROM
    RankedWeeks
  GROUP BY
    project_id,
    project_name ),
  VelocityRatios AS (
  SELECT
    project_id,
    project_name,
    CASE
      WHEN issues_closed_first_half = 0 THEN 1
      ELSE issues_closed_second_half / issues_closed_first_half
  END
    AS velocity_ratio_issues,
    CASE
      WHEN prs_merged_first_half = 0 THEN 1
      ELSE prs_merged_second_half / prs_merged_first_half
  END
    AS velocity_ratio_prs,
    CASE
      WHEN commits_first_half = 0 THEN 1
      ELSE commits_second_half / commits_first_half
  END
    AS velocity_ratio_commits,
    issues_closed_second_half,
    prs_merged_second_half,
    commits_second_half
  FROM
    AggregatedPeriods ),
  VelocityStats AS (
  SELECT
    project_id,
    project_name,
    issues_closed_second_half,
    prs_merged_second_half,
    commits_second_half,
    velocity_ratio_issues,
    velocity_ratio_prs,
    velocity_ratio_commits,
    (velocity_ratio_issues + velocity_ratio_prs + velocity_ratio_commits) / 3 AS velocity_ratio
  FROM
    VelocityRatios
  ORDER BY
    velocity_ratio DESC )
SELECT
  p.project_id,
  p.project_name,
  p.first_commit_date,
  p.last_commit_date,
  p.fork_count,
  p.star_count,
  p.contributor_count_6_months,
  p.fulltime_developer_average_6_months,
  v.velocity_ratio,
  ARRAY_AGG(pbc.collection_name) AS collections
FROM
  `oso.code_metrics_by_project_v1` AS p
JOIN
  VelocityStats AS v
ON
  p.project_id = v.project_id
INNER JOIN
  `oso.projects_by_collection_v1` pbc
ON
  p.project_id = pbc.project_id
WHERE
  last_commit_date >= '2024-08-01'
  AND contributor_count_6_months BETWEEN 2
  AND 200
  AND star_count BETWEEN 5
  AND 5000
GROUP BY
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9
ccerv1 commented 1 month ago

Here's an export: https://docs.google.com/spreadsheets/d/1yTS1G1GZa6Lpd33Q2Q_ez8X1BgP-ZSEZmH_S_GgfVlM/edit?usp=sharing

ccerv1 commented 1 month ago

@rohitmalekar let me know if you want to help with this! (See above for context, my initial query and a long list of projects)

rohitmalekar commented 1 month ago

@rohitmalekar let me know if you want to help with this! (See above for context, my initial query and a long list of projects)

This is great, @ccerv1. There are a couple of different directions from here, and I could use your suggestion or if you have any specific exploration in mind. I can overlap onchain data on top of the velocity to offer another dimension for review. Another lens to apply is to see if any "alumni" from prior GG rounds can potentially return.

PS: I have a question about forcing the velocity ratio to 1 when the first-period metric is zero. This could make a few projects look good when they have stagnated (when the second-period metric is zero). Or, it could mask the growth of a project with healthier second-half metrics after an uneventful first half.

How about we force the ratio to 0 when there is no activity in either period? If there is activity in the second period with the first period as zero, we could get a non-null result for comparative reasons by setting the denominator to 1.

ccerv1 commented 1 month ago

Thanks for the feedback @rohitmalekar ! I updated the query and sent the sheet over to Matilda.

Here's the latest

WITH
  WeeklyData AS (
  SELECT
    p.project_id,
    p.project_name,
    e.bucket_week,
    SUM(CASE
        WHEN e.event_type = 'ISSUE_CLOSED' THEN e.amount
        ELSE 0
    END
      ) AS issues_closed,
    SUM(CASE
        WHEN e.event_type = 'PULL_REQUEST_MERGED' THEN e.amount
        ELSE 0
    END
      ) AS prs_merged,
    SUM(CASE
        WHEN e.event_type = 'COMMIT_CODE' THEN e.amount
        ELSE 0
    END
      ) AS commits
  FROM
    `oso.events_weekly_to_project` AS e
  JOIN
    `oso.projects_v1` AS p
  ON
    e.project_id = p.project_id
  WHERE
    DATE(e.bucket_week) >= DATE_SUB(CURRENT_DATE(), INTERVAL 12 WEEK)
  GROUP BY
    p.project_id,
    p.project_name,
    e.bucket_week ),
  RankedWeeks AS (
  SELECT
    *,
    NTILE(2) OVER (PARTITION BY project_id ORDER BY bucket_week) AS period
  FROM
    WeeklyData ),
  AggregatedPeriods AS (
  SELECT
    project_id,
    project_name,
    SUM(CASE
        WHEN period = 1 THEN issues_closed
        ELSE 0
    END
      ) AS issues_closed_first_half,
    SUM(CASE
        WHEN period = 1 THEN prs_merged
        ELSE 0
    END
      ) AS prs_merged_first_half,
    SUM(CASE
        WHEN period = 1 THEN commits
        ELSE 0
    END
      ) AS commits_first_half,
    SUM(CASE
        WHEN period = 2 THEN issues_closed
        ELSE 0
    END
      ) AS issues_closed_second_half,
    SUM(CASE
        WHEN period = 2 THEN prs_merged
        ELSE 0
    END
      ) AS prs_merged_second_half,
    SUM(CASE
        WHEN period = 2 THEN commits
        ELSE 0
    END
      ) AS commits_second_half
  FROM
    RankedWeeks
  GROUP BY
    project_id,
    project_name ),
  VelocityRatios AS (
  SELECT
    project_id,
    project_name,
    CASE
      WHEN issues_closed_first_half = 0 THEN 1
      ELSE issues_closed_second_half / issues_closed_first_half
  END
    AS velocity_ratio_issues,
    CASE
      WHEN prs_merged_first_half = 0 THEN 1
      ELSE prs_merged_second_half / prs_merged_first_half
  END
    AS velocity_ratio_prs,
    CASE
      WHEN commits_first_half = 0 THEN 1
      ELSE commits_second_half / commits_first_half
  END
    AS velocity_ratio_commits,
    issues_closed_second_half,
    prs_merged_second_half,
    commits_second_half
  FROM
    AggregatedPeriods ),
  VelocityStats AS (
  SELECT
    project_id,
    project_name,
    issues_closed_second_half,
    prs_merged_second_half,
    commits_second_half,
    velocity_ratio_issues,
    velocity_ratio_prs,
    velocity_ratio_commits,
    (velocity_ratio_issues + velocity_ratio_prs + velocity_ratio_commits) / 3 AS velocity_ratio
  FROM
    VelocityRatios
  WHERE
    commits_second_half > 0
  ORDER BY
    velocity_ratio DESC ),
  OnchainMetrics AS (
  SELECT
    project_id,
    SUM(active_contract_count_90_days) AS num_active_contracts_3M,
    SUM(transaction_count_6_months) AS num_onchain_txns_6M
  FROM
    `oso.onchain_metrics_by_project_v1`
  GROUP BY
    1 ),
  AllProjects AS (
  SELECT
    p.project_name,
    p.first_commit_date,
    p.last_commit_date,
    p.fork_count,
    p.star_count,
    p.contributor_count_6_months,
    p.fulltime_developer_average_6_months,
    v.velocity_ratio,
    COALESCE(om.num_active_contracts_3M, 0) AS num_active_contracts_3M,
    COALESCE(om.num_onchain_txns_6M, 0) AS num_onchain_txns_6M,
    ARRAY_AGG(pbc.collection_name) AS collections
  FROM
    `oso.code_metrics_by_project_v1` AS p
  JOIN
    VelocityStats AS v
  ON
    p.project_id = v.project_id
  LEFT OUTER JOIN
    OnchainMetrics AS om
  ON
    p.project_id = om.project_id
  INNER JOIN
    `oso.projects_by_collection_v1` pbc
  ON
    p.project_id = pbc.project_id
  GROUP BY
    1,
    2,
    3,
    4,
    5,
    6,
    7,
    8,
    9,
    10 )
SELECT
  ossd.display_name,
  (
  SELECT
    url
  FROM
    UNNEST(ossd.websites)
  LIMIT
    1) AS project_url,
  (
  SELECT
    url
  FROM
    UNNEST(ossd.social.twitter)
  LIMIT
    1) AS twitter_url,
  (
  SELECT
    url
  FROM
    UNNEST(ossd.github)
  LIMIT
    1) AS github_url,
  AllProjects.*,
FROM
  AllProjects
JOIN
  `ossd.projects` AS ossd
ON
  AllProjects.project_name = ossd.name
WHERE
  last_commit_date >= '2024-08-01'
  AND contributor_count_6_months BETWEEN 2
  AND 200
  AND star_count BETWEEN 5
  AND 5000
  AND num_onchain_txns_6M < 1000000
ORDER BY
  1