opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0
71 stars, 16 forks

Generate starter notebook and walkthrough for ThankArb analysis #2131

Closed: ccerv1 closed this issue 3 weeks ago

ccerv1 commented 3 weeks ago

What is it?

Help @rohitmalekar get going with querying OSO data and creating some initial analysis.

ccerv1 commented 3 weeks ago

Notebook:

Outline: How to Set Up and Analyze Data in a Notebook with Open Source Observer Data

  1. Getting Started

    • Introduction to creating an analysis notebook with Open Source Observer (OSO) data.
    • Using an existing collection as the example, because the new ThankARB collection does not exist yet.
  2. Setting Up Connection

    • Steps to connect to Google Cloud and BigQuery for data access.
    • Options: Use Jupyter notebook locally or Google Colab for cloud-based access.
    • Demonstration of connecting by storing Google credentials locally.
  3. Configurations

    • Creating basic configurations for the project and dataset.
    • Recommendation to start with the OSO Playground dataset for exploration to avoid excess data scanning costs.
  4. Query Testing

    • Testing queries directly in BigQuery SQL Explorer.
    • Example query on the OSO artifacts table and the OSO Playground dataset for comparison.
  5. Data Filtering

    • Filtering artifacts by project and project collections.
    • Filtering artifact sources, e.g., GitHub, to narrow down the data.
  6. Activity Metrics

    • Converting a list of project IDs into arguments for SQL queries.
    • Retrieving code metrics for projects in the list.
  7. Timescale Metrics

    • Constructing a query to retrieve time series metrics data.
    • Joining metrics data with artifact names for better understanding.
  8. Data Visualization

    • Generating tables and charts from the data.
    • Creating a project snapshot table and an area chart for time series metrics using Plotly.
  9. Exporting Results

    • Generating a markdown export for creating a report with a formatted table.
    • Saving the analysis in a repo for future reference.
  10. Conclusion

    • Recap of the process and the availability of the analysis in the specified repo.
    • Thanking the audience for their attention.
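
Steps 3 and 6 of the outline (basic configuration, then turning a list of project IDs into SQL arguments) can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the project and dataset names, the `code_metrics_by_project_v1` table, the `project_id` column, and both helper functions are assumptions, so check them against the real schema before running anything.

```python
from typing import Iterable

# Assumed configuration (illustrative names, not taken from the notebook).
GCP_PROJECT = "opensource-observer"  # assumption: project that hosts the data
DATASET = "oso_playground"           # assumption: the smaller Playground dataset


def sql_string_list(values: Iterable[str]) -> str:
    """Render Python strings as a comma-separated list of SQL string literals."""
    literals = []
    for value in values:
        # Escape backslashes first, then single quotes, so the result
        # stays a valid single-quoted literal.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        literals.append(f"'{escaped}'")
    return ", ".join(literals)


def code_metrics_query(project_ids: Iterable[str],
                       table: str = "code_metrics_by_project_v1") -> str:
    """Build a query that selects metric rows for the given project IDs.

    The table and column names are hypothetical placeholders.
    """
    ids = list(project_ids)
    if not ids:
        raise ValueError("project_ids must not be empty")
    return (
        "SELECT *\n"
        f"FROM `{GCP_PROJECT}.{DATASET}.{table}`\n"
        f"WHERE project_id IN ({sql_string_list(ids)})"
    )


# Example with placeholder IDs; paste the result into the BigQuery SQL
# editor or pass it to a BigQuery client.
example_query = code_metrics_query(["project-id-1", "project-id-2"])
```

String interpolation is fine for IDs that come from an earlier query in the same notebook. For values from an untrusted source, BigQuery query parameters are the safer choice.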

Link to Loom

https://www.loom.com/share/3a28dfb93b1143f2a3d4153f6dd6c450

ccerv1 commented 3 weeks ago

Sample table export

| project_name | display_name | star_count | contributor_count |
| --- | --- | --- | --- |
| web3 | web3.js | 19180 | 2333 |
| ethereum-cat-herders | Ethereum Cat Herders | 12819 | 1559 |
| buidlguidl | Buidl Guidl | 12718 | 878 |
| tor-project | The Tor Project | 6479 | 728 |
| vyperlang | Vyper | 5249 | 478 |
| web3py-ethereum | web3.py | 4949 | 961 |
| blockscout | Blockscout | 3914 | 2429 |
| rotki | rotki | 2824 | 513 |
| dappnode | DAppNode | 1092 | 386 |
| 1hive | 1Hive Gardens | 878 | 325 |
| ethstaker | EthStaker | 859 | 240 |
| l2beat | L2BEAT | 858 | 306 |
| revoke-cash | Revoke | 764 | 169 |
| ethereum-attestation-service | Ethereum Attestation Service | 423 | 100 |
| nicenode | NiceNode | 197 | 23 |
| opensource-observer | Open Source Observer | 142 | 109 |
| hypercerts | Hypercerts | 104 | 30 |
| shutter-network | Shutter Network | 102 | 26 |
| pairwise-general-magic | Pairwise | 46 | 21 |
| givepraise | Praise | 43 | 29 |
| protocol-guild | Protocol Guild | 40 | 92 |
| growthepie | growthepie | 28 | 10 |
| pizzadao | PizzaDAO | 27 | 32 |
| dao-drops-dorgtech | DAO Drops | 14 | 4 |
| ecosynthesisx | EcoSynthesisX | 13 | 6 |
| trustful-blockful-io | Trustful | 7 | 6 |
| rndao | RnDAO | 5 | 20 |
| abundanceprotocol | Abundance Protocol | 3 | 1 |
| fundingthecommons | fundingthecommons | 1 | 9 |
| desci-latam | DeSci LATAM | 0 | 3 |
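
A table export like the one above can be produced with a small helper. The sketch below is one possible approach using only the standard library; the function name `to_markdown_table` is hypothetical, and the notebook may well have used a different method. The two example rows are copied from the sample export.

```python
from typing import Sequence


def to_markdown_table(headers: Sequence[str],
                      rows: Sequence[Sequence[object]]) -> str:
    """Render headers and rows as a pipe-delimited Markdown table."""

    def cell(value: object) -> str:
        # Escape pipes so a value cannot break the column layout.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(cell(h) for h in headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("row length does not match header length")
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


headers = ["project_name", "display_name", "star_count", "contributor_count"]
rows = [
    ["web3", "web3.js", 19180, 2333],
    ["vyperlang", "Vyper", 5249, 478],
]
markdown = to_markdown_table(headers, rows)
```

If the data is already in a pandas DataFrame, `DataFrame.to_markdown()` does the same job, though it requires the `tabulate` package to be installed.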