near / data-platform

Near.org Data Platform Initiatives and Epics: Recommendations, Search, Analaytics
1 stars 0 forks source link

NEARCon Qualifying Apps Data Pipeline #32

Closed pkudinov closed 2 months ago

pkudinov commented 10 months ago

Build a Databricks Pipeline to exports a JSON file in S3 a list of dApp that has description + tags that contains (Earn, Play, Develop, Engage) in Metadata field. Frequency: hourly

eduohe commented 10 months ago

Research findings:

1) I did some analysis on the https://near.org/near/widget/ComponentsPage?tab=apps page source code to try to understand how this page has arrived in the 243 apps:

2) I talked with Oksana and Jo how they are extracting metrics for the NEAR Social data in Databricks. The table used is mainnet.silver_near_social_txs_parsed after their help we got the following query that returns 236 apps after talk with Tiffany we decide to use this data.

WITH widget_name as (
SELECT 
  block_date,
  block_timestamp_utc,
  true_signer_id account_id,
  explode(json_object_keys(widget)) widget_name,
  widget
FROM mainnet.silver_near_social_txs_parsed
WHERE widget is not null
)
, widget_code as (
  SELECT 
    block_date,
    block_timestamp_utc,
    account_id,
    widget_name,
    get_json_object(widget, concat(concat('$["', widget_name), '"]')) widget_code
FROM widget_name

)
, tag_app as (
  select 
    *
    , row_number() OVER (PARTITION BY (account_id, widget_name) ORDER BY block_timestamp_utc DESC) rn
  FROM widget_code 
  where contains(widget_code:metadata, '"app":')
)
, metadata as (
SELECT 
  block_timestamp_utc last_updated_at_utc,
  account_id,
  concat(concat(account_id, '/widget/'), widget_name) widget_name,
  widget_code:metadata:tags tags,
  widget_code:metadata:description description, 
  widget_code:metadata:tags:app,
  widget_code:metadata
FROM tag_app
WHERE rn = 1
)

SELECT  
  last_updated_at_utc,
  account_id,
  widget_name,
  tags,
  description, 
  metadata
FROM metadata 
WHERE true
AND app is not null
ORDER BY widget_name
eduohe commented 10 months ago

Done 🟢

Notebook: https://4221960800361869.9.gcp.databricks.com/?o=4221960800361869#notebook/4382660343742058/command/4382660343742061 Scheduled to run every hour

Gold table:

SELECT count(*) FROM hive_metastore.mainnet.gold_nearcon_apps

The apps qualified list is written in this GCS file: gs://databricks-near-query-runner/output/nearcon_apps/apps_qualified.json

Sample content of https://storage.cloud.google.com/databricks-near-query-runner/output/nearcon_apps/apps_qualified.json:

{
  "data": [
    "{\"last_updated_at_utc\":\"2023-03-02T21:33:35.649Z\",\"account_id\":\"devgovgigs.near\",\"widget_name\":\"devgovgigs.near/widget/gigs-board.pages.Feed\",\"tags\":\"{\\\"developer-governance\\\":\\\"\\\",\\\"app\\\":\\\"\\\"}\",\"description\":\"A place for ideas, where they are proposed, discussed, validated, sponsored, and implemented.\",\"metadata\":\"{\\\"description\\\":\\\"A place for ideas, where they are proposed, discussed, validated, sponsored, and implemented.\\\",\\\"image\\\":{\\\"ipfs_cid\\\":\\\"bafkreifynesvgrnghleag5c7vgshkp7h4uegzwdhjyjqqans4t3hdpuhii\\\"},\\\"name\\\":\\\"Developer Governance Gigs\\\",\\\"tags\\\":{\\\"developer-governance\\\":\\\"\\\",\\\"app\\\":\\\"\\\"}}\"}"
  ],
  "executed_at": 1697754673126
}
github-actions[bot] commented 2 months ago

This issue has been automatically closed because it has been inactive for more than 180 days. Please reopen and prioritize this for development if it is essential.