near-public-lakehouse

NEAR Public Lakehouse

License: GNU General Public License v3.0

This repository contains the source code for ingesting NEAR Protocol data stored as JSON files in AWS S3 by near-lake-indexer. The data is loaded in a streaming fashion using Databricks Autoloader into raw/bronze tables, and transformed with Databricks Delta Live Tables streaming jobs into cleaned/enriched/silver tables.
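As an illustration of that flow, a minimal Delta Live Tables SQL sketch of the bronze ingestion step might look like the following. This is not the repository's actual code: the bucket path and table name are hypothetical, and only the shape of the Auto Loader (`cloud_files`) call is shown.

```sql
-- Hypothetical bronze step: stream raw JSON files from S3 with Auto Loader
CREATE OR REFRESH STREAMING LIVE TABLE blocks_bronze
COMMENT "Raw NEAR data ingested from S3 as JSON"
AS SELECT * FROM cloud_files("s3://your-near-lake-bucket/", "json");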

The silver tables are also copied into the GCP BigQuery Public Dataset.

Intro

Blockchain data indexing in the NEAR Public Lakehouse is for anyone who wants to make sense of blockchain data. This includes:

Benefits:

Architecture

Note: this pipeline follows the Databricks Medallion Architecture.

What is NEAR Protocol?

NEAR is a user-friendly, carbon-neutral blockchain, built from the ground up to be performant, secure, and infinitely scalable. It's a layer one, sharded, proof-of-stake blockchain designed with usability in mind. In simple terms, NEAR is blockchain for everyone.

Data Available

The current data that we are providing was inspired by near-indexer-for-explorer. We plan to improve the data available in the NEAR Public Lakehouse, making it easier to consume by denormalizing some tables.

The tables available in the NEAR Public Lakehouse are:

Examples

Daily count of distinct signers calling the near.social contract:

```sql
SELECT
  r.block_date AS collected_for_day,
  COUNT(DISTINCT r.transaction_signer_account_id) AS active_signers
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions` ra
  INNER JOIN `bigquery-public-data.crypto_near_mainnet_us.receipts` r
    ON r.receipt_id = ra.receipt_id
WHERE ra.action_kind = 'FUNCTION_CALL'
  AND ra.receipt_receiver_account_id = 'near.social' -- change to your contract
GROUP BY 1
ORDER BY 1 DESC;
```
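A variation on the same join, counting call volume rather than unique signers (same tables and columns as the query above; adjust the contract account as needed):

```sql
-- Daily number of FUNCTION_CALL actions received by a contract
SELECT
  r.block_date AS collected_for_day,
  COUNT(*) AS function_calls
FROM `bigquery-public-data.crypto_near_mainnet_us.receipt_actions` ra
  INNER JOIN `bigquery-public-data.crypto_near_mainnet_us.receipts` r
    ON r.receipt_id = ra.receipt_id
WHERE ra.action_kind = 'FUNCTION_CALL'
  AND ra.receipt_receiver_account_id = 'near.social' -- change to your contract
GROUP BY 1
ORDER BY 1 DESC;
```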

How to get started?

  1. Log in to your Google Cloud account.
  2. Open the NEAR Protocol BigQuery Public Dataset.
  3. Click the VIEW DATASET button.
  4. Click "+" to open a new tab, write your query, click the RUN button, and check the "Query results" panel below the query.
  5. Done :)
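For step 4, a minimal first query could be something like the following (table name as in the Examples section):

```sql
-- Sanity check: most recent day of data in the receipts table
SELECT MAX(block_date) AS latest_day
FROM `bigquery-public-data.crypto_near_mainnet_us.receipts`;
```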

How much does it cost?

Note: You can check how much data a query will scan in the BigQuery console UI before running it. Since BigQuery uses a columnar data structure and partitioning, select only the columns and partitions (block_date) you need to avoid unnecessary query costs.
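For example, restricting the partition column keeps the scan, and therefore the cost, small (the date range and column names here are illustrative):

```sql
-- Only the selected block_date partitions are scanned
SELECT
  block_date,
  COUNT(DISTINCT transaction_signer_account_id) AS signers
FROM `bigquery-public-data.crypto_near_mainnet_us.receipts`
WHERE block_date BETWEEN '2023-01-01' AND '2023-01-07'
GROUP BY 1;
```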

Query Costs

References