hashgraph / hedera-mirror-node

Hedera Mirror Node archives data from consensus nodes and serves it via an API
Apache License 2.0

Deploy Blockchain ETL #747

Closed (apeksharma closed 4 years ago)

apeksharma commented 4 years ago

Components:

Resources needed:

  1. BigQuery tables: transactions, errors, dedupe_state
  2. Pub/Sub topic for transactions
  3. GCS bucket: used for Dataflow templates, staging, and as the temp location
  4. ETL pipeline from Pub/Sub to BigQuery:
    • Pub/Sub subscription
    • Service account with the following roles: BigQuery Data Editor, Dataflow Worker, Pub/Sub Subscriber, and Storage Admin
  5. Deduplication task:
    • Service account with the following roles: BigQuery Data Editor, BigQuery Job User, and Monitoring Metric Writer
  6. Mirror Importer:
    • Service account with the following role: Pub/Sub Publisher
  7. (Optional) ETL pipeline from Pub/Sub to GCS:
    • GCS bucket: for the output of the pipeline
    • Service account with the following roles: Dataflow Worker, Pub/Sub Editor (for creating the subscription), and Storage Admin
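
For reference, the service account role list above can be sketched as a small Python script that prints the corresponding `gcloud projects add-iam-policy-binding` commands. This is only an illustration: the service account IDs (`etl-bigquery`, `etl-dedupe`, `mirror-importer`, `etl-gcs`) and the project ID are hypothetical, and granting the roles at project level is an assumption; the issue does not say where the bindings are scoped. The role IDs are the standard IAM identifiers for the role names listed.

```python
import shlex

# IAM role IDs for the role display names used in the list above.
ROLE_IDS = {
    "BigQuery Data Editor": "roles/bigquery.dataEditor",
    "BigQuery Job User": "roles/bigquery.jobUser",
    "Dataflow Worker": "roles/dataflow.worker",
    "Monitoring Metric Writer": "roles/monitoring.metricWriter",
    "Pub/Sub Editor": "roles/pubsub.editor",
    "Pub/Sub Publisher": "roles/pubsub.publisher",
    "Pub/Sub Subscriber": "roles/pubsub.subscriber",
    "Storage Admin": "roles/storage.admin",
}

# Hypothetical service account IDs (not from the issue), one per component.
SERVICE_ACCOUNTS = {
    # 4. ETL pipeline from Pub/Sub to BigQuery
    "etl-bigquery": [
        "BigQuery Data Editor",
        "Dataflow Worker",
        "Pub/Sub Subscriber",
        "Storage Admin",
    ],
    # 5. Deduplication task
    "etl-dedupe": [
        "BigQuery Data Editor",
        "BigQuery Job User",
        "Monitoring Metric Writer",
    ],
    # 6. Mirror Importer
    "mirror-importer": ["Pub/Sub Publisher"],
    # 7. (Optional) ETL pipeline from Pub/Sub to GCS
    "etl-gcs": ["Dataflow Worker", "Pub/Sub Editor", "Storage Admin"],
}


def binding_commands(project_id):
    """Return one gcloud command string per (service account, role) pair."""
    commands = []
    for account, roles in SERVICE_ACCOUNTS.items():
        member = f"serviceAccount:{account}@{project_id}.iam.gserviceaccount.com"
        for role in roles:
            args = [
                "gcloud", "projects", "add-iam-policy-binding", project_id,
                f"--member={member}",
                f"--role={ROLE_IDS[role]}",
            ]
            commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    for command in binding_commands("my-project"):
        print(command)
```

The script only prints the commands and does not run them, so the output can be reviewed, or narrowed to resource-level bindings, before anything is applied.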
apeksharma commented 4 years ago

Did the initial setup. See https://github.com/blockchain-etl/hedera-etl/blob/master/docs/deployment.md for the list of resources. Deployed the Importer (source: mainnet AWS bucket; GCP RP was causing issues). Deployed the Dataflow pipelines. This deployment is labelled 'test'. Will do a clean one for 'prod' later this week.