konigio / konig

A collection of tools for semantic modeling

Spike: Event Driven ETL #677

Closed gmcfall closed 6 years ago

gmcfall commented 6 years ago

We want to build an automated ETL process with the following characteristics:

  1. A client pushes data in the form of CSV or JSON documents into a Google Cloud Storage Bucket.
  2. The bucket is configured so that it publishes an event when new documents are stored.
  3. A Google App Engine microservice consumes the event (a sketch of such a handler appears after this list).
  4. In response to the event, the microservice loads the data from Cloud Storage into a Cloud SQL table.
  5. The microservice runs a SQL script to transform the data into a new shape, and stores the transformed data in a new table using a SELECT INTO statement (see the second sketch below).
  6. The microservice exports the transformed data to another Cloud Storage Bucket, which is configured to publish a change event.
  7. Another microservice consumes this change event and loads the transformed data from the Cloud Storage Bucket into Google BigQuery (see the final sketch below).
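
For steps 2 and 3, the bucket's change events can be delivered to the microservice through Cloud Storage Pub/Sub notifications (configured, for example, with `gsutil notification create -t <topic> -f json gs://<bucket>`) and a push subscription pointing at the App Engine service. Below is a minimal sketch of such a handler, assuming Python and Flask; the endpoint path and the `load_and_transform()` helper are illustrative assumptions, not part of this issue.

```python
# Hypothetical App Engine handler that receives Cloud Storage change
# notifications through a Pub/Sub push subscription. The endpoint path
# and load_and_transform() are assumptions for illustration.
import base64
import json

from flask import Flask, request

app = Flask(__name__)


def load_and_transform(bucket_name, object_name):
    # Placeholder for the Cloud SQL load and SELECT INTO transform
    # sketched in the next example.
    pass


@app.route("/pubsub/gcs-events", methods=["POST"])
def handle_gcs_event():
    envelope = request.get_json()
    if not envelope or "message" not in envelope:
        return "Bad Request: no Pub/Sub message", 400

    message = envelope["message"]
    # Cloud Storage notifications put the bucket and object ids in the
    # message attributes; full object metadata is base64-encoded JSON in data.
    attributes = message.get("attributes", {})
    bucket_name = attributes.get("bucketId")
    object_name = attributes.get("objectId")
    event_type = attributes.get("eventType")

    if "data" in message:
        metadata = json.loads(base64.b64decode(message["data"]).decode("utf-8"))
        object_name = object_name or metadata.get("name")

    if event_type == "OBJECT_FINALIZE" and bucket_name and object_name:
        load_and_transform(bucket_name, object_name)

    # Returning a 2xx acknowledges the message so Pub/Sub stops redelivering it.
    return "", 204
```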

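For steps 4 and 5, one way to implement the load and transform, assuming a PostgreSQL Cloud SQL instance reachable over its Unix socket from App Engine and a pre-existing `raw_documents` staging table whose columns match the CSV; all table names, columns, and connection settings below are assumptions.

```python
# Hypothetical load-and-transform step: stream a CSV object from Cloud Storage
# into a Cloud SQL (PostgreSQL) staging table, then reshape it with SELECT INTO.
# Table names, columns, and connection settings are assumptions.
import io
import os

import psycopg2
from google.cloud import storage


def load_and_transform(bucket_name, object_name):
    # Fetch the newly stored CSV document from Cloud Storage.
    gcs = storage.Client()
    csv_text = gcs.bucket(bucket_name).blob(object_name).download_as_text()

    conn = psycopg2.connect(
        host="/cloudsql/my-project:us-central1:my-instance",  # assumed instance
        dbname="staging",
        user="etl",
        password=os.environ.get("DB_PASSWORD"),
    )
    try:
        with conn.cursor() as cur:
            # Bulk-load the raw rows into the staging table.
            cur.execute("TRUNCATE raw_documents")
            cur.copy_expert(
                "COPY raw_documents FROM STDIN WITH CSV HEADER",
                io.StringIO(csv_text),
            )
            # Reshape the data into a new table with a SELECT INTO statement.
            cur.execute("DROP TABLE IF EXISTS transformed_documents")
            cur.execute(
                """
                SELECT doc_id, lower(status) AS status, amount::numeric AS amount
                INTO transformed_documents
                FROM raw_documents
                """
            )
        conn.commit()
    finally:
        conn.close()
```
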
To support this effort, we should start by hand-crafting a prototype that implements the above flow without any automation. The purpose of this exercise is to become familiar with the Google Cloud Platform components and how to use them.
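
As part of that hand-crafted prototype, step 6's export could be run manually (for example with `gcloud sql export csv`, or by writing query results back to the second bucket), and step 7's microservice, triggered by that bucket's change event in the same way as the first handler, could load the file into BigQuery with a load job. A sketch, with assumed dataset, table, and object names:

```python
# Hypothetical handler for the transformed-data bucket: load the exported CSV
# from Cloud Storage into BigQuery. Dataset and table names are assumptions.
from google.cloud import bigquery


def load_into_bigquery(bucket_name, object_name):
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assume the exported file has a header row
        autodetect=True,      # let BigQuery infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    uri = f"gs://{bucket_name}/{object_name}"
    load_job = client.load_table_from_uri(
        uri, "my_dataset.transformed_documents", job_config=job_config
    )
    load_job.result()  # block until the load job completes
    table = client.get_table("my_dataset.transformed_documents")
    print(f"Loaded {table.num_rows} rows from {uri}")
```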

Here's a diagram that illustrates the design pattern we eventually want to automate:

[Diagram: edw cdw ignite onedata adp - claim check sequence]

gmcfall commented 6 years ago

This issue is no longer relevant. We are moving the SofA to AWS.