We want to build an automated ETL process with the following characteristics:
A client pushes data, in the form of CSV or JSON documents, into a Google Cloud Storage bucket.
The bucket is configured to publish an event whenever a new document is stored.
A Google App Engine microservice consumes the event.
In response to the event, the microservice loads the data from Cloud Storage into a Cloud SQL table.
The microservice runs a SQL script that transforms the data into a new shape and stores the result in a new table using a SELECT INTO statement.
The microservice then exports the transformed data to a second Cloud Storage bucket, which is configured to publish a change event.
A second microservice consumes this change event and loads the transformed data from that bucket into Google BigQuery.
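The flow above can be rehearsed as a local, in-memory simulation before any Google Cloud resources exist. This is a minimal sketch under stated assumptions, not the intended implementation. The `Bucket`, `TransformService`, and `WarehouseLoader` classes are hypothetical stand-ins: a dict with subscriber callbacks plays each Cloud Storage bucket, an in-memory SQLite database plays Cloud SQL, and a Python list plays the BigQuery table. The upload schema (`customer,amount`), the per-customer total transform, the bucket names, and the `_transformed.csv` naming are invented for illustration, and JSON documents are assumed to be arrays of objects. The event dict borrows the `OBJECT_FINALIZE` event type and the `bucketId`/`objectId` names used by Cloud Storage Pub/Sub notifications, but no Google API is called.

```python
import csv
import io
import json
import sqlite3
from typing import Callable


class Bucket:
    """Hypothetical stand-in for a Cloud Storage bucket with change notifications."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.objects: dict[str, str] = {}
        self.subscribers: list[Callable[[dict], None]] = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)

    def put(self, object_name: str, content: str) -> None:
        self.objects[object_name] = content
        # Mimic a notification; real ones arrive as Pub/Sub message attributes.
        event = {"eventType": "OBJECT_FINALIZE", "bucketId": self.name, "objectId": object_name}
        for handler in self.subscribers:
            handler(event)


def parse_rows(object_name: str, content: str) -> list[dict]:
    """Parse a CSV document, or a JSON array of objects (assumed format), into row dicts."""
    if object_name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(content)))
    if object_name.endswith(".json"):
        return json.loads(content)
    raise ValueError(f"unsupported document type: {object_name}")


class TransformService:
    """First microservice: load into SQL, transform, export to a second bucket."""
    def __init__(self, source: Bucket, export: Bucket, db: sqlite3.Connection) -> None:
        self.source, self.export, self.db = source, export, db

    def handle(self, event: dict) -> None:
        if event["eventType"] != "OBJECT_FINALIZE":
            return
        name = event["objectId"]
        rows = parse_rows(name, self.source.objects[name])

        # Step 1: load the raw rows into a staging table (stand-in for Cloud SQL).
        self.db.execute("DROP TABLE IF EXISTS staging")
        self.db.execute("CREATE TABLE staging (customer TEXT, amount REAL)")
        self.db.executemany("INSERT INTO staging VALUES (:customer, :amount)", rows)

        # Step 2: transform into a new table (CREATE TABLE ... AS SELECT, since SQLite lacks SELECT INTO).
        self.db.execute("DROP TABLE IF EXISTS customer_totals")
        self.db.execute(
            "CREATE TABLE customer_totals AS "
            "SELECT customer, SUM(amount) AS total FROM staging GROUP BY customer"
        )
        self.db.commit()

        # Step 3: export as CSV to the second bucket, which notifies its own subscribers.
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(["customer", "total"])
        writer.writerows(
            self.db.execute("SELECT customer, total FROM customer_totals ORDER BY customer")
        )
        self.export.put(name.rsplit(".", 1)[0] + "_transformed.csv", out.getvalue())


class WarehouseLoader:
    """Second microservice: append exported rows to a list standing in for BigQuery."""
    def __init__(self, export: Bucket) -> None:
        self.export = export
        self.table: list[dict] = []

    def handle(self, event: dict) -> None:
        if event["eventType"] != "OBJECT_FINALIZE":
            return
        name = event["objectId"]
        self.table.extend(parse_rows(name, self.export.objects[name]))


def build_pipeline() -> tuple[Bucket, WarehouseLoader]:
    raw = Bucket("raw-uploads")  # hypothetical bucket names
    transformed = Bucket("transformed-exports")
    transformer = TransformService(raw, transformed, sqlite3.connect(":memory:"))
    loader = WarehouseLoader(transformed)
    raw.subscribe(transformer.handle)
    transformed.subscribe(loader.handle)
    return raw, loader


if __name__ == "__main__":
    raw, loader = build_pipeline()
    raw.put("sales.csv", "customer,amount\nalice,10\nbob,5\nalice,2.5\n")
    # prints [{'customer': 'alice', 'total': '12.5'}, {'customer': 'bob', 'total': '5.0'}]
    print(loader.table)
```

Because SQLite has no `SELECT ... INTO new_table`, the sketch uses `CREATE TABLE ... AS SELECT` for the transform step. PostgreSQL accepts both forms, while MySQL does not support `SELECT ... INTO` for creating a table, so a MySQL-backed Cloud SQL instance would need `CREATE TABLE ... SELECT` or `INSERT INTO ... SELECT` instead. Each upload also replaces the staging and transformed tables, which keeps the sketch simple but means only the warehouse stand-in accumulates history.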
To support this effort, we should start by hand-crafting a prototype that implements the above flow without any automation. The purpose of this exercise is to get familiar with the Google Cloud Platform components and how to use them.
Here's a diagram that illustrates the design pattern we eventually want to automate.