So for the table schema we still need to define the actual nested structure for the nested fields :neutral_face: For now I can create something as a placeholder, but as soon as we know the actual output, we'll need to change it...
To give a short example of what this could look like:
```python
from google.cloud.bigquery import SchemaField

# Schema
hermesSchema = [
    SchemaField('ID', 'STRING', mode='REQUIRED'),
    SchemaField('timestamp', 'TIMESTAMP', mode='REQUIRED'),
    SchemaField('articleContent', 'RECORD', mode='NULLABLE',
                fields=(SchemaField('title', 'STRING'),
                        SchemaField('author', 'STRING'),
                        SchemaField('date', 'DATETIME'),
                        SchemaField('body', 'STRING'))),
]
```
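A table could then be created from that schema roughly like this (just a sketch; the dataset and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder dataset/table names until we settle on the real ones.
table_ref = client.dataset('hermes_archive').table('articles')
table = bigquery.Table(table_ref, schema=hermesSchema)
client.create_table(table)  # raises Conflict if the table already exists
```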
So we do need a quite precise understanding of what the actual output will look like.
Feed Firebase content into BigQuery
Description
The Firebase realtime DB works as a temporary database for the newest content (e.g. the last week/month), because it's easier to detect and prevent duplicates in Firebase. BigQuery is the archive, so every newly created entry in Firebase gets archived into BQ for later use.
Proposal
A Python script running in GCP as a Cloud Function, triggered by a `write`, `update` and/or `create` event from the Realtime Database, that seamlessly pours the data into BQ (see the sketch below the subtasks).
Updated Proposal with subtasks:
- schema.py
- A deleted node arrives as `delta = {"<deletedNode>": null}`, which should be caught (`null` converts to `None` in Python).
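Roughly what such a Cloud Function could look like (a minimal sketch, assuming a 1st-gen Python function on a `ref.create`/`ref.write` trigger; the table ID, the ID field and the field mapping are placeholders until we know the actual output):

```python
from google.cloud import bigquery

client = bigquery.Client()
# Placeholder table ID; must match the table defined in schema.py.
TABLE_ID = 'my-project.hermes_archive.articles'


def archive_to_bq(data, context):
    """Triggered by a Realtime Database event; streams the new entry into BQ."""
    delta = data.get('delta')

    # A deleted node arrives as {"<deletedNode>": null}, i.e. None values in
    # Python -- nothing to archive in that case.
    if not delta or not isinstance(delta, dict) or all(v is None for v in delta.values()):
        print(f'Deleted node or empty delta, nothing to archive ({context.event_id})')
        return

    # Assumed mapping from the Firebase entry onto the BigQuery schema.
    row = {
        'ID': context.event_id,  # placeholder; the real ID should come from the Firebase key
        'timestamp': context.timestamp,
        'articleContent': {
            'title': delta.get('title'),
            'author': delta.get('author'),
            'date': delta.get('date'),
            'body': delta.get('body'),
        },
    }

    errors = client.insert_rows_json(TABLE_ID, [row])
    if errors:
        raise RuntimeError(f'BigQuery insert failed: {errors}')
```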
How to test the implementation?
Add something in Firebase and see if the new entries show up in BQ.
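After adding a test entry, something like this could be used to check that it arrived in BQ (table name is a placeholder again):

```python
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT ID, timestamp, articleContent.title AS title
    FROM `my-project.hermes_archive.articles`
    ORDER BY timestamp DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.ID, row.timestamp, row.title)
```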
Related issue: #2