Closed warrensearle closed 4 years ago
There are two types of metrics we want to be able to look at:
Google BigQuery
BigQuery essentially takes a heap of data in and allows you to run large queries on it. It integrates with other tools (Data Studio, Looker) which can make pretty graphs for queries.
My experience of BigQuery has been as a business intelligence tool (coupled with working out which users are affected by incident X, Y or Z).
1. Service performance metrics
We rely on Firebase to provide these. Whilst Firebase has tools to measure frontend performance (Crashlytics, Firebase Performance Monitoring), it has essentially next to no platform metrics which we can extract.
There is a wider discussion about whether Firebase is the right platform for us to continue to grow on. We need to consider whether we are happy with the black box "it just scales" approach, or whether we need more control and data to be satisfied about service health.
I would suggest BigQuery is not the tool we want to use for this anyway, as metrics are time series data and therefore something like InfuxDB or Prometheus is a better database for managing and querying that kind of data. Other cloud providers have their own tools, eg. AWS CloudWatch.
2. Business data analytics
This should be really easy! Firestore and BigQuery are both Google platforms, and they share a data transport format.
There is a Google tool to update and record changes to data in BigQuery in your Firestore collections. Sadly, there is a well documented flaw in this tool from September 2019 meaning wildcards don't work properly (so we could only auto-update 1 out of ~25 collections). There is a PR open to provide a fix, but there seems to be little Google impetus to actually get it merged...
Other tools (eg) do exist to do this and we could script our own functions to update collections in BigQuery. However, even within the tool there are workarounds: eg. an "update" actually means deleting the tables and then adding them again. This might cause issues down the line, and we're really just looking at workaround on top of workaround.
Another option is to simply manually download the dumps from Firebase and upload them to BigQuery...
Thoughts
I think this work is effectively blocked for the moment. We should look at this as a requirement in discussions over which cloud platform we want to continue with. I think we need to separate the concepts of platform metrics from business intelligence and marketing metrics (like Google Analytics) - they are all separate areas and need specific tools.
A Proof of Concept report has been created in Data Studio via exporting from Firestore and importing into Big Query.
Export collections from Firestore
gcloud firestore export gs://#BUCKET#/#PATH# --collection-ids=#COLLECTIONS_CSV#
Import single collection into a table in Big Query (one command per collection)
bq load --source_format=DATASTORE_BACKUP #DATASTORE#.#TABLE# gs://#BUCKET#/#PATH#/all_namespaces/kind_#COLLECTION#/all_namespaces_kind_#COLLECTION#.export_metadata
Closing this ticket as it's now completed.
Google Firebase has native integration with Big Query and Data Studio.
Let's connect them up and create an initial report/dashboard or two so we can see what's possible and check suitability.