jac-uk / digital-platform

Firebase/GCP configuration and components for our digital platform
MIT License
2 stars 1 forks source link

Proof of Concept. Integrate our data with BigQuery and use Data Studio to create a report/dashboard #264

Closed warrensearle closed 4 years ago

warrensearle commented 4 years ago

Google Firebase has native integration with Big Query and Data Studio.

Let's connect them up and create an initial report/dashboard or two so we can see what's possible and check suitability.

crablab commented 4 years ago

There are two types of metrics we want to be able to look at:

  1. Service performance metrics (function requests per second, function error rates, function latency, database hits etc). These are system metrics and would come directly from the infrastructure we are running on.
  2. Business data analytics (applications created over time, diversity data etc). These are queries defined by business questions, which need to run on our database. Because Firestore isn't designed (and wouldn't cope well!) with running large aggregate queries, we need to use a "data lake" type service to manage this data.

Google BigQuery

BigQuery essentially takes a heap of data in and allows you to run large queries on it. It integrates with other tools (Data Studio, Looker) which can make pretty graphs for queries.

My experience of BigQuery has been as a business intelligence tool (coupled with working out which users are affected by incident X, Y or Z).

1. Service performance metrics

We rely on Firebase to provide these. Whilst Firebase has tools to measure frontend performance (Crashlytics, Firebase Performance Monitoring), it has essentially next to no platform metrics which we can extract.

There is a wider discussion about whether Firebase is the right platform for us to continue to grow on. We need to consider whether we are happy with the black box "it just scales" approach, or whether we need more control and data to be satisfied about service health.

I would suggest BigQuery is not the tool we want to use for this anyway, as metrics are time series data and therefore something like InfuxDB or Prometheus is a better database for managing and querying that kind of data. Other cloud providers have their own tools, eg. AWS CloudWatch.

2. Business data analytics

This should be really easy! Firestore and BigQuery are both Google platforms, and they share a data transport format.

There is a Google tool to update and record changes to data in BigQuery in your Firestore collections. Sadly, there is a well documented flaw in this tool from September 2019 meaning wildcards don't work properly (so we could only auto-update 1 out of ~25 collections). There is a PR open to provide a fix, but there seems to be little Google impetus to actually get it merged...

Other tools (eg) do exist to do this and we could script our own functions to update collections in BigQuery. However, even within the tool there are workarounds: eg. an "update" actually means deleting the tables and then adding them again. This might cause issues down the line, and we're really just looking at workaround on top of workaround.

Another option is to simply manually download the dumps from Firebase and upload them to BigQuery...

Thoughts

I think this work is effectively blocked for the moment. We should look at this as a requirement in discussions over which cloud platform we want to continue with. I think we need to separate the concepts of platform metrics from business intelligence and marketing metrics (like Google Analytics) - they are all separate areas and need specific tools.

warrensearle commented 4 years ago

A Proof of Concept report has been created in Data Studio via exporting from Firestore and importing into Big Query.

Export collections from Firestore

gcloud firestore export gs://#BUCKET#/#PATH# --collection-ids=#COLLECTIONS_CSV#

Import single collection into a table in Big Query (one command per collection)

bq load --source_format=DATASTORE_BACKUP #DATASTORE#.#TABLE# gs://#BUCKET#/#PATH#/all_namespaces/kind_#COLLECTION#/all_namespaces_kind_#COLLECTION#.export_metadata
warrensearle commented 4 years ago

Closing this ticket as it's now completed.