Open spbnick opened 3 years ago
We need to find out whether we should actually do this or not. For example, we might find out that flattened structure would make some (existing or near-future) queries difficult, or impossible. Or we might find out after flattening the structure, that there are other issues preventing the use of materialized views and UNION DISTINCT. Or, finally, we might find out that materialized views don't bring much of a performance benefit, and we can survive without UNION DISTINCT, making the whole effort not worth it. We need to answer these questions (and hunt for more pros and cons), before starting seriously implementing this.
One argument for this change, at least to some extent, could be lack of feature parity between BigQuery and e.g. PostgreSQL as regards to structures and arrays in table columns. If we find out we need PostgreSQL support (#209) after all, it might make the whole implementation simpler (due to more uniformity), if they both had flattened fields. We shouldn't forget about the SQLite driver either, which is very low on features too, so maybe that would be enough of a reason.
If we figure out we need this after all, here are the possible steps for implementing this:
RECORD
and ARRAY
column types to STRING
holding JSON.UNION
to using UNION DISTINCT
in BigQuery driver queries.
Right now we're using
RECORD
types in our BigQuery schema, and we also use arrays. However, that doesn't work well with all BigQuery features. For example, materialized views don't support structure (RECORD
) arguments in ANY_VALUE aggregations, and arrays, which don't support equality operations, don't work withUNION DISTINCT
.Consider flattening our data structures into JSON fields in BigQuery, similarly to what we do in SQLite. We can still extract the data from them server-side with JSON functions (although not so efficiently).
Forked off #183.