kernelci / kcidb

kernelci.org common database tools
GNU General Public License v2.0
27 stars 33 forks source link

bigquery: Consider flattening data structures #212

Open spbnick opened 3 years ago

spbnick commented 3 years ago

Right now we're using RECORD types in our BigQuery schema, and we also use arrays. However, that doesn't work well with all BigQuery features. For example, materialized views don't support structure (RECORD) arguments in ANY_VALUE aggregations, and arrays, which don't support equality operations, don't work with UNION DISTINCT.

Consider flattening our data structures into JSON fields in BigQuery, similarly to what we do in SQLite. We can still extract the data from them server-side with JSON functions (although not so efficiently).

Forked off #183.

spbnick commented 2 years ago

We need to find out whether we should actually do this or not. For example, we might find out that flattened structure would make some (existing or near-future) queries difficult, or impossible. Or we might find out after flattening the structure, that there are other issues preventing the use of materialized views and UNION DISTINCT. Or, finally, we might find out that materialized views don't bring much of a performance benefit, and we can survive without UNION DISTINCT, making the whole effort not worth it. We need to answer these questions (and hunt for more pros and cons), before starting seriously implementing this.

One argument for this change, at least to some extent, could be lack of feature parity between BigQuery and e.g. PostgreSQL as regards to structures and arrays in table columns. If we find out we need PostgreSQL support (#209) after all, it might make the whole implementation simpler (due to more uniformity), if they both had flattened fields. We shouldn't forget about the SQLite driver either, which is very low on features too, so maybe that would be enough of a reason.

spbnick commented 2 years ago

If we figure out we need this after all, here are the possible steps for implementing this: