Open relud opened 4 years ago
If we do receive null elements during ingestion and we are not treating them as validation errors, then we would have to strip the null elements from the list as they are being transformed into a BigQuery row.
> If we do receive null elements during ingestion and we are not treating them as validation errors

which we don't have to worry about, because we decided we should treat them as validation errors for now.

> then we would have to strip the null elements from the list as they are being transformed into a BigQuery row

or we would have to nest elements in a struct and null the field, like parquet imports.
Removing nulls would cause issues if element indexes matter, since we would only be preserving array order. Nesting elements would cause issues because it would change the schema of all our tables. Nesting elements only when they are nullable and not already structs would cause issues if a schema were modified to become nullable, because normally that would be a backwards compatible change.
Those are the issues that led us to decide not to support nullable array elements for now.
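To make the trade-off between the two options concrete, here is a minimal sketch in plain Python. The helper names `strip_nulls` and `nest_elements` are hypothetical and are not part of the transpiler or the ingestion code; the `list`/`element` field names follow the parquet import layout described in this issue.

```python
def strip_nulls(values):
    """Option 1: drop null elements so the list fits a REPEATED field."""
    return [v for v in values if v is not None]


def nest_elements(values):
    """Option 2: wrap each element in a struct, parquet-import style."""
    return {"list": [{"element": v} for v in values]}


mylist = [1, None, 3]

stripped = strip_nulls(mylist)
nested = nest_elements(mylist)

# Stripping keeps the order but shifts indexes: 3 moves from index 2 to 1.
print(mylist.index(3), stripped.index(3))

# Nesting keeps every position, but the field is now a struct wrapping a
# repeated field of structs instead of a plain repeated integer.
print(nested)
```

The first option silently changes element positions, and the second changes the shape of the column, which is the schema change described above.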
jsonschema can allow array elements to be `null`, but BigQuery can only make fields `REPEATED` or `NULLABLE`.

BigQuery parquet imports solve this by wrapping both the array and elements in structs, so that an array is transformed into a struct with one repeated field called `list` containing structs with one field `element`, and both the outer struct and the `element` field can be `NULLABLE` while `list` is `REPEATED`.

The transpiler currently converts an array of nullable elements in a jsonschema to a `REPEATED` field in a BigQuery schema, which cannot contain `NULL`. For example `{"properties":{"mylist":{"items":{"type":["integer","null"]},"type":"array"}},"type":"object"}` -> `[{"mode":"REPEATED","name":"mylist","type":"INT64"}]`. This causes issues where if a jsonschema allows a message that BigQuery rejects during a file load operation, the whole file is rejected.

This was discussed in the GCP Technical check-in on 2019-09-30, where it was determined that at this time, due to backwards compatibility constraints, the transpiler should error if schemas allow nullable array elements, and mozilla-pipeline-schemas CI should fail if the transpiler can't transform schemas.
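As a rough illustration of the agreed behavior, the sketch below walks a jsonschema and raises if any array allows `null` items. It is not the transpiler's actual implementation: the function names `nullable_array_items` and `check_schema` are hypothetical, and it assumes only the `type`, `properties`, and single-schema `items` keywords matter (no `anyOf`, `oneOf`, `$ref`, or tuple-style `items`).

```python
def nullable_array_items(schema, path="$"):
    """Yield the path of every array whose items may be null."""
    if not isinstance(schema, dict):
        return
    items = schema.get("items")
    if isinstance(items, dict):
        item_type = items.get("type")
        # "type" may be a single string or a list of allowed types.
        types = item_type if isinstance(item_type, list) else [item_type]
        if "null" in types:
            yield path
        # Arrays can be nested, so keep walking into the item schema.
        yield from nullable_array_items(items, path + "[]")
    for name, sub in schema.get("properties", {}).items():
        yield from nullable_array_items(sub, f"{path}.{name}")


def check_schema(schema):
    """Raise ValueError if the schema allows nullable array elements."""
    bad = list(nullable_array_items(schema))
    if bad:
        raise ValueError(
            "nullable array elements are not supported: " + ", ".join(bad)
        )


# The example schema from this issue.
example = {
    "properties": {
        "mylist": {"items": {"type": ["integer", "null"]}, "type": "array"}
    },
    "type": "object",
}

print(list(nullable_array_items(example)))
```

A CI job could call something like `check_schema` on each schema and treat the `ValueError` as a failure, so that a schema BigQuery cannot represent is caught before any file load is attempted.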