lukeasrodgers opened this issue 1 year ago
I faced the same issue.
We should update the jsonschema type translation to properly handle object types; we inherited the translation from the old target-bigquery. The only tricky part is that some load methods can load JSON from a complex object, while other methods need the contents of the key to be stringified.
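Roughly what I have in mind (a sketch only, with a hypothetical helper and illustrative load-method names, not the code as it exists in this repo):

```python
# Illustrative sketch only, not the translation code as it exists in this
# repo. Maps a jsonschema property to a BigQuery field, treating `object`
# as a NULLABLE JSON column instead of inheriting the old behavior.
import json

from google.cloud.bigquery import SchemaField

def jsonschema_to_field(name: str, prop: dict) -> SchemaField:
    jstype = prop.get("type", [])
    types = jstype if isinstance(jstype, list) else [jstype]
    if "object" in types:
        # JSON columns hold objects (and arrays) without needing REPEATED.
        return SchemaField(name, "JSON", mode="NULLABLE")
    return SchemaField(name, "STRING", mode="NULLABLE")

def prepare_value(value, load_method: str):
    """Hypothetical helper: some load methods can take the complex object
    as-is, while file-based batch loads need it stringified first."""
    if load_method == "streaming":  # illustrative method name
        return value
    return json.dumps(value)
```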
I'm facing the same issue. Any update or fix on this? Thanks!
After investigating, I believe the main issue stems from the JSON type column being recognized as both `object` and `array` by tap-postgres (source). This ambiguity leads to a conflict in our jsonschema type translation, which checks `array` first, so the JSON field ends up with `REPEATED` mode.

Therefore, I suggest changing the fallback SchemaField for `array` in target-bigquery from `REPEATED` mode to `NULLABLE`. This change makes sense because BigQuery's JSON type already supports arrays, and JSON fields can handle semi-structured loading as outlined here.
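Concretely, the change would look something like this (a sketch; the field name and exact SchemaField construction are illustrative, not the actual target-bigquery internals):

```python
from google.cloud.bigquery import SchemaField

# Current behavior as I understand it: a jsonschema type that includes
# "array" falls back to REPEATED, so a Postgres json column typed
# ["object", "array"] is declared as a repeated field.
current_fallback = SchemaField("object_changes", "JSON", mode="REPEATED")

# Proposed: BigQuery's JSON type can itself represent arrays, so NULLABLE
# covers both object and array payloads without the REPEATED conflict.
proposed_fallback = SchemaField("object_changes", "JSON", mode="NULLABLE")
```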
What are your thoughts on this, @z3z1ma?
I am trying to use this target (via meltano, using the most recent releases of both) to process data from postgres (https://github.com/transferwise/pipelinewise-target-postgres), specifically from a table that has two JSON columns, `object` and `object_changes`. The JSON in these columns will always be an object, but otherwise the structure can vary drastically.

I've tried to get this table loaded into bigquery using both the `denormalized` and `generate_view` options separately, but both approaches fail in different ways. If I set both `denormalized` and `generate_view` to `false`, I'm able to load the data fine. I can of course generate my own view, or transform the data in dbt downstream, but would really like to figure out what the issue is here.

When using `generate_view` I get this failure:

When using `denormalized` I get this failure:

I have deleted the bigquery table and meltano state/runs data between invocations, for a clean slate.
I may be misunderstanding the limits of these two different approaches, but it does seem like it should be possible to load postgres JSON columns -> bigquery JSON columns.
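For reference, a minimal standalone sketch (outside meltano/this target) of what I'd expect to work, assuming the streaming insert API, where JSON column values are passed as JSON-encoded strings; the project/dataset/table names are placeholders:

```python
import json

from google.cloud import bigquery

client = bigquery.Client()
schema = [
    bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
    bigquery.SchemaField("object", "JSON", mode="NULLABLE"),
]
table = client.create_table(
    bigquery.Table("my-project.my_dataset.versions", schema=schema),
    exists_ok=True,
)

# A Postgres json object lands in the NULLABLE JSON column; with the
# streaming API the value is sent as a JSON-encoded string.
row = {"id": 1, "object": json.dumps({"status": "draft", "tags": ["a", "b"]})}
errors = client.insert_rows_json(table, [row])
assert not errors, errors
```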
Perhaps relatedly, when using `denormalized` the schema that is created in bigquery specifies the mode of both JSON columns as `REPEATED`, when I think it should be `NULLABLE`, but I see no way to configure this. I manually edited the schema in bigquery, changing their mode to `NULLABLE`, but this didn't fix the problem this loader is encountering - probably because the issue is between this library and the protobuf library, not specifically when hitting the bigquery API.