KasiaHinkson closed 2 months ago
Maybe if we look for any columns with dicts and convert them to JSON strings before sending them to BigQuery? And keep the column type defined as JSON
Orrrr change the way Python dicts are materialized to CSVs to be JSON compliant or something
Thank you so much, I had a feeling I was missing something and couldn't make the time to more fully test it, so I super appreciate this. I like the idea of converting dicts to JSON strings
Yeah, I knew something kind of funky was going on. I think I lean towards your second idea, it's more flexible and explicit.
Ok, so what I've done is just json.dumps() all the dicts and lists, then in dbt I'm using PARSE_JSON. That felt easier in this case than specifying the entire table schema. If that feels like a reasonable expectation, then I think we should just take out the "dict": "RECORD" mapping, since that doesn't work, and add some documentation where it would fit best. What do you think?
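For anyone following along, here is a rough sketch of the json.dumps() step described above. It works on plain lists of dict rows rather than a Parsons table, and the helper name `stringify_nested` and the sample data are made up for illustration, so treat it as an assumption about the shape of the workaround, not the exact code used.

```python
import json


def stringify_nested(rows):
    """Return copies of the rows with dict and list values serialized to JSON strings.

    Hypothetical helper: scalar values are passed through unchanged.
    """
    return [
        {
            key: json.dumps(value) if isinstance(value, (dict, list)) else value
            for key, value in row.items()
        }
        for row in rows
    ]


# Made-up sample row with a nested dict and a list.
rows = [{"id": 1, "payload": {"a": 1}, "tags": ["x", "y"]}]
cleaned = stringify_nested(rows)
```

After loading, the string column can then be turned back into JSON on the BigQuery side with PARSE_JSON, as mentioned above.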
Yes that sounds right to me
What's the status of this? From this conversation it sounds like you want to update the PR to remove "dict": "RECORD" instead of replacing it with "dict":"JSON" - lmk when that's done and I can approve and merge (or feel free to do something else if I've misunderstood, of course!)
I actually think our conclusion was to take dict out entirely and add documentation that, before running a copy function, users should convert dictionaries to JSON strings and then use PARSE_JSON in BQ. I'm not 100% sure where to write this documentation
Maybe if any column's best type is a dict, we catch the key error and re-raise it with a more verbose description of the situation and how to address it?
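A minimal sketch of what that catch-and-re-raise could look like. The names `TYPE_MAP` and `bigquery_type_for` are hypothetical stand-ins, not the actual Parsons internals, and the mapping contents are only an assumed example of a type lookup with the dict entry removed.

```python
# Hypothetical type lookup with no "dict" entry (assumed contents).
TYPE_MAP = {"str": "STRING", "int": "INTEGER", "float": "FLOAT", "bool": "BOOLEAN"}


def bigquery_type_for(python_type_name):
    """Look up a BigQuery type, with a more helpful error for dict columns."""
    try:
        return TYPE_MAP[python_type_name]
    except KeyError as err:
        if python_type_name == "dict":
            # Re-raise with guidance instead of a bare KeyError: 'dict'.
            raise KeyError(
                "Column contains dict values, which have no default BigQuery type. "
                "Convert them to JSON strings (e.g. json.dumps) before copying, "
                "then use PARSE_JSON in BigQuery."
            ) from err
        raise
```

Chaining with `from err` keeps the original KeyError in the traceback, so nothing is lost for debugging.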
Here's my suggestion of what we could do here: #1068
Addressed by PR #1068
Noo @KasiaHinkson sorry - #1068 merged a change into THIS branch, not into main. This PR is now ready to be merged into main, not deleted.
I'm going to merge though since it seems like we're in agreement that it's good to go.
Ohhh yep, my apologies! No power for 2 days turns my brain off, apparently 😆
RECORD type requires the shape of the dictionary to be explicit and fails when the Parsons table has a dict value. I've tested with New/Mode data, and it loads successfully as JSON type.