move-coop / parsons

A python library of connectors for the progressive community.
https://www.parsonsproject.org/
Other
260 stars 132 forks source link

Change dict to json #1028

Closed KasiaHinkson closed 2 months ago

KasiaHinkson commented 6 months ago

RECORD type requires the shape of the dictionary to be explicit and fails when the Parsons table has a dict value. I've tested with New/Mode data, and it loads successfully as JSON type.

austinweisgrau commented 6 months ago

Maybe if we look for any columns with dicts and convert them to JSON strings before sending them to bigquery? And keeping the column type defined as JSON

austinweisgrau commented 6 months ago

Orrrr change the way Python dicts are materialized to csvs to be json compliant or somethinf

KasiaHinkson commented 6 months ago

Thank you so much, I had a feeling I was missing something and couldn't make the time to more fully test it, so I super appreciate this. I like the idea of converting dicts to JSON strings

KasiaHinkson commented 6 months ago

Yeah, I knew something kind of funky was going on. I think I lean towards your second idea, it's more flexible and explicit.

KasiaHinkson commented 5 months ago

Ok, so what I've done is just json.dumps() all the dicts and lists, then in dbt I'm using PARSE_JSON. That felt easier in this case than specifying the entire table schema. If that feels like a reasonable expectation, then I think we should just take out the dict: RECORD, since that doesn't work, and add some documentation where it would fit best. What do you think?

austinweisgrau commented 5 months ago

Yes that sounds right to me

shaunagm commented 4 months ago

What's the status of this? From this conversation it sounds like you want to update the PR to remove "dict": "RECORD" instead of replacing it with "dict":"JSON" - lmk when that's done and I can approve and merge (or feel free to do something else if I've misunderstood, of course!)

KasiaHinkson commented 4 months ago

I actually think our conclusion was to take dict out entirely and add documentation that before running a copy function, users should convert dictionaries to json strings and then use PARSE_JSON in BQ. I'm not 100% sure where to write this documentation

austinweisgrau commented 4 months ago

Maybe if any column best type is a dict, we catch the key error and re-raise it with a more verbose description of the situation and how to address it?

austinweisgrau commented 4 months ago

Here's my suggestion of what we could do here: #1068

KasiaHinkson commented 2 months ago

Addressed by PR #1068

austinweisgrau commented 2 months ago

Noo @KasiaHinkson sorry - #1068 merged a change into THIS branch, not into main. This PR is now ready to be merged into main, not deleted.

austinweisgrau commented 2 months ago

Im going to merge though since it seems like we're in agreement that it's good to go.

KasiaHinkson commented 2 months ago

Ohhh yep, my apologies! No power for 2 days turns my brain off, apparently 😆