spotify / scio

A Scala API for Apache Beam and Google Cloud Dataflow.
https://spotify.github.io/scio
Apache License 2.0
2.55k stars, 514 forks

Support BigQuery JSON column type #5416

Closed: turb closed this 2 months ago

turb commented 2 months ago

Adds support for the JSON column type on BigQuery.

This mimics what has been done for GEOGRAPHY: it uses a simple case class Json(wkt: String) container. I copied a couple of changes without knowing what they are for (e.g. the one in StorageUtil), so they may turn out to be unnecessary.

An alternative would be to store the value in the model of a JSON parser implementation (e.g. Jackson), but that would tie it to something more complex.
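
The container approach described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: apart from `Json(wkt: String)`, every name here (`JsonColumnSketch`, `toRowValue`, `fromRowValue`, `fromRepeated`) is hypothetical, and the sketch assumes the column value travels as plain JSON text that is never parsed.

```scala
// Illustrative sketch only. Json(wkt: String) comes from the PR description;
// all other names below are hypothetical and are not part of scio.
final case class Json(wkt: String)

object JsonColumnSketch {
  // BigQuery type name that a schema field for this column would carry.
  val BigQueryTypeName: String = "JSON"

  // Writing: hand the raw JSON text to BigQuery unchanged.
  def toRowValue(json: Json): String = json.wkt

  // Reading: wrap the textual value that comes back, without parsing it.
  def fromRowValue(value: Any): Json = Json(value.toString)

  // REPEATED mode: a single row carries a list of values, converted one by one.
  def fromRepeated(values: Seq[Any]): List[Json] =
    values.map(fromRowValue).toList
}
```

Keeping the payload as an opaque string avoids a dependency on any particular JSON library; callers that need structure can parse `wkt` with the parser of their choice.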

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 50.00000% with 1 line in your changes missing coverage. Please review.

Project coverage is 61.23%. Comparing base (c4d4554) to head (8ca8be7).

| Files | Patch % | Lines |
|---|---|---|
| .../scala/com/spotify/scio/bigquery/StorageUtil.scala | 0.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #5416      +/-   ##
==========================================
- Coverage   61.24%   61.23%   -0.01%
==========================================
  Files         310      310
  Lines       11058    11060       +2
  Branches      751      736      -15
==========================================
+ Hits         6772     6773       +1
- Misses       4286     4287       +1
```


RustedBones commented 2 months ago

Thank you!

turb commented 2 months ago

Something does not seem to work as expected with REPEATED JSON: I am getting two rows for the same entry, with the repeated JSON data split between the two...

RustedBones commented 2 months ago

I'm not sure I understand the end state. Can you give an example?

turb commented 2 months ago

Never mind, I ran some tests and could not reproduce it. I'll open a bug if I run into it again; it may not be related.