spotify / spark-bigquery

Google BigQuery support for Spark, SQL, and DataFrames
Apache License 2.0

Enable user to specify write options, hinting about issues with nested records #59

Closed jobegrabber closed 6 years ago

jobegrabber commented 6 years ago

Before this PR, writing nested records to BigQuery was not possible: BigQuery seems unable to load Avro files in which a nested record's namespace has a leading dot (.nestedColumn).

spark-bigquery stages data as temporary Avro files in Cloud Storage. This PR lets users specify write options for those files, such as the Avro namespace, which makes it possible to write nested records to BigQuery.

Additionally, this PR adds a hint about, and a link to, how BigQuery handles Avro imports (for example, arrays of arrays are not supported).
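
For illustration, here is a minimal sketch of how a caller might build such write options with an explicit namespace. The object and method names are hypothetical and not part of spark-bigquery. The "recordNamespace" key is the option name used by spark-avro, and assuming it is the relevant key here is my reading rather than something stated in this description. The check follows the Avro naming rules, under which a leading dot produces an empty first name component.

```scala
// Sketch only: AvroWriteOptions and its methods are hypothetical helpers,
// not part of spark-bigquery.
object AvroWriteOptions {
  // An Avro name starts with a letter or underscore, followed by
  // letters, digits, or underscores.
  private val NamePart = "[A-Za-z_][A-Za-z0-9_]*"

  /** True if `ns` is a non-empty, dot-separated sequence of valid Avro names. */
  def isUsableNamespace(ns: String): Boolean =
    // The -1 limit keeps empty components, so ".nestedColumn" and "a..b" are rejected.
    ns.nonEmpty && ns.split("\\.", -1).forall(_.matches(NamePart))

  /** Builds writer options that set an explicit record namespace. */
  def withNamespace(ns: String): Map[String, String] = {
    require(isUsableNamespace(ns), s"Invalid Avro namespace: '$ns'")
    // Assumption: "recordNamespace" (the spark-avro option key) applies here.
    Map("recordNamespace" -> ns)
  }
}
```

A map built this way could be handed to Spark's DataFrameWriter through its options method. How spark-bigquery itself exposes the write options is not shown in this description, so that part is left out.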