samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0
70 stars, 28 forks

saveAsBigQueryTable: Schemas don't match? #11

Closed: kurtmaile closed this issue 7 years ago

kurtmaile commented 7 years ago

Hi,

Thanks for your help on the previous issue.

I'm getting the error below when trying to save a very simple DataFrame (4 strings and 1 boolean, simple!) into a BigQuery table with matching column names, types, and even column order. (Does the order of the DataFrame columns need to match the order of the BigQuery table columns? I assume not, but I have them in the same order anyway.)

dframe.flattenedEvent.saveAsBigQueryTable(...)

It's such a simple example, but I cannot get it to work and keep getting the error below. Any ideas? I can read from other tables; it's just the writing/saving that's the issue.

java.io.IOException: Provided Schema does not match Table **:sdv.event
    at com.google.cloud.hadoop.io.bigquery.BigQueryUtils.waitForJobCompletion(BigQueryUtils.java:95)
    at com.samelamin.spark.bigquery.BigQueryClient.com$samelamin$spark$bigquery$BigQueryClient$$waitForJob(BigQueryClient.scala:143)
    at com.samelamin.spark.bigquery.BigQueryClient.load(BigQueryClient.scala:110)
    at com.samelamin.spark.bigquery.package$BigQueryDataFrame.saveAsBigQueryTable(package.scala:163)
    .......

Do you actually have a sample public dbricks notebook by any chance you can share? No probs if not

Cheers and thanks,
Kurt

samelamin commented 7 years ago

Hey Kurt, it seems the schemas do not match. If I were you, I would try saving to a duplicate table and then comparing the two schemas.
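One way to do that comparison is sketched below. It is a minimal, Spark-free sketch under stated assumptions: `Field`, `SchemaDiff`, and `diff` are hypothetical names, not part of spark-bigquery. Columns are matched by name (so column order is ignored), and the sketch reports missing columns, unexpected columns, type differences, and nullability differences. With Spark on the classpath, you could build the `Field` list from a DataFrame via `df.schema.fields.map(f => Field(f.name, f.dataType.simpleString, f.nullable))`; the BigQuery table's schema would need a similar conversion, which is not shown.

```scala
// Hypothetical, Spark-free model of one column: name, type name, and nullability.
final case class Field(name: String, dataType: String, nullable: Boolean)

object SchemaDiff {
  /** Human-readable differences between two schemas, matching columns by name (order ignored). */
  def diff(expected: Seq[Field], actual: Seq[Field]): Seq[String] = {
    val exp = expected.map(f => f.name -> f).toMap
    val act = actual.map(f => f.name -> f).toMap

    // Columns present on only one side.
    val missing = expected.map(_.name).filterNot(act.contains).map(n => s"missing column: $n")
    val extra = actual.map(_.name).filterNot(exp.contains).map(n => s"unexpected column: $n")

    // Columns present on both sides whose type or nullability differ.
    val changed = expected.flatMap { e =>
      act.get(e.name).toSeq.flatMap { a =>
        val typeDiff =
          if (e.dataType != a.dataType) Seq(s"${e.name}: type ${e.dataType} vs ${a.dataType}") else Nil
        val modeDiff =
          if (e.nullable != a.nullable) Seq(s"${e.name}: nullable ${e.nullable} vs ${a.nullable}") else Nil
        typeDiff ++ modeDiff
      }
    }
    missing ++ extra ++ changed
  }

  def main(args: Array[String]): Unit = {
    // Made-up example: target table schema vs. the schema of a duplicate table written from the DataFrame.
    val target = Seq(Field("id", "string", nullable = true), Field("active", "boolean", nullable = true))
    val duplicate = Seq(Field("active", "boolean", nullable = false), Field("id", "string", nullable = true))
    diff(target, duplicate).foreach(println) // prints: active: nullable true vs false
  }
}
```

In this made-up example the columns are in a different order but are still reported as matching by name; the only reported difference is nullability.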

It might be a bug in schema inference, because BigQuery schemas do not map exactly onto Spark StructType schemas.
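To illustrate how the two schema models can diverge, here is a rough sketch of one possible Spark-to-BigQuery mapping. It is an assumption for illustration only and is not the conversion spark-bigquery actually performs; `BigQueryTypeMapping`, `bigQueryType`, `bigQueryMode`, and `describe` are hypothetical names. Spark types are identified by their `simpleString` names (`string`, `bigint`, and so on) and mapped to BigQuery legacy type names. Spark's `nullable` flag is mapped to a BigQuery column mode. One plausible source of a "Provided Schema does not match Table" error is a column that Spark marks `nullable = false` (mode `REQUIRED`) being loaded into a table where the column is `NULLABLE`.

```scala
// Hypothetical helper, not the library's real conversion: approximate mapping from
// Spark simpleString type names to BigQuery legacy type names, plus nullability to mode.
object BigQueryTypeMapping {
  def bigQueryType(sparkType: String): Option[String] = sparkType match {
    case "string"                                  => Some("STRING")
    case "boolean"                                 => Some("BOOLEAN")
    case "tinyint" | "smallint" | "int" | "bigint" => Some("INTEGER")
    case "float" | "double"                        => Some("FLOAT")
    case "timestamp"                               => Some("TIMESTAMP")
    case "date"                                    => Some("DATE")
    case "binary"                                  => Some("BYTES")
    case _                                         => None // arrays, structs, decimals, etc. are not covered
  }

  // Spark's nullable flag corresponds to BigQuery's NULLABLE / REQUIRED column modes.
  def bigQueryMode(nullable: Boolean): String = if (nullable) "NULLABLE" else "REQUIRED"

  // Renders one column as "name TYPE MODE", or flags types this sketch does not map.
  def describe(name: String, sparkType: String, nullable: Boolean): String =
    bigQueryType(sparkType) match {
      case Some(t) => s"$name $t ${bigQueryMode(nullable)}"
      case None    => s"$name <unmapped: $sparkType>"
    }

  def main(args: Array[String]): Unit = {
    // A non-nullable Spark boolean column would map to a REQUIRED BigQuery column.
    println(describe("active", "boolean", nullable = false)) // prints: active BOOLEAN REQUIRED
  }
}
```

If the target table declares that column as `BOOLEAN NULLABLE`, the provided and existing schemas differ in mode even though names and types agree.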

samelamin commented 7 years ago

@kurtmaile, do you still need help with this?

kurtmaile commented 7 years ago

Hey, thanks! I'll close it and try what you suggested. Cheers!

samelamin commented 7 years ago

@kurtmaile, regarding "Do you actually have a sample public dbricks notebook by any chance you can share? No probs if not": this is exactly why I am working on https://github.com/samelamin/docker-zeppelin. Feel free to send a PR for a notebook that you find beneficial.

There are also various public GitHub datasets.