samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

Bug: Null type not supported error when setting a null value to a DF column #38

Closed kurtmaile closed 7 years ago

kurtmaile commented 7 years ago

Hi Sam,

I think this might be a bug.

I'm migrating some existing data in Spark SQL, and if I want to set a DataFrame column to null based on some conditions in the data, I cannot use null explicitly,

e.g.

null AS lastUpdateEventId

I get a 'NullType not supported' error from the schema converter; it seems it doesn't like this.

If I then change this to an empty string, it does work,

e.g.

"" AS lastUpdateEventId

All works fine. I can live with the empty string, but I thought I'd raise it.
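For what it's worth, casting the null to an explicit type should also keep the schema converter happy while leaving the value genuinely NULL rather than an empty string (a sketch; the table name `events` is made up here):

```sql
-- Fails: the column is inferred as NullType, which the schema converter rejects
SELECT null AS lastUpdateEventId FROM events;

-- Works: CAST pins the column to STRING while the value stays NULL
SELECT CAST(null AS STRING) AS lastUpdateEventId FROM events;
```

This matters if downstream queries need to distinguish "no value" (NULL) from "empty string" ("").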

My pipeline is much more stable, up for a week continuously now. Thanks soooo much for your help mate! :)

K

kurtmaile commented 7 years ago

This might not be possible to support, since the schema converter has no way to know the type: null could be anything. It happens when I set null for all values of a column (though Spark defaults to a string type in its inference). The reason is that I'm migrating historical data and want it to have the same schema as the new data, but a given value is sometimes just not determinable.

Anyhow, I can just set it to "" for now, since I know it's a string value. I haven't run into any struct ones; it's just a property of legacy migration, I guess.

Cheers!

samelamin commented 7 years ago

Hey @kurtmaile Just got back, so I'm going through the backlog of things. If I'm understanding you correctly, you need to define a schema type for the column. In other words, while Spark does have a "NullType", BQ does not. My only suggestion is to assign the column a type, which you are already doing; for complicated types I suggest StructType, which will translate to a BigQuery record.
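The same trick works in the DataFrame API: casting a null literal gives the column a concrete type for the converter to map. A minimal sketch, assuming `df` is an existing DataFrame and the column name is illustrative:

```scala
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

// NullType column -- the schema converter cannot map this to a BQ type:
val bad = df.withColumn("lastUpdateEventId", lit(null))

// Casting the null literal pins the column to StringType, which should
// convert to a NULLABLE STRING field in BigQuery while the values stay null:
val ok = df.withColumn("lastUpdateEventId", lit(null).cast(StringType))
```

For nested data, the same cast with a StructType would give you a record field instead.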