samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

change float to double type #59

Closed smdmts closed 6 years ago

smdmts commented 6 years ago

from https://github.com/samelamin/spark-bigquery/issues/57

This change (https://github.com/samelamin/spark-bigquery/pull/59/commits/fcc905c5be5df71ecbdaffda4bc30a40a323859e) did not work, so I changed all of the Float types to Double and it works fine. Would you mind sharing your thoughts on this?

samelamin commented 6 years ago

Can you change the casting back to float-to-float?

smdmts commented 6 years ago

Hi, @samelamin. I tried the float-to-float version (https://github.com/samelamin/spark-bigquery/commit/fcc905c5be5df71ecbdaffda4bc30a40a323859e), but it did not work and threw the same exception. I don't understand why this behavior happens, so please tell me your opinion.

I suspect this is a spark-avro pitfall: some of the Avro file's float values are being processed as Double.
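A minimal, connector-independent sketch of the kind of pitfall being described here (illustrative only, not the spark-bigquery or spark-avro code): an Avro `float` is a 32-bit IEEE-754 value, so when it is widened into a 64-bit double (Spark's `DoubleType`), numbers like 0.1 no longer equal their double-precision counterparts, which can surface as unexpected mismatches downstream.

```python
import struct

def widen_float32(value: float) -> float:
    """Round-trip a value through a 32-bit IEEE-754 float, then read it
    back as a 64-bit double -- roughly what happens when an Avro `float`
    field ends up in a double-typed column."""
    packed = struct.pack("<f", value)         # truncate to float32
    (widened,) = struct.unpack("<f", packed)  # widen back to double
    return widened

print(widen_float32(0.1) == 0.1)  # False: 0.1 has no exact float32 form
print(widen_float32(0.5) == 0.5)  # True: 0.5 is exactly representable
```

This is why silently mixing 32-bit and 64-bit floating-point representations in a pipeline tends to produce surprising results rather than clean errors.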

samelamin commented 6 years ago

Yeah, I think it has to do with the Avro issue.

I think your initial commit was the correct thing to do.

However, if we merge this PR as it is, it will very likely cause someone an issue further down the line. I do not want to close this, because I appreciate your help.

Why don't you create a test that reads in an Avro DataFrame with a few records that are doubles and see if you can save them as Double in BQ?

Is it at all possible that the Avro file you are reading is corrupt? i.e. part of the records are doubles and another part are floats?

Because essentially, if we mark all doubles as doubles, why would the class try to convert them as Float?
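To make the trade-off concrete, here is a hypothetical, simplified Avro-to-BigQuery type mapping (not the project's actual code): since BigQuery's FLOAT type is already a 64-bit double (FLOAT64), both Avro `float` and `double` can safely map to it, which is why widening float to double on the Spark side avoids a precision mismatch.

```python
# Illustrative sketch only -- not the spark-bigquery connector's real mapping.
# BigQuery has a single floating-point type (FLOAT, i.e. FLOAT64), so both
# 32-bit and 64-bit Avro floating-point types land in the same column type.
AVRO_TO_BIGQUERY = {
    "int": "INTEGER",
    "long": "INTEGER",
    "float": "FLOAT",   # BigQuery FLOAT is 64-bit; widening is lossless
    "double": "FLOAT",
    "boolean": "BOOLEAN",
    "string": "STRING",
    "bytes": "BYTES",
}

def to_bigquery_type(avro_type: str) -> str:
    """Map a primitive Avro type name to a BigQuery column type."""
    try:
        return AVRO_TO_BIGQUERY[avro_type]
    except KeyError:
        raise ValueError(f"unsupported Avro type: {avro_type}")

print(to_bigquery_type("float"))   # FLOAT
print(to_bigquery_type("double"))  # FLOAT
```

Under a mapping like this, marking everything as double on the Spark side is safe for BigQuery; the risk the thread is discussing is on the read path, where a float-typed Avro field being decoded as a double (or vice versa) causes the exception.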

smdmts commented 6 years ago

@samelamin Thank you for your careful comment. I'll investigate this Avro issue in more detail and solve this problem.

samelamin commented 6 years ago

No problem. I might be wrong, but it's best to be sure, and a quick test should be able to tell you that. You can even create a Spark DataFrame with some dummy values to be certain.