samelamin / spark-bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Apache License 2.0

Class cast exception occurs (Double cannot be cast to Float) #57

Closed smdmts closed 6 years ago

smdmts commented 6 years ago

Hi, I'm trying to analyze Firebase data using Spark with spark-bigquery, but a class cast exception occurs: Double cannot be cast to Float. The Double type exists in the Avro spec, yet the module appears to cast only to Float. (https://avro.apache.org/docs/1.8.1/spec.html)

Would you mind telling me whether this is a bug?

https://support.google.com/firebase/answer/7029846

Error Detail

samelamin commented 6 years ago

Hi @smdmts, correct, we should be casting float to float, not double to float; see the sketch below.

Good find!

Feel free to send a PR in.
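
Roughly, the Avro-to-Spark conversion needs to distinguish the two Avro types. A minimal sketch (the names here are illustrative, not the module's actual internals):

```scala
import org.apache.avro.Schema

// Illustrative sketch only: match on the Avro field type instead of
// assuming every floating-point value is a Float.
def convertValue(value: Any, fieldSchema: Schema): Any = fieldSchema.getType match {
  case Schema.Type.FLOAT  => value.asInstanceOf[Float]   // Avro float  -> Spark FloatType
  case Schema.Type.DOUBLE => value.asInstanceOf[Double]  // Avro double -> Spark DoubleType
  case _                  => value
}
```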

michTalebzadeh commented 5 years ago

Hi, I am using the following import in my Scala code:

```scala
import com.samelamin.spark.bigquery._
```

I have a Hive table imported into BigQuery through an Avro file, and the table is created in BQ as follows:

(screenshot of the BigQuery table schema)

It is pretty simple. The code first tries to load this table:

```scala
// read data from BigQuery table
println("\nreading data from " + fullyQualifiedInputTableId)

val df = spark.sqlContext
  .read
  .format("com.samelamin.spark.bigquery")
  .option("tableReferenceSource", fullyQualifiedInputTableId)
  .load()

df.printSchema

// create a temporary view on the DataFrame
df.createOrReplaceTempView("tmp")
```

OK, this is the output:

```
reading data from axial-glow-224522:accounts.ll_18201960
root
 |-- transactiondate: string (nullable = true)
 |-- transactiontype: string (nullable = true)
 |-- sortcode: string (nullable = true)
 |-- accountnumber: string (nullable = true)
 |-- transactiondescription: string (nullable = true)
 |-- debitamount: float (nullable = true)
 |-- creditamount: float (nullable = true)
 |-- balance: float (nullable = true)
```

The tmp view is created. However, when trying to read debitamount, which is defined as float, I get the following error:

```scala
spark.sql("select transactiondate, transactiontype, sortcode, accountnumber, transactiondescription, debitamount from tmp").collect.foreach(println)
```

```
18/12/27 19:41:59 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, rhes77-cluster-w-1.europe-west2-a.c.axial-glow-224522.internal, executor 1): java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
	at scala.runtime.BoxesRunTime.unboxToFloat(BoxesRunTime.java:109)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getFloat(rows.scala:43)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getFloat(rows.scala:195)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```

Is there any workaround for this, please?

Thanks,

Mich

michTalebzadeh commented 5 years ago

Hi,

I now have a workaround for this issue: using Spark DataFrame transformations to cast from String to Date and from String to Double where appropriate, and then saving the data into the BigQuery table.
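
Roughly along these lines (a sketch only; it assumes the Hive-sourced DataFrame holds these columns as strings, and the name dfFromHive and the date pattern are assumptions):

```scala
import org.apache.spark.sql.functions.{col, to_date}

// Sketch of the workaround described above. Assumes the Hive-sourced
// DataFrame (dfFromHive is a hypothetical name) holds the columns as
// strings; the date pattern "yyyy-MM-dd" is also an assumption.
val transformed = dfFromHive
  .withColumn("transactiondate", to_date(col("transactiondate"), "yyyy-MM-dd"))
  .withColumn("debitamount", col("debitamount").cast("double"))
  .withColumn("creditamount", col("creditamount").cast("double"))
  .withColumn("balance", col("balance").cast("double"))

// transformed can then be saved to the BigQuery table via the connector
```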

Let me know your thoughts.

Thanks