Closed smdmts closed 6 years ago
Hi @smdmts, correct: we should be casting float to float, not double to float.
Good find!
Feel free to send a PR in.
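For context, the failure reported further down (`BoxesRunTime.unboxToFloat` throwing `ClassCastException`) can be reproduced in plain Scala. This is only an illustrative sketch, not the connector's code: it shows why unboxing a boxed `java.lang.Double` as a `Float` fails, and one safe way to narrow the value instead.

```scala
// The Avro reader hands back a boxed java.lang.Double, but the schema says float,
// so an asInstanceOf[Float] goes through BoxesRunTime.unboxToFloat and throws.
val boxed: Any = 1.5d // what the reader actually produced

val unboxFailed =
  try { boxed.asInstanceOf[Float]; false }
  catch { case _: ClassCastException => true } // "java.lang.Double cannot be cast to java.lang.Float"

// A safe path: treat the value as a java.lang.Number, then narrow explicitly.
val narrowed: Float = boxed.asInstanceOf[Number].floatValue
```

The explicit `Number.floatValue` narrowing is lossy for large doubles, but it never throws for a boxed numeric, which is what the failing code path needs.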
Hi, I am using the following in Scala code:

```scala
import com.samelamin.spark.bigquery._
```
I have a Hive table imported into BigQuery through an Avro file, and the table is created in BQ as follows.
It is pretty simple. The code first tries to load this table:
```scala
// read data from BigQuery table
println("\nreading data from " + fullyQualifiedInputTableId)

val df = spark.sqlContext
  .read
  .format("com.samelamin.spark.bigquery")
  .option("tableReferenceSource", fullyQualifiedInputTableId)
  .load()

df.printSchema

// create a temporary view on the DF
df.createOrReplaceTempView("tmp")
```

OK, this is the output:
```
reading data from axial-glow-224522:accounts.ll_18201960
root
 |-- transactiondate: string (nullable = true)
 |-- transactiontype: string (nullable = true)
 |-- sortcode: string (nullable = true)
 |-- accountnumber: string (nullable = true)
 |-- transactiondescription: string (nullable = true)
 |-- debitamount: float (nullable = true)
 |-- creditamount: float (nullable = true)
 |-- balance: float (nullable = true)
```
The tmp view is created. However, when trying to read debitamount, which is defined as float, I get the following error:
```scala
spark.sql("select transactiondate,transactiontype, sortcode, accountnumber, transactiondescription, debitamount from tmp").collect.foreach(println)
```
```
18/12/27 19:41:59 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, rhes77-cluster-w-1.europe-west2-a.c.axial-glow-224522.internal, executor 1): java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Float
	at scala.runtime.BoxesRunTime.unboxToFloat(BoxesRunTime.java:109)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getFloat(rows.scala:43)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getFloat(rows.scala:195)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
```
Is there any workaround for this, please?
Thanks,
Mich
Hi,
I now have a workaround for this issue: use a Spark DataFrame transformation to cast the date column from String to Date and the amount columns from String to Double where appropriate, and then save the data to the BigQuery table.
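Mich's actual code is not shown; the following is a minimal plain-Scala sketch of the per-field conversions such a workaround would perform (helper names are hypothetical; in a DataFrame this would typically be done with `withColumn` and `Column.cast`):

```scala
import java.sql.Date
import java.time.LocalDate

// Hypothetical per-field converters mirroring the described DF casts:
// String -> java.sql.Date for transactiondate, String -> Double for the amount columns.
def toSqlDate(s: String): Date = Date.valueOf(LocalDate.parse(s)) // expects "yyyy-MM-dd"

// Treat null/empty amounts as 0.0 so blank debit/credit cells don't throw.
def toAmount(s: String): Double =
  Option(s).filter(_.nonEmpty).map(_.toDouble).getOrElse(0.0)
```

Casting to Double (rather than Float) sidesteps the connector's float path entirely, which is why the workaround avoids the ClassCastException.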
Let me know your thoughts.
Thanks
Hi, I'm trying to analyze Firebase data using Spark with this spark-bigquery connector, but a ClassCastException occurred: Double cannot be cast to Float. Additionally, the Double type exists in the Avro spec (https://avro.apache.org/docs/1.8.1/spec.html), but the module only seems to handle casting for the Float type.
Would you mind telling me whether this is a bug?
https://support.google.com/firebase/answer/7029846