pingcap / tispark

TiSpark is built for running Apache Spark on top of TiDB/TiKV
Apache License 2.0

Support localdate convert to date in datetype #2780

Open shiyuhang0 opened 6 months ago

shiyuhang0 commented 6 months ago

What problem does this PR solve?

Environment: Spark v3.3.4 + TiSpark v3.2.3 + TiDB v6.5.8 + Java version "1.8.0_391"

MySQL [test]> desc tt1;
+-------+-------------+------+------+---------+-------+
| Field | Type        | Null | Key  | Default | Extra |
+-------+-------------+------+------+---------+-------+
| id    | bigint(20)  | NO   | PRI  | NULL    |       |
| dt1   | varchar(20) | YES  |      | NULL    |       |
+-------+-------------+------+------+---------+-------+
2 rows in set (0.00 sec)
MySQL [test]> desc tt2;
+-------+------------+------+------+---------+-------+
| Field | Type       | Null | Key  | Default | Extra |
+-------+------------+------+------+---------+-------+
| id    | bigint(20) | NO   | PRI  | NULL    |       |
| dt1   | date       | YES  |      | NULL    |       |
+-------+------------+------+------+---------+-------+
2 rows in set (0.00 sec)
MySQL [test]> desc tt3;
+-------+------------+------+------+---------+-------+
| Field | Type       | Null | Key  | Default | Extra |
+-------+------------+------+------+---------+-------+
| id    | bigint(20) | NO   | PRI  | NULL    |       |
| dt1   | date       | YES  |      | NULL    |       |
+-------+------------+------+------+---------+-------+
2 rows in set (0.00 sec)
spark-sql> insert into tt2 select id,dt1 from tt3;
24/04/01 16:07:41 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
24/04/01 16:07:41 WARN SimpleFunctionRegistry: The function ti_version replaced a previously registered function.
24/04/01 16:07:41 WARN SimpleFunctionRegistry: The function time_to_str replaced a previously registered function.
24/04/01 16:07:41 WARN SimpleFunctionRegistry: The function str_to_time replaced a previously registered function.
24/04/01 16:07:45 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 3) (10.2.103.129 executor 0): org.tikv.common.exception.TiDBConvertException: convert to tidb data error for column 'dt1'
    at com.pingcap.tispark.utils.WriteUtil$.$anonfun$sparkRow2TiKVRow$2(WriteUtil.scala:71)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    at com.pingcap.tispark.utils.WriteUtil$.sparkRow2TiKVRow(WriteUtil.scala:58)
    at com.pingcap.tispark.write.TiBatchWriteTable.$anonfun$preCalculate$1(TiBatchWriteTable.scala:127)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:514)
    at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:268)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at scala.collection.AbstractIterator.to(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
    at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1470)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.tikv.common.exception.ConvertNotSupportException: do not support converting from java.time.LocalDate to  com.pingcap.tikv.types.DateType
    at com.pingcap.tikv.types.DateType.convertToMysqlDate(DateType.java:72)
    at com.pingcap.tikv.types.DateType.doConvertToTiDBType(DateType.java:58)
    at com.pingcap.tikv.types.DataType.convertToTiDBType(DataType.java:399)
    at com.pingcap.tispark.utils.WriteUtil$.$anonfun$sparkRow2TiKVRow$2(WriteUtil.scala:64)
    ... 32 more
24/04/01 16:07:46 ERROR TaskSetManager: Task 0 in stage 4.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 6) (10.2.103.129 executor 0): org.tikv.common.exception.TiDBConvertException: convert to tidb data error for column 'dt1'
    at com.pingcap.tispark.utils.WriteUtil$.$anonfun$sparkRow2TiKVRow$2(WriteUtil.scala:71)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
    at com.pingcap.tispark.utils.WriteUtil$.sparkRow2TiKVRow(WriteUtil.scala:58)
    at com.pingcap.tispark.write.TiBatchWriteTable.$anonfun$preCalculate$1(TiBatchWriteTable.scala:127)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:514)
    at scala.collection.Iterator$SliceIterator.hasNext(Iterator.scala:268)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at scala.collection.AbstractIterator.to(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
    at org.apache.spark.rdd.RDD.$anonfun$take$2(RDD.scala:1470)
    at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2278)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.tikv.common.exception.ConvertNotSupportException: do not support converting from java.time.LocalDate to  com.pingcap.tikv.types.DateType
    at com.pingcap.tikv.types.DateType.convertToMysqlDate(DateType.java:72)
    at com.pingcap.tikv.types.DateType.doConvertToTiDBType(DateType.java:58)
    at com.pingcap.tikv.types.DataType.convertToTiDBType(DataType.java:399)
    at com.pingcap.tispark.utils.WriteUtil$.$anonfun$sparkRow2TiKVRow$2(WriteUtil.scala:64)
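
The root cause in the trace is that DateType.convertToMysqlDate receives a java.time.LocalDate and rejects it. As a rough illustration of the kind of conversion the title asks for, the sketch below normalizes a date-like value to java.sql.Date. It is not the code from this PR: the class LocalDateConversionSketch and the helper toSqlDate are hypothetical names, and only JDK calls (java.sql.Date.valueOf) are used, so how this would plug into TiSpark's DateType is an assumption.

```java
import java.sql.Date;
import java.time.LocalDate;

// Hypothetical sketch, not TiSpark code: shows one way to accept
// java.time.LocalDate wherever java.sql.Date is already accepted.
class LocalDateConversionSketch {

    // Hypothetical helper: normalize a date-like value to java.sql.Date.
    static Date toSqlDate(Object value) {
        if (value == null) {
            return null; // a NULL column value stays NULL
        }
        if (value instanceof Date) {
            return (Date) value; // already the expected type
        }
        if (value instanceof LocalDate) {
            // JDK 8+ conversion; keeps the same calendar year, month, and day
            return Date.valueOf((LocalDate) value);
        }
        // Mirror the style of the error in the trace for unsupported inputs
        throw new IllegalArgumentException(
            "do not support converting from " + value.getClass().getName() + " to java.sql.Date");
    }

    public static void main(String[] args) {
        System.out.println(toSqlDate(LocalDate.of(2024, 4, 1)));
        System.out.println(toSqlDate(Date.valueOf("2024-04-01")));
    }
}
```

Both calls in main print the same calendar date, which is the behavior one would expect from "insert into tt2 select id,dt1 from tt3" when both dt1 columns are of type date.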

What is changed and how it works?

Check List

Tests

Code changes

Side effects

Related changes

ti-chi-bot[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

Once this PR has been reviewed and has the lgtm label, please ask for approval from shiyuhang0, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/pingcap/tispark/blob/master/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

shiyuhang0 commented 6 months ago

/run-all-tests tidb=release-6.1 tikv=release-6.1 pd=release-6.1

shiyuhang0 commented 4 months ago

/run-all-tests tidb=release-6.1 tikv=release-6.1 pd=release-6.1

shiyuhang0 commented 1 week ago

/run-all-tests tidb=release-6.1 tikv=release-6.1 pd=release-6.1