TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV/TiFlash to answer complex OLAP queries. It combines the strengths of the Spark platform with those of the distributed TiKV/TiFlash clusters, while integrating seamlessly with TiDB.
The figure below shows the architecture of TiSpark.
TiSpark relies on the availability of TiKV clusters and PDs. You also need to set up and use the Spark clustering platform.
Most of the TiSpark logic is inside a thin layer, namely, the tikv-client library.
We will not provide the `mysql-connector-java` dependency because of the restrictions of the GPL license.
The following versions of TiSpark's jar will no longer include `mysql-connector-java`.
TiSpark still needs `mysql-connector-java` for writing and authentication, so please import `mysql-connector-java` manually when you need to write or authenticate.
You can import it by putting the jar into the Spark `jars` directory.
You can also import it when you submit a Spark job, like:
spark-submit --jars tispark-assembly-3.0_2.12-3.1.0-SNAPSHOT.jar,mysql-connector-java-8.0.29.jar
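The `--jars` option above takes a comma-separated list of jar paths. As a minimal sketch, the command line could be assembled in Python as follows. The helper `build_spark_submit_args`, the application jar `my-app.jar`, and the class `com.example.Main` are hypothetical names used only for illustration; they are not part of TiSpark.

```python
def build_spark_submit_args(jars, app_jar, main_class=None):
    """Return an argv list for spark-submit with a comma-separated --jars value."""
    # spark-submit expects every extra jar in a single comma-separated argument.
    args = ["spark-submit", "--jars", ",".join(jars)]
    if main_class is not None:
        args += ["--class", main_class]
    # The application jar comes after the options.
    args.append(app_jar)
    return args


# Jar names taken from the command above; the app jar and class are placeholders.
args = build_spark_submit_args(
    jars=[
        "tispark-assembly-3.0_2.12-3.1.0-SNAPSHOT.jar",
        "mysql-connector-java-8.0.29.jar",
    ],
    app_jar="my-app.jar",
    main_class="com.example.Main",
)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args)` on a machine where `spark-submit` is on the `PATH`.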
| Feature Support | TiSpark 2.4.x | TiSpark 2.5.x | TiSpark 3.0.x | TiSpark master |
|---|---|---|---|---|
| SQL select without tidb_catalog | ✔ | ✔ | | |
| SQL select with tidb_catalog | ✔ | ✔ | ✔ | |
| SQL delete from with tidb_catalog | ✔ | ✔ | | |
| DataFrame append | ✔ | ✔ | ✔ | ✔ |
| DataFrame reads | ✔ | ✔ | ✔ | ✔ |
See here for more details.
TiDB has supported views since `tidb-3.0`. TiSpark currently does not support views, so users cannot observe or access data through a view with TiSpark.
The Spark config `spark.sql.runSQLOnFiles` should not be set to `false`, or you may get an `Error in query: Table or view not found` error.
Using the `{db}.{table}.{colname}` style in a condition is not supported, e.g. `select * from t where db.t.col1 = 1`.
Null in aggregation is not supported, e.g. `select sum(null) from t group by col1`.
The `tispark-assembly` dependency should not be packaged into a JAR of JARs file (for example, one built with `spring-boot-maven-plugin`), or you will get a `ClassNotFoundException`. You can solve this by adding `spark-wrapper-spark-version` to your dependencies or by building another form of jar file.
TiSpark doesn't support the GBK character set.
TiSpark doesn't support all collations. Currently, it only supports the following: `utf8_bin`, `utf8_general_ci`, `utf8_unicode_ci`, `utf8mb4_bin`, `utf8mb4_general_ci` and `utf8mb4_unicode_ci`.
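As a rough illustration, a table's collation could be checked against the list above before reading it with TiSpark. This is only a sketch: `is_supported_collation` is a hypothetical helper, not a TiSpark API, and it assumes you have already obtained the collation name yourself (for example from the table definition in TiDB).

```python
# Collations listed above as supported by TiSpark.
SUPPORTED_COLLATIONS = frozenset({
    "utf8_bin",
    "utf8_general_ci",
    "utf8_unicode_ci",
    "utf8mb4_bin",
    "utf8mb4_general_ci",
    "utf8mb4_unicode_ci",
})


def is_supported_collation(name: str) -> bool:
    """Return True if the collation name appears in the supported list."""
    # Normalize case and surrounding whitespace before the lookup.
    return name.strip().lower() in SUPPORTED_COLLATIONS


for collation in ("utf8mb4_bin", "gbk_chinese_ci"):
    status = "supported" if is_supported_collation(collation) else "not supported"
    print(f"{collation}: {status}")
```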
If `spark.sql.ansi.enabled` is `false`, an overflow of `sum(bigint)` will not cause an error but will “wrap” the result. You can cast `bigint` to `decimal` to avoid the overflow.
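To make the “wrap” concrete, here is a small arithmetic sketch in plain Python, with no Spark involved. It assumes the wrap follows signed 64-bit two's-complement arithmetic, as with a Java `long`; the function names are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def wrapping_sum_int64(values):
    """Sum integers like a signed 64-bit accumulator: overflow wraps around."""
    total = 0
    for v in values:
        # Keep only the low 64 bits, then reinterpret them as a signed value.
        total = (total + v) & 0xFFFFFFFFFFFFFFFF
        if total > INT64_MAX:
            total -= 2 ** 64
    return total


def decimal_sum(values):
    """Sum the same values as decimals, which do not overflow at 64 bits."""
    return sum((Decimal(v) for v in values), Decimal(0))


values = [INT64_MAX, 1]
wrapped = wrapping_sum_int64(values)
exact = decimal_sum(values)
print(wrapped)  # -9223372036854775808
print(exact)    # 9223372036854775808
```

Adding 1 to the largest `bigint` value wraps to the smallest one, while the decimal sum keeps the mathematically correct result. That is the reason for the cast suggested above.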
TiSpark supports retrieving data from tables that have an Expression Index, but the Expression Index will not be used by the TiSpark planner.
For English users, go to TiDB internals.
For Chinese users, go to AskTUG.
TiSpark is under the Apache 2.0 license. See the LICENSE file for details.