TiSpark

TiSpark is a thin layer built for running Apache Spark on top of TiDB/TiKV/TiFlash to answer complex OLAP queries. It combines the strengths of the Spark platform with those of the distributed TiKV/TiFlash clusters, and it integrates seamlessly with TiDB.

The figure below shows the architecture of TiSpark.

[Figure: TiSpark architecture]

TiSpark requires a running TiKV cluster and its Placement Driver (PD) nodes. You also need a working Spark cluster.
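As a rough sketch of what pointing Spark at PD can look like, the Python helper below collects the Spark properties that, to the best of my knowledge, the TiSpark user guide lists for enabling the extension and `tidb_catalog`. The helper names `tispark_conf` and `build_session`, the PD host names, and the application name are placeholders chosen for illustration. The two catalog properties only apply to TiSpark versions that support `tidb_catalog`, so verify the property names against the documentation for your version.

```python
def tispark_conf(pd_addresses):
    """Return Spark properties that enable TiSpark (hypothetical helper).

    pd_addresses: list of "host:port" strings for the PD nodes.
    """
    pd = ",".join(pd_addresses)
    return {
        # Registers the TiSpark extension with Spark SQL.
        "spark.sql.extensions": "org.apache.spark.sql.TiExtensions",
        # Tells TiSpark where to find PD.
        "spark.tispark.pd.addresses": pd,
        # Catalog settings; assumed to apply only where tidb_catalog is supported.
        "spark.sql.catalog.tidb_catalog":
            "org.apache.spark.sql.catalyst.catalog.TiCatalog",
        "spark.sql.catalog.tidb_catalog.pd.addresses": pd,
    }


def build_session(pd_addresses, app_name="tispark-demo"):
    """Create a SparkSession with the properties above applied."""
    # Imported lazily: needs pyspark installed and the TiSpark jar on the classpath.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in tispark_conf(pd_addresses).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling `build_session(["pd0:2379"])` only succeeds against a reachable cluster; `tispark_conf` on its own can be inspected or written to `spark-defaults.conf` without one.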

Most of TiSpark's logic lives in a thin layer, the tikv-client library.

Doc TOC

About mysql-connector-java

TiSpark does not bundle the mysql-connector-java dependency because of GPL license restrictions.

The following versions of TiSpark's jar will no longer include mysql-connector-java.

TiSpark still needs mysql-connector-java for writing and authentication. Add mysql-connector-java to your classpath manually when you use either feature.
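One possible way to add the driver manually is Spark's standard `spark.jars.packages` property, which resolves Maven coordinates when the application starts. The sketch below is an assumption about how you might wire this up, not a TiSpark-specific API: `with_mysql_connector` is a hypothetical helper, and the version `8.0.29` is only an illustrative default, so pick the version that matches your environment.

```python
def with_mysql_connector(conf, version="8.0.29"):
    """Return a copy of conf that also requests mysql-connector-java.

    conf: dict of Spark properties.
    version: illustrative default, not a recommendation.
    """
    coordinate = f"mysql:mysql-connector-java:{version}"
    merged = dict(conf)  # leave the caller's dict untouched
    existing = merged.get("spark.jars.packages", "")
    packages = [p for p in existing.split(",") if p]
    if coordinate not in packages:
        packages.append(coordinate)
    merged["spark.jars.packages"] = ",".join(packages)
    return merged
```

The property has to be set before the Spark JVM starts. If the cluster has no access to a Maven repository, passing a local jar path through `spark.jars` or `spark-submit --jars` is the usual alternative.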

Feature Support

Feature Support TiSpark 2.4.x TiSpark 2.5.x TiSpark 3.0.x TiSpark master
SQL select without tidb_catalog
SQL select with tidb_catalog
SQL delete from with tidb_catalog
DataFrame append
DataFrame reads

See here for more details.
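To make the feature names above more concrete, here is a minimal PySpark-style sketch of a SQL select through `tidb_catalog` and a DataFrame append. It assumes a TiSpark version that supports both features and a session already configured for TiSpark. The function names, database and table names, and connection defaults are hypothetical, and the `tidb.*` write options are the ones I recall from the TiSpark data source guide, so confirm them for your version.

```python
def read_table(spark, database, table):
    """SQL select with tidb_catalog; returns whatever spark.sql returns."""
    return spark.sql(f"SELECT * FROM tidb_catalog.{database}.{table}")


def append_rows(df, database, table,
                addr="127.0.0.1", port=4000, user="root", password=""):
    """DataFrame append into an existing TiDB table.

    Writing needs mysql-connector-java on the classpath.
    """
    (df.write
        .format("tidb")
        .option("tidb.addr", addr)
        .option("tidb.port", str(port))
        .option("tidb.user", user)
        .option("tidb.password", password)
        .option("database", database)
        .option("table", table)
        .mode("append")
        .save())
```

A typical call would be `append_rows(read_table(spark, "tpch_test", "orders"), "tpch_test", "orders_copy")`, where both table names are placeholders and the target table is assumed to exist already.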

Limitations

Follow us

Twitter

@PingCAP

Forums

For English users, go to TiDB internals.

For Chinese users, go to AskTUG.

License

TiSpark is under the Apache 2.0 license. See the LICENSE file for details.