trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.88k stars 2.86k forks source link

Add generic jdbc datasource connector #2910

Open tooptoop4 opened 4 years ago

tooptoop4 commented 4 years ago

I'm not talking about client->jdbc driver->presto server. I'm talking about presto server connector to any jdbc source (ie sybase, oracle.etc).

List of columns can be derived by running where 1=0 query like spark does: https://github.com/apache/spark/commit/64c29afcb787d9f176a197c25314295108ba0471

https://docs.starburstdata.com/latest/connector/generic-jdbc.html

https://www.simba.com/drivers/ , https://www.dbvis.com/features/database-drivers/ , https://en.wikipedia.org/wiki/List_of_relational_database_management_systems show a list of many different data sources this would open up connectivity to.

some examples:

Unidata: com.ibm.u2.jdbc.UniJDBCDriver kdb: https://github.com/bjconlan/kdb-jdbc MS Access: sun.jdbc.odbc.JdbcOdbcDriver SQLLite: org.sqlite.JDBC SAS: com.sas.net.sharenet.ShareNetDriver SAP: com.sap.db.jdbc.Driver Greenplum: com.pivotal.jdbc.GreenplumDriver DynamoDB: com.simba.dynamodb.jdbc.Driver Netezza: org.netezza.Driver Vertica: com.vertica.jdbc.Driver Clickhouse: ru.yandex.clickhouse.ClickHouseDriver Sybase: com.sybase.jdbc4.jdbc.SybDriver Impala: com.cloudera.impala.jdbc41.Driver SparkSQL ThriftServer: com.simba.spark.jdbc41.Driver H2: org.h2.Driver Derby: org.apache.derby.jdbc.ClientDriver HiveServer2: org.apache.hive.jdbc.HiveDriver Drill: org.apache.drill.jdbc.Driver Teradata: com.ncr.teradata.TeraDriver Snowflake: net.snowflake.client.jdbc.SnowflakeDriver DB2: com.ibm.db2.jcc.DB2Driver Salesforce: com.ddtek.jdbc.sforce.SForceDriver Azure CosmosDB: cdata.jdbc.cosmosdb.CosmosDBDriver Athena: com.simba.athena.jdbc.Driver Aurora: org.mariadb.jdbc.Driver Informix: com.informix.jdbc.IfxDriver Firebird: org.firebirdsql.jdbc.FBDriver ...and many more

Cons:

  1. Some tables may not be listed in 'show tables' due to different getSchemas() implementations but direct query on the table still working ie https://issues.apache.org/jira/browse/SPARK-18524
  2. Performance will be worse than using native connector, potentially some detractors might claim 'aww presto is slow with x source' but slow is better than nothing right?
  3. Not all data types can be natively mapped, but at least can be cast to VARCHAR. Starburst generic jdbc connector might have a bug as oracle columns of NUMBER were not visible at all in presto.
  4. Some connections may not work at all, but then again they don't now, so nothing to lose

Pros:

  1. Can POC connectivity to legacy sources - ANY source with JDBC interface (point 4 from cons aside)
  2. Can retrieve small lookup datasets from legacy sources for initial stages of data migration
  3. Can use presto as a reconciliation engine to compare data across DBMS
  4. Native connectors for paid (ie non open source) JDBC drivers will never make it into Presto open source release, but users that have paid for those drivers will be able to access those data sources within presto and do joins to other DBMS using presto
  5. Aligns with the title of the white paper 'SQL on Everything' - https://prestosql.io/paper.html.
  6. This feature would attract a lot of users to presto community, further snowballing adoption, they would then contribute other improvements.etc
dprophet commented 4 years ago

OH, very very interesting. I might be willing to take this one on. It would help so many people, myself included.

dprophet commented 4 years ago

Anyone have doc's for a skeleton connector so I can fill in the blanks? I assume there is a test harness for local development (thus not having to develop within the Presto server itself). Unit testing framework? Maybe containerized tests for each connector?

dprophet commented 4 years ago

Would you recommend adding test for

https://github.com/bloomberg/comdb2

To this JDBC connector?

tooptoop4 commented 4 years ago

@dprophet did it work? pls write one :-)

flypenguin commented 3 years ago

what happened here? I am looking for an informix connector, and it seems that would be the solution ... any updates?

tooptoop4 commented 3 years ago

@flypenguin see the linked PR (also use the 4 updated .java files in the root of https://github.com/tooptoop4/presto-1/tree/lightgenjdbc for 348-SNAPSHOT with fixes). you can build from source and use it.

it has been successfully tested against: sqlite, sybase, oracle, impala, presto, dremio, sparksqlthrift, hiveserver2, postgres, mysql, db2, sqlserver, cockroachdb, derby, h2, hsqldb (ie hypersql), firebird

some recommended settings in .properties: case-insensitive-name-matching=true unsupported-type-handling=CONVERT_TO_VARCHAR jdbc-types-mapped-to-varchar=timestamp

ebyhr commented 3 years ago

@flypenguin Implementing JDBC based connector is easy comparing to other connectors. Please feel free to #dev channel in the community Slack. https://trino.io/slack.html

findepi commented 3 years ago

@flypenguin adding to @ebyhr 's comment -- you can see ongoing PR for inspiration, like https://github.com/trinodb/trino/pull/6134 or https://github.com/trinodb/trino/pull/4909

reisepass commented 2 years ago

Looks like starburst build this. It would be nice if they could commit something to upstream trino https://docs.starburst.io/latest/connector/starburst-generic-jdbc.html

gesundes commented 1 year ago

JDBC connector can provide the ability to connect to Hive via Knox. So, any progress here, guys?