prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0

Implement Jdbc join pushdown capabilities in presto #23152

Open Ajas-Mangal opened 4 months ago

Ajas-Mangal commented 4 months ago

Expected Behavior or Use Case

When Presto identifies a join query that targets a single JDBC remote data source, it splits the join into multiple SELECT queries, one per table involved in the join, and fetches all records from each table through the JDBC connector without passing the join condition.

If we "push down" these joins, that is, send joins that involve the same catalog/remote data source to that data source as part of the SQL, performance can improve by 3x to 10x.
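To make the difference concrete, here is a minimal sketch rather than Presto code. It assumes an in-memory SQLite database as a stand-in for the remote JDBC data source, and the `customers` and `orders` tables and both function names are hypothetical. It compares how many rows cross the connection when each table is fetched in full and joined on the engine side with how many cross when the join is sent to the data source.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the remote JDBC source.
remote = sqlite3.connect(":memory:")
remote.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
remote.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1000)],
)
# Only customers 0..9 have orders, so the join result is small.
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 10, 10.0 * i) for i in range(50)],
)


def join_without_pushdown(conn):
    """Fetch every row of each table, then join on the engine side."""
    customers = conn.execute("SELECT id, name FROM customers").fetchall()
    orders = conn.execute("SELECT customer_id, total FROM orders").fetchall()
    names = dict(customers)
    joined = [(names[cid], total) for cid, total in orders if cid in names]
    return joined, len(customers) + len(orders)


def join_with_pushdown(conn):
    """Send the join to the data source and fetch only the joined rows."""
    joined = conn.execute(
        "SELECT c.name, o.total FROM customers c "
        "JOIN orders o ON o.customer_id = c.id"
    ).fetchall()
    return joined, len(joined)


local_rows, local_fetched = join_without_pushdown(remote)
pushed_rows, pushed_fetched = join_with_pushdown(remote)
print(local_fetched, pushed_fetched)
```

Both paths return the same joined rows. The only thing that changes is the number of rows transferred, which is where the speedup would be expected to come from; the actual factor depends on the data and the remote database.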

Presto Component, Service, or Connector

JDBC connector and the SPI module that creates the Presto PlanNode

Possible Implementation

Start by looking at the Presto analyze component that breaks a multi-join query down into multiple subqueries, and do the following actions

Example Screenshots (if appropriate):

Context

This issue addresses a performance limitation in how Presto federates SQL through the JDBC connector to remote data sources such as DB2, Postgres, and Oracle.

What the limitation is: a query that may involve different remote data sources is divided into sub-queries that are sent to the respective data sources. The results of these sub-queries are fetched into the Presto workers, and additional operations such as filters, joins, and sorts are applied there before the result is sent back to the user.

When operations such as filters and joins involve tables from the same remote data source, it is best to "push down" these operations, that is, send them to the remote data source as part of the SQL query. This can be 10x to 100x faster than fetching all of the table data into the Presto workers and then applying these operators.
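As a rough illustration of the "same remote data source" condition, the sketch below groups the tables referenced by a query by catalog and reports the catalogs that contribute more than one table. This is not Presto's planner logic: the function name `pushdown_candidates` and the sample table names are hypothetical, and a real implementation would also need to check the join conditions and what the connector supports.

```python
from collections import defaultdict


def pushdown_candidates(table_refs):
    """Group catalog.schema.table references by catalog.

    Returns only the catalogs that contribute two or more tables, since
    only those could have a join sent to a single remote data source.
    """
    by_catalog = defaultdict(list)
    for ref in table_refs:
        # Split off the first dotted component as the catalog name.
        catalog, _, rest = ref.partition(".")
        by_catalog[catalog].append(rest)
    return {c: tables for c, tables in by_catalog.items() if len(tables) > 1}


# Hypothetical federated query touching two catalogs.
tables = [
    "postgres.public.orders",
    "postgres.public.customers",
    "db2.sales.regions",
]
candidates = pushdown_candidates(tables)
print(candidates)
```

Here only the two `postgres` tables form a candidate group. The `db2` table would still be fetched on its own and joined in the Presto workers.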

zhangbutao commented 3 months ago

I think this issue is similar to https://github.com/prestodb/presto/pull/16583

Thanzeel-Hassan-IBM commented 2 weeks ago

An RFC has been raised for this: https://github.com/prestodb/rfcs/pull/32