trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.86k stars 2.85k forks source link

add support for temp tables #5686

Open RosterIn opened 3 years ago

RosterIn commented 3 years ago

Many DBs allow to create Temp tables

Postgresql: https://www.postgresql.org/docs/13/sql-createtable.html MySQL: https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#temporary_tables and others..

Currently presto doesn't allow it. There was a similar issue in PrestoDB which I'm not sure why it's closed https://github.com/prestodb/presto/issues/3499

The motivation here is other than allow additional useful functionality is also to allow tools like https://github.com/great-expectations/great_expectations/issues/1917 to be configured with Presto. See the comment in the thread: Great Expectations is not officially supporting Presto yet. When you are validating a result set of a query, Great Expectations saves the result set into a temporary table for performance reasons.

lhofhansl commented 3 years ago

This is needed and useful for many use cases (one of them being efficient RECURSIVE queries).

We could start simple and just allow creating table that will be dropped when the session ends (no hiding rules, etc, that can come later). That would not limit what catalog a table resides in, etc.

I'm happy to give it a shot - if somebody points to the best place to put a session-end-hook. I looked around a bit, and where to put such a hook was not obvious to me.

realknorke commented 3 years ago

We also need this feature very much.

SiddhaarthS commented 2 years ago

Would second both @RosterIn and @lhofhansl Temp tables would help with a lot of common operations done in ETL tasks

junyi1313 commented 2 years ago

Hi @findepi, do we have any roadmap about this feature? We also need temp tables and temp views in our use cases. Thanks.

findepi commented 2 years ago

@junyi1313 i don't think anyone currently is working on this. cc @arhimondr

schilloji commented 2 years ago

In-memory temp tables are game-changers in query optimization, Is there any plan to implement this functionality?

arhimondr commented 2 years ago

@schilloji While Trino doesn't offer temp tables functionality it does offer memory connector (https://trino.io/docs/current/connector/memory.html) that allows creating in-memory tables. Apart from being automatically droppped at the end of the session does temporary table differ from a regular table in any other way?

galithaham commented 2 years ago

@arhimondr temp table goes to disk once the memory limit is reached (can be configured), plus you can use that specific temp table many times in the session (there are two modes of temp table one which ends when the connection ends and the other when a commit is issued), other reason to use temp table is that it may hold calculated data or some kind of logic (even joins) which you don't want to repeat every time you need to access the data

RosterIn commented 1 year ago

I agree with others that this one is a game changer I wonder if there is any chance this request will get some priority?

arhimondr commented 1 year ago

@galithaham

Sorry, i think I must've missed your message.

temp table goes to disk once the memory limit is reached (can be configured)

Currently Trino is designed to be a processing engine, it doesn't have capabilities to store data. Temp table will likely have to be delegated to a connector. There is a memory connector implementation, but at this point it is not production ready and doesn't have proper memory accounting implemented.

plus you can use that specific temp table many times in the session (there are two modes of temp table one which ends when the connection ends and the other when a commit is issued)

In Trino there's no concept of a stateful session as of today. However there are transactions.

other reason to use temp table is that it may hold calculated data or some kind of logic (even joins) which you don't want to repeat every time you need to access the data

Yes, that makes perfect sense.

@RosterIn

I agree with others that this one is a game changer I wonder if there is any chance this request will get some priority?

Today you can create a table as part of a transaction and issue a DROP TABLE statement before transaction commit. I'm trying to better understand the use case and what is missing.

From what I understand it is about having a CREATE TEMPORARY TABLE statement with the semantics of automatically dropping a temporary table upon transaction commit (to avoid calling DROP TABLE explicitly). It shouldn't be difficult to add this functionality if needed but given that it is seen as a game changer I guess there's more to it.

I would assume that the second part that is missing is an efficient storage for temporary tables to avoid costly re-encoding and potentially avoiding touching a distributed file system if possible. Is my understanding correct?

martint commented 1 year ago

Adding support for temp tables is on the backlog and we'll probably have someone start to work on it in the next few months.

@arhimondr, we need temp tables for implementing recursive WITH queries efficiently (among other things). See https://github.com/trinodb/trino/issues/1122#issuecomment-632402097

realknorke commented 1 year ago

The cool thing about temporary tables in Postgres is that they are bound to the current session. That is, no concurrent session is able to interfer with your session's temporary table. Of course it is possible to do the same thing with transactions (start transaction, clear existing temp table, do your thing with the temp table, commit/rollback) to have (sort of) session-specific temp table content. But with the Hive connector and S3 as storage this seems not to be trivial (or I'm doing it wrong).

Long story short: session-bound temp tables would be really great.

RosterIn commented 1 year ago

From what I understand it is about having a CREATE TEMPORARY TABLE statement with the semantics of automatically dropping a temporary table upon transaction commit (to avoid calling DROP TABLE explicitly).

This is exactly what I'm after.. not needing to add drop statement myself. Let the engine to take care of it.. just like in programing when you create temp file or temp directory.. the context manager knows to drop the object when session ends

LarryLoveIV commented 1 year ago

Just wanted to chime in and be another voice requesting this feature. In my day I use Teradata and I create VOLATILE tables pretty regularly. Sometimes I need to switch from Teradata to Trino in order to join tables from different databases. Would be really helpful to use a VOLATILE table feature in Trino.

RaynorThere commented 6 months ago

Hi guys, trino itself has just asked me to create a temporary table to fix a warning, so I came here hoping that the feature had been added, but it turns out it had not. What am I to make of this warning, then?

Number of stages in the query (126) exceeds the soft limit (100). If the query contains multiple aggregates with DISTINCT over different columns, please set the 'use_mark_distinct' session property to false. If the query contains WITH clauses that are referenced more than once, please create temporary table(s) for the queries in those clauses.

Do we have any estimated timeline when we may expect this feature? Would be greatly appreciated.