Open RosterIn opened 4 years ago
This is needed and useful for many use cases (one of them being efficient RECURSIVE queries).
We could start simple and just allow creating table that will be dropped when the session ends (no hiding rules, etc, that can come later). That would not limit what catalog a table resides in, etc.
I'm happy to give it a shot - if somebody points to the best place to put a session-end-hook. I looked around a bit, and where to put such a hook was not obvious to me.
We also need this feature very much.
Would second both @RosterIn and @lhofhansl Temp tables would help with a lot of common operations done in ETL tasks
Hi @findepi, do we have any roadmap about this feature? We also need temp tables and temp views in our use cases. Thanks.
@junyi1313 i don't think anyone currently is working on this. cc @arhimondr
In-memory temp tables are game-changers in query optimization, Is there any plan to implement this functionality?
@schilloji While Trino doesn't offer temp tables functionality it does offer memory connector (https://trino.io/docs/current/connector/memory.html) that allows creating in-memory tables. Apart from being automatically droppped at the end of the session does temporary table differ from a regular table in any other way?
@arhimondr temp table goes to disk once the memory limit is reached (can be configured), plus you can use that specific temp table many times in the session (there are two modes of temp table one which ends when the connection ends and the other when a commit is issued), other reason to use temp table is that it may hold calculated data or some kind of logic (even joins) which you don't want to repeat every time you need to access the data
I agree with others that this one is a game changer I wonder if there is any chance this request will get some priority?
@galithaham
Sorry, i think I must've missed your message.
temp table goes to disk once the memory limit is reached (can be configured)
Currently Trino is designed to be a processing engine, it doesn't have capabilities to store data. Temp table will likely have to be delegated to a connector. There is a memory connector implementation, but at this point it is not production ready and doesn't have proper memory accounting implemented.
plus you can use that specific temp table many times in the session (there are two modes of temp table one which ends when the connection ends and the other when a commit is issued)
In Trino there's no concept of a stateful session as of today. However there are transactions.
other reason to use temp table is that it may hold calculated data or some kind of logic (even joins) which you don't want to repeat every time you need to access the data
Yes, that makes perfect sense.
@RosterIn
I agree with others that this one is a game changer I wonder if there is any chance this request will get some priority?
Today you can create a table as part of a transaction and issue a DROP TABLE
statement before transaction commit. I'm trying to better understand the use case and what is missing.
From what I understand it is about having a CREATE TEMPORARY TABLE
statement with the semantics of automatically dropping a temporary table upon transaction commit (to avoid calling DROP TABLE
explicitly). It shouldn't be difficult to add this functionality if needed but given that it is seen as a game changer I guess there's more to it.
I would assume that the second part that is missing is an efficient storage for temporary tables to avoid costly re-encoding and potentially avoiding touching a distributed file system if possible. Is my understanding correct?
Adding support for temp tables is on the backlog and we'll probably have someone start to work on it in the next few months.
@arhimondr, we need temp tables for implementing recursive WITH queries efficiently (among other things). See https://github.com/trinodb/trino/issues/1122#issuecomment-632402097
The cool thing about temporary tables in Postgres is that they are bound to the current session. That is, no concurrent session is able to interfer with your session's temporary table. Of course it is possible to do the same thing with transactions (start transaction, clear existing temp table, do your thing with the temp table, commit/rollback) to have (sort of) session-specific temp table content. But with the Hive connector and S3 as storage this seems not to be trivial (or I'm doing it wrong).
Long story short: session-bound temp tables would be really great.
From what I understand it is about having a CREATE TEMPORARY TABLE statement with the semantics of automatically dropping a temporary table upon transaction commit (to avoid calling DROP TABLE explicitly).
This is exactly what I'm after.. not needing to add drop statement myself. Let the engine to take care of it.. just like in programing when you create temp file or temp directory.. the context manager knows to drop the object when session ends
Just wanted to chime in and be another voice requesting this feature. In my day I use Teradata and I create VOLATILE tables pretty regularly. Sometimes I need to switch from Teradata to Trino in order to join tables from different databases. Would be really helpful to use a VOLATILE table feature in Trino.
Hi guys, trino itself has just asked me to create a temporary table to fix a warning, so I came here hoping that the feature had been added, but it turns out it had not. What am I to make of this warning, then?
Number of stages in the query (126) exceeds the soft limit (100). If the query contains multiple aggregates with DISTINCT over different columns, please set the 'use_mark_distinct' session property to false. If the query contains WITH clauses that are referenced more than once, please create temporary table(s) for the queries in those clauses.
Do we have any estimated timeline when we may expect this feature? Would be greatly appreciated.
Many DBs allow to create Temp tables
Postgresql: https://www.postgresql.org/docs/13/sql-createtable.html MySQL: https://dev.mysql.com/doc/refman/8.0/en/create-temporary-table.html BigQuery: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#temporary_tables and others..
Currently presto doesn't allow it. There was a similar issue in PrestoDB which I'm not sure why it's closed https://github.com/prestodb/presto/issues/3499
The motivation here is other than allow additional useful functionality is also to allow tools like https://github.com/great-expectations/great_expectations/issues/1917 to be configured with Presto. See the comment in the thread:
Great Expectations is not officially supporting Presto yet. When you are validating a result set of a query, Great Expectations saves the result set into a temporary table for performance reasons.