trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.14k stars 2.92k forks source link

Ability to DESCRIBE any arbitrary query similar to SHOW STATS #13473

Open hashhar opened 2 years ago

hashhar commented 2 years ago

It would be useful to be able to execute DESCRIBE over any arbitrary query instead of being limited to named tables.

Perhaps a syntax like DESCRIBE (....), similar to what we do with SHOW STATS.

Originally posted by @findepi in https://github.com/trinodb/trino/issues/13465#issuecomment-1202812030

findepi commented 2 years ago

Originally posted by @findepi in #13465 (comment)

To be clear: i don't know how useful it would be. I don't have particular use-cases in mind.

For programmatic use-cases, the functionality is already available via PREPARE https://trino.io/docs/current/sql/prepare.html + DESCRIBE https://trino.io/docs/current/sql/describe-output.html :

trino:public> PREPARE p1 FROM SELECT * FROM TABLE(postgresql.system.query('SELECT * FROM nation'));
PREPARE

trino:public> DESCRIBE OUTPUT p1;
 Column Name | Catalog | Schema | Table |     Type     | Type Size | Aliased
-------------+---------+--------+-------+--------------+-----------+---------
 nationkey   |         |        |       | bigint       |         8 | false
 name        |         |        |       | varchar(25)  |         0 | false
 regionkey   |         |        |       | bigint       |         8 | false
 comment     |         |        |       | varchar(152) |         0 | false
(4 rows)

Obviously, doing PREPARE + DESCRIBE + DEALLOCATE isn't nice for a interactive use-cases, but interactively people may just fire up the query, right?

hashhar commented 2 years ago

DESCRIBE on arbitrary query is a nice alternative to what people generally do today when exploring some schema and writing larger queries - adding a LIMIT 1 to every SELECT just to be able to see the schema of the data that is returned. The LIMIT 1 still does IO and performs JOINs for example so is slow.

And I don't think users (other than devs) know about PREPARE + DESCRIBE OUTPUT + DEALLOCATE trick.

This would be something nice to have but isn't strictly necessary.

findepi commented 2 years ago

The LIMIT 1 still does IO and performs JOINs for example so is slow.

You can add LIMIT 0. The optimizer will obviously eliminate the whole query, so the execution will be cheap.

And I don't think users (other than devs) know about PREPARE + DESCRIBE OUTPUT + DEALLOCATE trick.

Agreed.

DESCRIBE on arbitrary query is a nice alternative to what people generally do today when exploring some schema and writing larger queries

I am not saying we shouldn't add this. We should have some support for this, if this is a common thing people do with SQL queries. Since it would be meant for interactive consumption, we also need to think about the syntax. For example DESCRIBE (...) -- this requires a human to add something both at the start and and the end of the query. Adding LIMIT 0 at the end will still be faster to type. Maybe we don't need the braces (like in EXPLAIN)?

Also, will we ever want DESCRIBE <query> to mean something else that just listing output columns of the query?