trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
9.83k stars 2.85k forks source link

Trino-cli has no option to accept targetResultSize query parameter #22303

Open anilsomisetty opened 3 weeks ago

anilsomisetty commented 3 weeks ago

In query result GET request from coordinator it has a query parameter targetResultSize which is per get request how much a coordinator can send to client. I see there is no way to configure it from trino client.

mosabua commented 1 week ago

What is the use case you are trying to solve with exposing this parameter? The protocol is already conversational so there is no real limit for the overall data returned.

anilsomisetty commented 1 week ago

Hi @mosabua

Currently in below mentioned code I see that for query get request to fetch resultset there is a queryparameter targetResultSize defined with a default size as 16MB i.e 16MB of result set would be sent to client from coordinator per get request.

Another thing to note here is this parameter is never set i.e it's always null because there is no way this parameter can be set from client. but I see a condition to check take min of this value and max target size which is 128MB if it is set.

If this parameter can be set through client the queries that would fetch large resultset from coordinator mininization of the get request calls can be done which would reduce the wait time and queries would complete faster.

ExecuteStatementResource.java(https://github.com/trinodb/trino/blob/1b0a0116d34e77a9fa8f5c47207ac62e3b16c87b/core/trino-main/src/main/java/io/trino/server/protocol/ExecutingStatementResource.java#L216)

please let me know your comments.

mosabua commented 1 week ago

This might be something we can do in our current work on improving client protocol performance with Project Swift.

https://github.com/trinodb/trino/issues/22271

Note that we don't know if increasing the size does indeed improve performance. Did you test this and can report any findings?

Also .. ideally users would not have to configure such things just to get a faster connection.

anilsomisetty commented 5 days ago

What I have observed is in code targetResultSize was set as QueryParam and was never getting used or set from trino client while running a query, so I was just curious on why it was not done.

I have run a query which returns approximately 6GB of data with 7.7Million rows on a trino cluster from DBVisualizer using trino-jdbc client driver jar, these are my findigs:

  1. When targetResultSize is default i.e 16MB it took 1hour to fetch this result
  2. When targetResultSize is made maximum i.e 128MB it took 4minutes to fetch this result
mosabua commented 4 days ago

Wow .. very interesting finding. We should take this into account for Project Swift @wendigo @dain @electrum @martint

wendigo commented 4 days ago

@mosabua I can't agree with these numbers - I've been benchmarking changing this value and there is a diminishing improvement for values over 32 MB. The buffering and compression happens on the coordinator so setting this value high, puts an additional pressure on the coordinator. That doesn't scale well.

anilsomisetty commented 4 days ago

My trino cluster has 6 nodes = 5 workers and 1 coordinator(doesn't act as worker) where each node has 1TB memory and 128 CPU where I have run the query.