Open hashhar opened 4 years ago
inputDataSize is still useful, as it is the basis for CBO decisions.
Hi @hashhar, I started working on this issue. Just to be sure, the queries highlighted in the screenshot below are the ones we want to show via the CLI too?
I am currently reviewing the frontend ReactJS files like core/trino-main/src/main/resources/webapp/src/components/QueryDetail.jsx to figure out where the property data is coming from. My guess is that the web UI is mapped to some kind of API endpoint to show these properties.
Do you know what the source of data for the CLI would be? I have yet to explore the trino/client/ dir.
When you have a minute or so, would you be able to share the Java file names I should start with to approach this issue? I am trying to understand the workflow of how properties make it to the CLI in the first place. So, any help to get me started on this issue quickly will be appreciated!
Thanks!
@Damans227 Thanks for picking this up.
To get started you should take a look at the StatusPrinter class in the Trino CLI. That class accesses data from a few "models" like StatementStats, QueryStats and StageStats (from io.trino.client).
You can see who is responsible for adding data to these models by looking at the callers of their constructors (which are limited in number so shouldn't be too hard).
For the JDBC driver there are matching classes (StatementStats, QueryStats and StageStats) in io.trino.jdbc which are exposed to users of the JDBC client as methods. So for JDBC it should be good enough to ensure that information from the io.trino.client version of these classes ends up inside the io.trino.jdbc version of these classes too.
Feel free to ask more if you have questions, either here or on Slack at #dev.
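The client-to-JDBC hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ClientStatsSketch and JdbcStatsSketch are hypothetical stand-ins, not the real io.trino.client or io.trino.jdbc classes, and the field name physicalInputBytes is assumed for the example.

```java
// Hypothetical stand-in for a client-side stats model
// (NOT the real io.trino.client.StatementStats).
final class ClientStatsSketch {
    final long processedBytes;
    final long physicalInputBytes; // assumed field name, for illustration only

    ClientStatsSketch(long processedBytes, long physicalInputBytes) {
        this.processedBytes = processedBytes;
        this.physicalInputBytes = physicalInputBytes;
    }
}

// Hypothetical stand-in for the JDBC-side class whose getters driver users call
// (NOT the real io.trino.jdbc.QueryStats).
final class JdbcStatsSketch {
    private final long processedBytes;
    private final long physicalInputBytes;

    private JdbcStatsSketch(long processedBytes, long physicalInputBytes) {
        this.processedBytes = processedBytes;
        this.physicalInputBytes = physicalInputBytes;
    }

    // Copies each value from the client model, so a field added on the
    // client side must also be carried over here to reach JDBC users.
    static JdbcStatsSketch fromClient(ClientStatsSketch stats) {
        return new JdbcStatsSketch(stats.processedBytes, stats.physicalInputBytes);
    }

    long getProcessedBytes() {
        return processedBytes;
    }

    long getPhysicalInputBytes() {
        return physicalInputBytes;
    }
}
```

The point of the sketch is that the JDBC class only sees what the conversion step copies, so a new client-side field stays invisible to JDBC users until both the conversion and a getter are added.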
This is very helpful. Thanks as always!
@hashhar Hi! I tried wrapping my head around the StatusPrinter class today. It seems like this class prints out the query info at 2 different stages of the query execution, i.e., FINISHED and RUNNING. Another key finding was that the detailed query information, like the example given below, is only printed when the --debug flag is set on the trino command:
Query 20211103_011509_00016_k8hi7, FINISHED, 1 node
http://localhost:8080/ui/query.html?20211103_011509_00016_k8hi7
Splits: 20 total, 20 done (100.00%)
CPU Time: 0.0s total, 6.25K rows/s, 0B/s, 80% active
Per Node: 0.0 parallelism, 101 rows/s, 0B/s
Parallelism: 0.0
Peak Memory: 0B
0.25 [25 rows, 0B] [101 rows/s, 0B/s]
So, I guess, before I dive deeper, it will be helpful to know when we want to expose the physicalInputDataSize property. Is it during the FINISHED state or the RUNNING state? Also, do we want to show it only when --debug is set?
Thanks!
When I created this issue the intent was to print it alongside processedBytes, i.e. whenever processedBytes is shown, physicalInputDataSize should also be shown. IIRC they get printed both with and without debug, during progress and at the end of the query too.
I believe you can search for processedBytes in the StatusPrinter class to find all places where it gets printed.
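As a rough illustration of printing the two values side by side, here is a minimal sketch. The class StatusLineSketch, its method names, and the output layout are all hypothetical; this is not the actual StatusPrinter code or the real CLI output format.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only; not part of the Trino CLI.
final class StatusLineSketch {
    private StatusLineSketch() {}

    // Formats a byte count with a binary unit suffix, e.g. 1536 -> "1.50KB".
    static String formatDataSize(long bytes) {
        String[] units = {"B", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        if (unit == 0) {
            return bytes + "B";
        }
        return String.format(Locale.ROOT, "%.2f%s", value, units[unit]);
    }

    // Builds one summary line in which the physical input size always
    // appears next to the processed bytes (assumed layout).
    static String summaryLine(long rows, long processedBytes, long physicalInputBytes) {
        return String.format(
                Locale.ROOT,
                "[%d rows, %s processed, %s physical input]",
                rows,
                formatDataSize(processedBytes),
                formatDataSize(physicalInputBytes));
    }
}
```

Keeping both values in a single formatting method is one way to make sure that every place that prints processedBytes also prints the physical input size.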
cc: @electrum In the CLI should we:
@Damans227 I think it'd be smarter to start with the JDBC driver change since there are no such decisions to be made there. Sorry for not anticipating it earlier.
The WebUI shows both physicalInputDataSize and inputDataSize today.
This issue is to add similar functionality to the Presto CLI (and the JDBC QueryStats/StageStats classes too).
~As part of this change we can also explore if inputDataSize makes sense to expose to clients due to its limited usefulness.~
cc: @sopel39 @electrum