knwg145 commented 5 years ago

We are currently trying to create a more extensive warning generator to warn our Presto users and give them feedback.

I have separated warnings generation into 3 possible categories:  

Warnings generated in the connectors 
Warnings about deprecated tables, deprecated views, deprecated udfs, deprecated session properties etc.
Warnings about query execution statistics approaching resource limits

Warnings for deprecated tables, deprecated views, etc or other problems related to them can be very useful to let users know about ongoing issues in production. In order to enable such warnings in real-time, we are working on injecting and refreshing configurations for such warnings through an external database (rather than storing them in a variable or a file, which requires deployment). In this case, we have an external MySQL database that stores such information and warnings will be generated dynamically based on the values in MySQL.

For warnings related to query execution statistics, it is helpful for us to alert users when their queries’ resource usage is above a certain threshold. In this case, we are generating warnings if the memory/cpu usage of a query is above a certain percentage of the maximum amount possible. This is useful for cases like scheduled queries operating on an ever increasing dataset. These warnings will allow the user to know if there might be problems with a query in the future and to tune it accordingly. Another case where these query statistics warnings are useful is for incremental query building. Users like to add on top of existing queries and generating warnings will warn them to be careful about building more on top of the existing query. In addition, this can encourage users to try and modify their queries to be more efficient (one case that comes to mind is using too many CTEs).  

For the hive connector, we have a couple of warnings that we want to generate. These are:

Generating warnings for tables not using ORC
Generating warnings if too many partitions for a table is used
Generating warnings for unsupported column types

We want to generate these warnings in the hive connector to keep it modular and extendable.

For the connector warnings, I was thinking about having a WarningCollector for each connector within the WarningCollector. Maybe some kind of mapping of

  CatalogName : ConnectorWarningCollector  

This connector specific warning collector can be passed in through the function signature whenever there is an engine to connector interaction. Warnings generated in a connector can be added into this specific warning collector and the information will be available to the presto engine. This will allow the presto engine to fetch warnings already generated from any connectors at any time.  

There is a similar effort going on here: #1006

oneonestar commented 5 years ago

Thanks for the thought. Since many people are interested in enhancing the warning system, we could turn this into a roadmap item. Let's try to work out what the warning system should look like:

Enhanced Warning System

Objective

Detect bad queries
Collect and classify warnings into categories / levels for better management
React to warnings by take proper action(s)

Bad queries (ie. using deprecated UDFs or consuming too much resources) should generate warning(s) to notify system admins or users to take further actions. Depends on the importance of the warning, it should be classify into different categories and levels. System should react to the warnings by taking proper action, such as kill the query, notify external systems, etc.

How warnings are being generated

From Presto Engine:

Planning: Before / During / After Optimization
During execution
Query completed

From connectors:

During planning
During execution

During query optimization, some information may be added / lost due to the transformations. Runtime statistics could trigger more warnings. For example, the splits information are loaded async, the total amount of data to be scanned are unknown during the planning time. After query completed, the system will have all execution statistics which could trigger even more warnings.

Although the best way is to prevent the harmful queries being executed at all, it is not possible to detect all possible issues during planning time. Detecting issues in runtime and after completion can give us a more complete picture.

What kind of warnings can be generated

Performance warning
- Scanning too much partitions / data
- MySQL JDBC connector should be used to load too much data
- Using text format instead of ORC format
Deprecated warning
- Warnings about deprecated tables, deprecated views, deprecated UDFs, deprecated session properties etc
Compatibility warning
- Incompatible type casing
- Unsupported column types
(Others)

IMO, warnings generated by connectors should be classified by their nature. Connector information should be provided as additional information instead of acting as a category.

Configuration management

What to config

Enable / Disable of each warning rules
Parameters for each warning rules (warning threshold...)
What actions to take to react to warnings

How to config

File based
From External systems such as MySQL

I'm still thinking what should be the best way to config the warning system. We need to figure out how to preserve the modularity of the connectors and provide a common configuration system to pass the config to the sub-warningCollector in each connector.

How to response to warnings

Add a QueryWarningEvent to event-listener interface
Kill the query
Change the query to different resource group
Show it in CLI / WebUI (Done)
Show in QueryCompletedEvent (Done)

raghavsethi commented 5 years ago

I'd spent some time thinking about this when we originally designed warnings and diverged from this in a few ways:

The different classes of warnings would be: Performance, Correctness, and Deprecation. I would caution against compatibility as a type, since that should fall into one of the the three I outlined. I do want to be clear that Presto will not under any circumstances produce incorrect results, the correctness category should warn users about semantic errors in their queries that may produce unexpected results.
I don't think we need to think about dynamic configuration just yet, given the complexity of doing it right, and my hunch is that existing connector.properties file might be the right place to configure the first set of warnings.
For connector-produced warnings, my guess is that the engine could simply report them all and does not need to care about thresholds etc. The thresholds could be read and enforced by the connector if required.
Responding to warnings is a bit of a minefield. I would recommend delegating that to the client to respond, with the exception of failing the query for certain warnings or warning classes. I can see why query warning events would be useful, but spammy warnings could take down the external systems, we should think about what the dedup/noise reduction strategy is there.
Instead of warning rules, we could enable and disable reporting of warning codes or warning types.

trinodb / trino

Create a more extensive query warnings generator #1140