lyft / presto-gateway

A load balancer / proxy / gateway for prestodb
Apache License 2.0
358 stars 156 forks source link

Random "Error opening rules configuration file" when using Routing Rule Engine #166

Open rdarwish741 opened 2 years ago

rdarwish741 commented 2 years ago

I'm getting random "Error opening rules configuration file" when using Routing Rule Engine. It only happens when I have 2 or more clients sending requests to presto-gateway.

ERROR [2022-04-04 21:03:57,026] com.lyft.data.gateway.ha.router.RoutingGroupSelector$Logger: Error opening rules configuration file, using routing group header as default.
! org.yaml.snakeyaml.parser.ParserException: while parsing a block node
!  in 'reader', line 13, column 1:
!     
!     ^
! expected the node content, but found '<block end>'
!  in 'reader', line 13, column 1:
!     
!     ^
! 
! at org.yaml.snakeyaml.parser.ParserImpl.parseNode(ParserImpl.java:482)
! at org.yaml.snakeyaml.parser.ParserImpl.access$1300(ParserImpl.java:117)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockNode.produce(ParserImpl.java:359)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceEntry.produce(ParserImpl.java:506)
! at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockSequenceFirstEntry.produce(ParserImpl.java:496)
! at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:158)
! at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:148)
! at org.yaml.snakeyaml.composer.Composer.checkNode(Composer.java:78)
! at org.yaml.snakeyaml.constructor.BaseConstructor.checkData(BaseConstructor.java:123)
! at org.yaml.snakeyaml.Yaml$1.hasNext(Yaml.java:489)
! at org.jeasy.rules.support.reader.YamlRuleDefinitionReader.loadRules(YamlRuleDefinitionReader.java:71)
! at org.jeasy.rules.support.reader.AbstractRuleDefinitionReader.read(AbstractRuleDefinitionReader.java:43)
! at org.jeasy.rules.mvel.MVELRuleFactory.createRules(MVELRuleFactory.java:100)
! at com.lyft.data.gateway.ha.router.RoutingGroupSelector.lambda$byRoutingRulesEngine$1(RoutingGroupSelector.java:42)
! at com.lyft.data.gateway.ha.handler.QueryIdCachingProxyHandler.rewriteTarget(QueryIdCachingProxyHandler.java:123)
! at com.lyft.data.proxyserver.ProxyServletImpl.rewriteTarget(ProxyServletImpl.java:65)
...

Using: java version "1.8.0_202" Python 3.7.13 trino-python-client 0.3.11.0

rules.yaml:

---
name: 'airflow'
description: 'if query from airflow, route to etl group'
condition: 'request.getHeader("X-Trino-Source") == "trino-python-client"'
actions:
  - 'result.put("routingGroup", "adhoc")'

python client script:

#!/usr/bin/python

from trino.dbapi import connect
import time

conn = connect(
    host='localhost',
    port=8090,
    user='user1',
    catalog='system',
    schema='runtime',
)
cur = conn.cursor()

while 1 < 2:
  cur.execute('SELECT * from nodes limit 1')
  rows = cur.fetchall()
  print (rows)

Other Observations:

After further investigation, I narrowed it down to MVELRuleFactory class. I'm not a developer so I can't really explain what is happening but the following fixes the issue.

--- a/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
+++ b/gateway-ha/src/main/java/com/lyft/data/gateway/ha/router/RoutingGroupSelector.java
@@ -34,11 +34,11 @@ public interface RoutingGroupSelector {
    * to determine the right routing group.
    */
   static RoutingGroupSelector byRoutingRulesEngine(String rulesConfigPath) {
-    RulesEngine rulesEngine = new DefaultRulesEngine();
-    MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());

     return request -> {
       try {
+        RulesEngine rulesEngine = new DefaultRulesEngine();
+        MVELRuleFactory ruleFactory = new MVELRuleFactory(new YamlRuleDefinitionReader());
         Rules rules = ruleFactory.createRules(
             new FileReader(rulesConfigPath));
         Facts facts = new Facts();

It ran for hours without any errors and CPU usage was stable. If anyone can provide more input, It would be appreciated.