rencalo770 / gengine

Rule Engine for Golang

How to avoid crashing the main thread when a rule running in a concurrent block (conc{}) panics #8

Closed: shiqstone closed this issue 4 years ago

shiqstone commented 4 years ago

I ran into a problem using "conc{}" to run rules: when a panic happens for some reason, the web service's main thread crashes and the service stops. It doesn't seem to happen when the rules run non-concurrently. How can I keep the service running?

rencalo770 commented 4 years ago

Thanks for the issue! Could I take a look at your rule?

Generally speaking, you should make sure that every API you call inside conc{} is thread-safe, or that the same API is not invoked more than once at the same time within one conc{} block. The framework cannot know which APIs will be invoked in a concurrent block, so it cannot protect them for you.

doc: https://rencalo770.github.io/gengine_en/#/conc
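To illustrate the thread-safety point, here is a minimal sketch that imitates the idea of a conc{} block without using gengine: each call runs in its own goroutine, and the caller waits on a sync.WaitGroup. The hitCounter type and runConcurrently helper are hypothetical. Because hitCounter.IsHitEvent keeps shared state, it guards that state with a sync.Mutex so it is safe to call from several goroutines at once.

```go
package main

import (
	"fmt"
	"sync"
)

// hitCounter is a hypothetical user API that keeps shared state.
// The mutex makes IsHitEvent safe to call from several goroutines at once.
type hitCounter struct {
	mu   sync.Mutex
	hits map[string]int
}

func (c *hitCounter) IsHitEvent(userNo, conf string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hits[conf]++
	return c.hits[conf] > 0
}

// runConcurrently imitates the idea of a conc{} block: every task runs in
// its own goroutine, and the caller blocks until all of them have finished.
func runConcurrently(tasks ...func()) {
	var wg sync.WaitGroup
	wg.Add(len(tasks))
	for _, task := range tasks {
		go func(t func()) {
			defer wg.Done()
			t()
		}(task)
	}
	wg.Wait()
}

func main() {
	c := &hitCounter{hits: map[string]int{}}
	var r1, r3 bool
	runConcurrently(
		func() { r1 = c.IsHitEvent("u1", "event_fact_1") },
		func() { r3 = c.IsHitEvent("u1", "event_fact_2") },
	)
	fmt.Println(r1, r3) // true true
}
```

Without the mutex, the two goroutines could write the map concurrently, which the Go runtime may report as a fatal "concurrent map writes" error.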

shiqstone commented 4 years ago

Here is the code:

func invoker(userNo string, id int) (bool, error) {
    //try get ruleStr by id 
    //example 
    //  `BEGIN
    //      conc {
    //          ur_relrk1 = IsHitEvent(User.UserNo, "event_fact_1", param_relrk1)
    //          ur_relrk3 = IsHitEvent(User.UserNo, "event_fact_2", param_relrk3)
    //          // dynamic generation
    //      }
    //      User.Result = IsHit("and", ur_relrk1, ur_relrk3)
    //  END`
    ruleStr := GetRuleById(id)

    user := &EvCbUser{
        UserNo: userNo,
        Result: false,
    }

    dataContext := context.NewDataContext()

    //inject function
    dataContext.Add("IsHitEvent", IsHitEvent)
    dataContext.Add("IsHit", IsHit)

    //inject struct
    dataContext.Add("User", user)
    //must default false
    stag := &engine.Stag{StopTag: false}
    dataContext.Add("stag", stag)
    //init rule engine
    knowledgeContext := grbase.NewKnowledgeContext()
    ruleBuilder := builder.NewRuleBuilder(knowledgeContext, dataContext)

    //resolve rules from string
    start1 := time.Now().UnixNano()
    err := ruleBuilder.BuildRuleFromString(ruleStr)
    end1 := time.Now().UnixNano()

    logs.Info("rules num:%d, load rules cost time:%d ns", len(knowledgeContext.RuleEntities), end1-start1)

    if err != nil {
        logs.Error("err: ", err)
    } else {
        eng := engine.NewGengine()

        start := time.Now().UnixNano()
        // true: when there are many rules and one of them fails to execute, continue executing the rules after it
        err := eng.ExecuteWithStopTagDirect(ruleBuilder, true, stag)

        end := time.Now().UnixNano()
        if err != nil {
            logs.Error("execute rule error: %v", err)
        }
        logs.Info("execute rule cost %d ns", end-start)
    }

    return user.Result, nil
}

func IsHitEvent(userNo string, conf string, param string) bool {
    //biz process, no concurrent block

    //TEST this will cause crash
    panic("unexpected error")
}

func IsHit(op string, pol ...bool) (res bool) {
    //biz process
    return res
}
shiqstone commented 4 years ago

I tried adding the following code to conc_statement.go, and the main thread kept running. I'm not sure whether this is OK:


conc_statement.go

go func() {
    defer func() {
        if err := recover(); err != nil {
            logrus.Errorf("concStatement execute error and recover: %v", err)
        }
        wg.Done()
    }()
    _, e := assignment.Evaluate(Vars)
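Pulled out of gengine, the idea in this patch looks roughly like the sketch below: each goroutine defers one function that first recovers and then calls wg.Done, so a panic in one branch is turned into a value instead of terminating the process. runAll is a hypothetical helper, not gengine code; collecting the recovered panics as errors (rather than only logging them) is an assumption added for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll runs each task in its own goroutine. A panic is recovered inside the
// goroutine that raised it and stored as an error in that task's slot.
func runAll(tasks ...func()) []error {
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	wg.Add(len(tasks))
	for i, task := range tasks {
		go func(i int, t func()) {
			defer func() {
				if r := recover(); r != nil {
					errs[i] = fmt.Errorf("task %d recovered: %v", i, r)
				}
				wg.Done()
			}()
			t()
		}(i, task)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := runAll(
		func() {},
		func() { panic("unexpected error") },
	)
	fmt.Println(errs[0] == nil, errs[1]) // true task 1 recovered: unexpected error
}
```

Each goroutine writes only its own index of errs, and wg.Wait orders those writes before the caller reads the slice, so no extra locking is needed.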
rencalo770 commented 4 years ago

Thanks for your code and your ideas!

func() {
    defer func() {
        if err := recover(); err != nil {
            logrus.Errorf("concStatement execute error and recover: %v", err)
        }
        wg.Done()
    }()
    _, e := assignment.Evaluate(Vars)

This is the standard Go technique for keeping an application running when a panic occurs. We don't add this code to gengine because we have to consider two cases:

The first case is that the panic is caused by the user's API. The gengine framework doesn't have enough information to decide whether such a panic should be recovered. If a panic caused by the user's API should be recovered, the user knows that, so the user should handle the recovery in their own API instead of gengine doing it. Keeping gengine unaware of the user's panic recovery also avoids causing unexpected problems for the user. This applies not only to "conc{}" but to every execution in gengine.

The second case is that the panic is caused by gengine itself. If the user has followed the specification when using gengine, such a panic is not an expected panic at all; it is a bug in gengine, and we must fix it for our users.
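For the first case, one Go detail matters: recover only stops a panic in the goroutine whose deferred function calls it. A function that conc{} runs on its own goroutine therefore can't rely on a recover further up the request's call stack. Below is a minimal sketch of handling it inside the user API itself. fetchEvent is a hypothetical stand-in for the business logic, and treating a recovered panic as "not hit" is an assumption; your rules may need a different fallback.

```go
package main

import (
	"fmt"
	"log"
	"sync"
)

// fetchEvent is a hypothetical business lookup that panics on bad input.
func fetchEvent(userNo, conf, param string) bool {
	if param == "" {
		panic("unexpected error")
	}
	return true
}

// IsHitEvent recovers any panic raised by its own business logic. The deferred
// recover runs in the same goroutine as the panic, so this works even when the
// rule engine calls IsHitEvent from a separate goroutine.
func IsHitEvent(userNo, conf, param string) (hit bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("IsHitEvent(%s, %s) recovered: %v", userNo, conf, r)
			hit = false // assumption: a failed lookup counts as "not hit"
		}
	}()
	return fetchEvent(userNo, conf, param)
}

func main() {
	var wg sync.WaitGroup
	params := []string{"param_relrk1", ""} // the empty param triggers the panic
	results := make([]bool, len(params))
	for i, p := range params {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = IsHitEvent("u1", "event_fact", p)
		}(i, p)
	}
	wg.Wait()
	fmt.Println(results) // [true false]
}
```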

A wiser approach is to recover panics in the application's service framework rather than adding one recovery for each panic in the user's API. For example, at our company we expose gengine as an API executed inside a gRPC service, and the gRPC framework recovers panics so the service keeps running and can serve the next requests.

rencalo770 commented 4 years ago

Please pay attention to how your code builds the rules: in any programming language, compiling code is time-consuming and CPU-intensive, and that is true in gengine too. So you should separate the rule-building process from the engine execution process. If you want to update your rules while gengine is running, you should do it as shown below:

https://rencalo770.github.io/gengine_en/#/example


type MyService struct {
    Kc      *base.KnowledgeContext
    Dc      *context.DataContext
    Rb      *builder.RuleBuilder
    Gengine *engine.Gengine

    // other fields...
}

// NewMyService builds the rules once, at initialization.
func NewMyService(ruleStr string /* other params */) *MyService {

    dataContext := context.NewDataContext()
    // add here whatever you want to use in every request
    dataContext.Add("println", fmt.Println)

    knowledgeContext := base.NewKnowledgeContext()
    ruleBuilder := builder.NewRuleBuilder(knowledgeContext, dataContext)
    e := ruleBuilder.BuildRuleFromString(ruleStr)
    if e != nil {
        panic(e)
    }
    gengine := engine.NewGengine()

    return &MyService{
        Kc:      knowledgeContext,
        Dc:      dataContext,
        Rb:      ruleBuilder,
        Gengine: gengine,
    }
}

// UpdateRule rebuilds the rules; use it when you want to update rules at runtime.
func (ms *MyService) UpdateRule(newRuleStr string) error {

    rb := builder.NewRuleBuilder(ms.Kc, ms.Dc)
    e := rb.BuildRuleFromString(newRuleStr)
    if e != nil {
        return e
    }
    // replace the old pointer
    ms.Rb = rb
    return nil
}

// Service executes the prebuilt rules for each request.
func (ms *MyService) Service(name string, req interface{}) error {

    ms.Dc.Add(name, req)
    e := ms.Gengine.Execute(ms.Rb, true)
    return e
}
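One thing to watch in this pattern: if UpdateRule replaces ms.Rb while Service is reading it from other request goroutines, that unsynchronized write and read is a data race under Go's memory model. A minimal sketch of one way to make the swap safe, using sync/atomic.Value, is below. It deliberately does not use gengine's API: compiledRules and compile are hypothetical stand-ins for the built *builder.RuleBuilder and BuildRuleFromString, so treat it as an illustration of the build-once, swap-on-success idea rather than drop-in code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync/atomic"
)

// compiledRules is a hypothetical stand-in for a fully built rule set.
type compiledRules struct {
	source string
}

// compile is a hypothetical, expensive build step run once per rule update.
func compile(ruleStr string) (*compiledRules, error) {
	if strings.TrimSpace(ruleStr) == "" {
		return nil, errors.New("empty rule string")
	}
	return &compiledRules{source: ruleStr}, nil
}

type service struct {
	rules atomic.Value // always holds a *compiledRules
}

func newService(ruleStr string) (*service, error) {
	cr, err := compile(ruleStr)
	if err != nil {
		return nil, err
	}
	s := &service{}
	s.rules.Store(cr)
	return s, nil
}

// UpdateRule builds the new rules first and swaps them in only on success,
// so concurrent requests keep using the old, fully built set until then.
func (s *service) UpdateRule(ruleStr string) error {
	cr, err := compile(ruleStr)
	if err != nil {
		return err
	}
	s.rules.Store(cr)
	return nil
}

// current returns the rule set a request should execute against.
func (s *service) current() *compiledRules {
	return s.rules.Load().(*compiledRules)
}

func main() {
	s, err := newService("rule v1")
	if err != nil {
		panic(err)
	}
	fmt.Println(s.current().source) // rule v1
	if err := s.UpdateRule("rule v2"); err != nil {
		panic(err)
	}
	fmt.Println(s.current().source)                            // rule v2
	fmt.Println(s.UpdateRule("  ") != nil, s.current().source) // true rule v2
}
```

A sync.RWMutex around the pointer would work as well; atomic.Value just keeps the read path lock-free.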
shiqstone commented 4 years ago

Regarding "recover the panic": I think I understand. So is it OK if I recover the panic in my function "IsHitEvent"?

shiqstone commented 4 years ago

Thanks for your advice, I'll try to refactor my code.

rencalo770 commented 4 years ago

You're welcome, and thanks for your interest! We have also updated gengine to v1.2.0, and the framework is now clearer and easier to understand. All the docs have been updated as well, and moving from the old version to the new one only takes a few small changes.

https://rencalo770.github.io/gengine_en/#/example

rencalo770 commented 4 years ago

To answer "so is it OK if I recover the panic in my function IsHitEvent?": the recovery should be in IsHitEvent if IsHitEvent is an independent service. If it isn't, the recovery should be in the top-level service that contains IsHitEvent, gengine, and the other services.