teler-sh / teler-waf

teler-waf is a Go HTTP middleware that protects local web services from OWASP Top 10 threats, known vulnerabilities, malicious actors, botnets, unwanted crawlers, and brute force attacks.
https://test.teler.sh
Apache License 2.0
324 stars 30 forks source link

[proposal] consolidating to a single regexp engine #83

Closed dwisiswant0 closed 10 months ago

dwisiswant0 commented 11 months ago

Background

Currently, teler-waf utilizes two different regexp engines for pattern matching: one from the built-in regexp and another from go-pcre. However, to enhance maintainability and performance, we're considering the transition to a single regexp engine. This decision is driven by the need to streamline our codebase and ensure consistent behavior across our library.

Objective

The primary objective of this proposal is to select a single regexp engine that best aligns with our requirements. It's crucial that the chosen engine supports PCRE patterns, as these patterns are integral to our use cases, specifically to check the CommonWebAttack threat.

Options

We have evaluated several alternative regexp engines that are commonly used:

Criteria

To make an informed decision, we should consider the following criteria:

  1. PCRE Pattern Support: The selected engine must fully support PCRE patterns, as they are a fundamental part of our library's functionality.
  2. Performance: Evaluate the performance of each engine, particularly in scenarios where our library is used intensively.
  3. Community and Maintenance: Consider the activity and responsiveness of the engine's maintainers and community. A vibrant community often leads to quicker issue resolution and ongoing improvements.
  4. Documentation: Well-documented engines simplify integration, troubleshooting, and future maintenance.
  5. Compatibility: Ensure that the chosen engine aligns with our library's other dependencies and doesn't introduce conflicts.

Proposed Steps

  1. Evaluate Engines: Assess the proposed engines based on the criteria outlined above. This evaluation should include hands-on testing and performance benchmarking.
  2. Engine Selection: Based on the comparison and community judgment, decide on the most suitable regexp engine for our library's needs.
  3. Implementation Plan: Once an engine is chosen, outline a clear plan for migrating our existing codebase to the selected engine. This should include steps for modifying and testing the code, as well as addressing any compatibility issues that may arise.
  4. Testing and Validation: Rigorously test the migration to ensure that the chosen engine behaves as expected and doesn't introduce regressions or new issues.

Conclusion

By transitioning to a single regexp engine that supports PCRE patterns, we can simplify our codebase, improve maintainability, and ensure consistent behavior. This proposal outlines the steps to evaluate and select the most appropriate engine for our library's needs. Let's work collaboratively to make this transition successful and beneficial for our users and the longevity of our project.


Note This issue proposal is meant to serve as a starting point for discussion. Feel free to contribute by suggesting modifications or adjustments as required. Your input is highly appreciated.

dwisiswant0 commented 11 months ago

Added new candidate, https://github.com/AspieSoft/go-regex.

dwisiswant0 commented 11 months ago

Current results

goos: linux
goarch: amd64
pkg: benchmark-regexp
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
GOMAXPROCS: 16
Click to toggle contents of bench code ```go package main import ( "fmt" "regexp" "testing" aspiesoft_regex "github.com/AspieSoft/go-regex/v8" grbit_pcre "github.com/GRbit/go-pcre" scorpionknifes_pcre "github.com/scorpionknifes/go-pcre" ) const ( patternNonPCRE = `(?i)(?:on(?:webkitanimationiteration|(?:(?:webkitanimation|(?:select|drag))s|t(?:ransition|ouch)s)tart|(?:webkit(?:transi|anima)tione|t(?:ransition|ouch)e|scrolle)nd|(?:beforescriptexecut|afterscriptexecut|(?:p(?:ointerrawupda|(?:opsta|as))|timeupda)t|b(?:eforetoggl|ounc)|(?:pointer|drag)leav|(?:pointer|touch)mov|mouse(?:lea|mo)v|pa(?:gehid|us)|resiz|clos)e|(?:mozfullscreen|fullscreen|(?:selec|dura)tion|hash|cue)change|unhandledrejection|a(?:nimation(?:iteration|cancel|start|end)|fterprint|uxclick)|transitioncancel|toggle\(popover\)|loaded(?:meta)?data|(?:canplaythroug|searc)h|(?:transitionru|(?:pointer|key)dow|mousedow|(?:focus|beg)i)n|pointerenter|(?:beforeunloa|invali|(?:seek|end)e|unloa)d|volumechange|c(?:(?:ontextmenu|ut)|opy)|(?:pointerov|drag(?:ent|ov))er|(?:(?:beforeinp|focuso)u|beforeprin|pointerou|beforecu|mouseou|submi|re(?:pea|se)|inpu)t|beforecopy|mouse(?:enter|over|up)|(?:mouse)?wheel|ratechange|(?:pointeru|keyu|dro)p|pageshow|progress|keypress|dblclick|canplay|dragend|playing|s(?:eeking|how)|message|s(?:croll|elect)|toggle|finish|change|focus|(?:erro|blu)r|click|start|drag|load|play|end))\s*?=` patternPCRE = `(?:\/\w*\s*\)\s*\()|(?:\([\w\s]+\([\w\s]+\)[\w\s]+\))|(?:(?

BenchmarkCompileNonPCRE

Engine ns/op B/op allocs/op elapsed
regexp 127972 131641 690 1.323665572s
AspieSoft/go-regex 29328 2322 3 1.384703177s
GRbit/go-pcre 21163 2744 6 1.330075384s
scorpionknifes/go-pcre 20630 12 2 1.735374764s

BenchmarkCompilePCRE

Engine ns/op B/op allocs/op elapsed
AspieSoft/go-regex 6647 465 3 1.122028641s
GRbit/go-pcre 4444 824 6 2.195591583s
scorpionknifes/go-pcre 4250 12 2 1.267168176s
dwisiswant0 commented 10 months ago

Reopening as per https://github.com/kitabisa/teler-waf/pull/90#issuecomment-1714659010.

dwisiswant0 commented 10 months ago

I will mark this issue as completed, because - upon evaluating the trade-offs associated with migrating the built-in regexp engine within CustomRule and the inThreatRegexp utility (used by BadIPAddress and DirectoryBruteforce), there is no performance enhancements were observed.