nginxinc / nginx-go-crossplane

A library for working with NGINX configs in Go
Apache License 2.0

Implements Scanner type for tokenizing nginx configs #80

Open ornj opened 7 months ago

ornj commented 7 months ago

Proposed changes

Implemented crossplane.Scanner, which follows the example of other "scanner" types in the Go stdlib. The existing Lex uses concurrency to make tokens available to the caller while managing "state". I think this design cue was taken from Rob Pike's 2011 talk on Lexical Scanning in Go. If you look at examples from the Go stdlib, such as bufio.Scanner, which Lex depends on, you'll find that this isn't the strategy employed. Instead, a struct manages the state of the scanner, and the caller uses a method to advance the scanner and obtain tokens.
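As a rough illustration of that struct-plus-method shape, here is a minimal sketch. It is not the crossplane.Scanner code from this PR: the names token, sketchScanner, newSketchScanner, and tokenize are made up for the example, and the tokenizing rules (split on whitespace, treat `{`, `}`, and `;` as their own tokens) are a deliberate simplification that ignores quoting and comments.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// token is a hypothetical, simplified token: its text and the line it started on.
type token struct {
	Text string
	Line int
}

// sketchScanner keeps all tokenizer state in a struct, in the style of
// bufio.Scanner. It is an illustration only, not the type added by this PR.
type sketchScanner struct {
	r    *bufio.Reader
	line int
}

func newSketchScanner(r io.Reader) *sketchScanner {
	return &sketchScanner{r: bufio.NewReader(r), line: 1}
}

// Scan returns the next token, or io.EOF once the input is exhausted.
func (s *sketchScanner) Scan() (token, error) {
	var sb strings.Builder
	var start int
	for {
		ch, _, err := s.r.ReadRune()
		if err != nil {
			// Flush a pending word before reporting end of input.
			if err == io.EOF && sb.Len() > 0 {
				return token{Text: sb.String(), Line: start}, nil
			}
			return token{}, err
		}
		switch ch {
		case ' ', '\t', '\r', '\n':
			if ch == '\n' {
				s.line++
			}
			if sb.Len() > 0 {
				return token{Text: sb.String(), Line: start}, nil
			}
		case '{', '}', ';':
			if sb.Len() > 0 {
				// Put the delimiter back so the next call returns it.
				_ = s.r.UnreadRune()
				return token{Text: sb.String(), Line: start}, nil
			}
			return token{Text: string(ch), Line: s.line}, nil
		default:
			if sb.Len() == 0 {
				start = s.line
			}
			sb.WriteRune(ch)
		}
	}
}

// tokenize drains a scanner; the caller decides when to advance.
func tokenize(src string) ([]token, error) {
	s := newSketchScanner(strings.NewReader(src))
	var out []token
	for {
		tok, err := s.Scan()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, tok)
	}
}

func main() {
	toks, err := tokenize("events {\n  worker_connections 1024;\n}\n")
	if err != nil {
		panic(err)
	}
	for _, t := range toks {
		fmt.Printf("line %d: %q\n", t.Line, t.Text)
	}
}
```

The caller pulls tokens one at a time and can stop whenever it likes. No goroutine or channel is involved, so nothing is left running if the caller stops early.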

After a bit of Internet archeology, I found this post on golang-nuts from Rob Pike himself:

That talk was about a lexer, but the deeper purpose was to demonstrate how concurrency can make programs nice even without obvious parallelism in the problem. And like many such uses of concurrency, the code is pretty but not necessarily fast.

I think it's a fine approach to a lexer if you don't care about performance. It is significantly slower than some other approaches but is very easy to adapt. I used it in ivy, for example, but just so you know, I'm probably going to replace the one in ivy with a more traditional model to avoid some issues with the lexer accessing global state. You don't care about that for your application, I'm sure.

So: It's pretty and nice to work on, but you'd probably not choose that approach for a production compiler.
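For contrast, the goroutine-and-channel style that the quote describes might look roughly like the sketch below. This is a simplified stand-in, not the existing Lex implementation: lexChan and collect are hypothetical names, and the tokenizing rules (split on whitespace, treat `{`, `}`, and `;` as their own tokens) ignore quoting and comments.

```go
package main

import (
	"fmt"
	"strings"
)

// lexChan is a hypothetical channel-based lexer: a goroutine owns the
// lexing state and pushes tokens to the caller over an unbuffered channel.
func lexChan(src string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var sb strings.Builder
		// flush emits the pending word, if any.
		flush := func() {
			if sb.Len() > 0 {
				out <- sb.String()
				sb.Reset()
			}
		}
		for _, ch := range src {
			switch ch {
			case ' ', '\t', '\r', '\n':
				flush()
			case '{', '}', ';':
				flush()
				out <- string(ch)
			default:
				sb.WriteRune(ch)
			}
		}
		flush()
	}()
	return out
}

// collect drains the channel until the lexer goroutine closes it.
func collect(src string) []string {
	var toks []string
	for t := range lexChan(src) {
		toks = append(toks, t)
	}
	return toks
}

func main() {
	for _, t := range collect("http { server { listen 80; } }") {
		fmt.Printf("%q\n", t)
	}
}
```

Every token crosses a channel, which costs a synchronization between two goroutines. If the caller stops ranging early, the sending goroutine blocks forever unless some cancellation mechanism is added.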

An implementation of a "scanner" using the more "traditional" model, with much of the logic the same as or very close to Lex, seems to support Pike's statement:

$ go test -benchmem -run=^$ -bench "^BenchmarkScan|BenchmarkLex$" github.com/nginxinc/nginx-go-crossplane -count=1 -v
goos: darwin
goarch: arm64
pkg: github.com/nginxinc/nginx-go-crossplane
BenchmarkLex
BenchmarkLex/simple
BenchmarkLex/simple-10             70982             16581 ns/op          102857 B/op         37 allocs/op
BenchmarkLex/with-comments
BenchmarkLex/with-comments-10      64125             18366 ns/op          102921 B/op         43 allocs/op
BenchmarkLex/messy
BenchmarkLex/messy-10              28171             42697 ns/op          104208 B/op        166 allocs/op
BenchmarkLex/quote-behavior
BenchmarkLex/quote-behavior-10     83667             14154 ns/op          102768 B/op         24 allocs/op
BenchmarkLex/quoted-right-brace
BenchmarkLex/quoted-right-brace-10                 48022             24799 ns/op          103369 B/op         52 allocs/op
BenchmarkScan
BenchmarkScan/simple
BenchmarkScan/simple-10                           179712              6660 ns/op            4544 B/op         34 allocs/op
BenchmarkScan/with-comments
BenchmarkScan/with-comments-10                    133178              7628 ns/op            4608 B/op         40 allocs/op
BenchmarkScan/messy
BenchmarkScan/messy-10                             49251             24106 ns/op            5896 B/op        163 allocs/op
BenchmarkScan/quote-behavior
BenchmarkScan/quote-behavior-10                   240026              4854 ns/op            4456 B/op         21 allocs/op
BenchmarkScan/quoted-right-brace
BenchmarkScan/quoted-right-brace-10                87468             13534 ns/op            5056 B/op         49 allocs/op
PASS
ok      github.com/nginxinc/nginx-go-crossplane 13.676s
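To make the comparison easier to read, the ns/op and B/op figures from the run above can be turned into ratios. The snippet below only does that arithmetic. The numbers are copied from the output above, and the result type and the speedup and memRatio helpers are names invented for this example. On these figures Scan comes out roughly 1.8x to 2.9x faster than Lex, with bytes per op lower by a factor of roughly 17 to 23.

```go
package main

import "fmt"

// result holds the Lex and Scan figures for one benchmark case,
// copied from the benchmark output above.
type result struct {
	name          string
	lexNs, scanNs float64 // ns/op
	lexB, scanB   float64 // B/op
}

var results = []result{
	{"simple", 16581, 6660, 102857, 4544},
	{"with-comments", 18366, 7628, 102921, 4608},
	{"messy", 42697, 24106, 104208, 5896},
	{"quote-behavior", 14154, 4854, 102768, 4456},
	{"quoted-right-brace", 24799, 13534, 103369, 5056},
}

// speedup is how many times faster Scan is than Lex for one case.
func speedup(r result) float64 { return r.lexNs / r.scanNs }

// memRatio is how many times fewer bytes per op Scan allocates than Lex.
func memRatio(r result) float64 { return r.lexB / r.scanB }

func main() {
	for _, r := range results {
		fmt.Printf("%-20s %.2fx faster, %.1fx fewer B/op\n",
			r.name, speedup(r), memRatio(r))
	}
}
```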

This alternative to Lex is probably a micro-optimization for many use cases. As the size and number of NGINX configurations that need to be analyzed grow, though, the optimization becomes worthwhile, as does an API that feels familiar to Go developers who might use this tool for their own purposes.

Next steps:


ornj commented 1 month ago

Now supports @xynicole's changes to enable tokenizing Lua.

Benchmarks:

❯ go test -benchmem -run=^$ -bench "^(BenchmarkLex|BenchmarkLexWithLua|BenchmarkScanner|BenchmarkScannerWithLua)$" github.com/nginxinc/nginx-go-crossplane -count=1
goos: darwin
goarch: arm64
pkg: github.com/nginxinc/nginx-go-crossplane
BenchmarkLex/simple-10             57963             17756 ns/op          103049 B/op         39 allocs/op
BenchmarkLex/with-comments-10      60025             20067 ns/op          103112 B/op         45 allocs/op
BenchmarkLex/messy-10              26170             47822 ns/op          104400 B/op        168 allocs/op
BenchmarkLex/quote-behavior-10             74510             17693 ns/op          102961 B/op         26 allocs/op
BenchmarkLex/quoted-right-brace-10         43134             27752 ns/op          103560 B/op         54 allocs/op
BenchmarkLex/comments-between-args-10      78271             14866 ns/op          102937 B/op         27 allocs/op
BenchmarkLexWithLua/lua-basic-10           46273             26012 ns/op          105499 B/op         53 allocs/op
BenchmarkLexWithLua/lua-block-simple-10                    22514             54149 ns/op          108556 B/op        143 allocs/op
BenchmarkLexWithLua/lua-block-larger-10                    25983             46605 ns/op          108403 B/op         59 allocs/op
BenchmarkLexWithLua/lua-block-tricky-10                    33756             35067 ns/op          106684 B/op         66 allocs/op
BenchmarkScanner/simple-10                                163138              7084 ns/op            4648 B/op         36 allocs/op
BenchmarkScanner/with-comments-10                         144558              8100 ns/op            4712 B/op         42 allocs/op
BenchmarkScanner/messy-10                                  47570             25026 ns/op            6000 B/op        165 allocs/op
BenchmarkScanner/quote-behavior-10                        222280              5083 ns/op            4560 B/op         23 allocs/op
BenchmarkScanner/quoted-right-brace-10                     82656             14281 ns/op            5160 B/op         51 allocs/op
BenchmarkScanner/comments-between-args-10                 225475              4872 ns/op            4536 B/op         24 allocs/op
BenchmarkScannerWithLua/lua-basic-10                       93081             12833 ns/op            7866 B/op         66 allocs/op
BenchmarkScannerWithLua/lua-block-simple-10                31426             37989 ns/op           10924 B/op        156 allocs/op
BenchmarkScannerWithLua/lua-block-larger-10                37148             30723 ns/op           10770 B/op         72 allocs/op
BenchmarkScannerWithLua/lua-block-tricky-10                54890             22383 ns/op            9050 B/op         79 allocs/op
PASS
ok      github.com/nginxinc/nginx-go-crossplane 29.969s