Closed nicksnyder closed 6 years ago
Thanks for doing this :).
Could you add the following as additional info:
The examples in the presentation can be found here: https://github.com/sougou/parser_tutorial. Please refer to vitess.io for more information on Vitess. PlanetScale provides enterprise support for Vitess. For more information, please visit planetscale.com.
cc @ryan-blunden @attfarhan because I am about to hop on a plane
(the current post does link to Vitess in the byline and the code samples in the summary, but we could add that at the bottom)
Presenter: Sugu Sougoumarane
Liveblogger: Nick Snyder
Sugu Sougoumarane is the co-creator of Vitess, which contains a SQL parser that is now used by many other projects. In this talk he demonstrates how to write a parser using goyacc.
How to Write a Parser in Go
Summary
Parser use cases
How to write a parser in two easy steps
goyacc
Why goyacc
Why not goyacc
Using goyacc
General steps:
Goyacc is almost an exact translation of the original yacc, so it has inherited some of yacc's idiosyncrasies. For example, a C program returns only one value: 0 for success and 1 for failure. This means you need awkward boilerplate that hands the parsed value to the lexer so the caller can retrieve it:
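A minimal sketch of that boilerplate follows. It assumes a hypothetical lexer type `lex` with `result` and `err` fields. The `yyParse` function here is a hand-written stand-in for the one goyacc would generate, included only so the example runs on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// yySymType and yyLexer mirror the shapes goyacc generates. They are
// declared by hand here so the sketch compiles without running goyacc.
type yySymType struct{ val string }

type yyLexer interface {
	Lex(lval *yySymType) int
	Error(s string)
}

// lex is a hypothetical lexer. Because yyParse only returns an int,
// the parse result and any error are stashed on the lexer itself.
type lex struct {
	input  []byte
	result string
	err    error
}

func (l *lex) Lex(lval *yySymType) int { return 0 } // 0 signals end of input
func (l *lex) Error(s string)          { l.err = errors.New(s) }

// yyParse is a stand-in for the generated parser: 0 on success, 1 on failure.
// In a real grammar, an action would perform the assignment to l.result.
func yyParse(yylex yyLexer) int {
	l := yylex.(*lex)
	if len(l.input) == 0 {
		yylex.Error("syntax error")
		return 1
	}
	l.result = string(l.input)
	return 0
}

// Parse hides the status-code convention behind a Go-style API.
func Parse(input []byte) (string, error) {
	l := &lex{input: input}
	if yyParse(l) != 0 {
		return "", l.err
	}
	return l.result, nil
}

func main() {
	fmt.Println(Parse([]byte("628-1234")))
	fmt.Println(Parse(nil))
}
```

The design choice is that the lexer is the only object both the caller and the grammar actions can reach, so it doubles as the carrier for results and errors.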
Use go generate to create the actual parser.go file:
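One common way to wire this up is a `go:generate` directive in any Go file of the parser package. The sketch below assumes the grammar lives in a file named `parser.y` and that the package is called `parser`; both names are hypothetical. The `-o` flag names the output file.

```go
// Package parser is a hypothetical package holding the grammar and lexer.
package parser

// Running `go generate` in this directory invokes goyacc, which reads
// parser.y (assumed file name) and writes the generated parser to parser.go.
//go:generate goyacc -o parser.go parser.y
```

This assumes `goyacc` is on your PATH, for example after `go install golang.org/x/tools/cmd/goyacc@latest`.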
How to parse a phone number
A phone number has three parts: area code, first part, second part.
Capital letters signify tokens.
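To make the rule structure concrete, here is a hand-written Go equivalent of such a grammar. It assumes a three-digit area code, a three-digit first part, and a four-digit second part with no separators. The digit counts and all function names are assumptions for illustration. Goyacc itself would generate a table-driven state machine rather than functions like these.

```go
package main

import (
	"errors"
	"fmt"
)

// PhoneNumber holds the three parts named in the text.
type PhoneNumber struct {
	Area, First, Second string
}

// digits consumes exactly n digit tokens (D, in grammar terms) from the
// front of s and returns them along with the remaining input.
func digits(s string, n int) (part, rest string, err error) {
	if len(s) < n {
		return "", s, errors.New("unexpected end of input")
	}
	for i := 0; i < n; i++ {
		if s[i] < '0' || s[i] > '9' {
			return "", s, fmt.Errorf("unexpected character %q", s[i])
		}
	}
	return s[:n], s[n:], nil
}

// parsePhone mirrors a rule of the form: phone: area part1 part2
func parsePhone(s string) (PhoneNumber, error) {
	var p PhoneNumber
	var err error
	if p.Area, s, err = digits(s, 3); err != nil {
		return PhoneNumber{}, err
	}
	if p.First, s, err = digits(s, 3); err != nil {
		return PhoneNumber{}, err
	}
	if p.Second, s, err = digits(s, 4); err != nil {
		return PhoneNumber{}, err
	}
	if s != "" {
		return PhoneNumber{}, fmt.Errorf("unexpected trailing input %q", s)
	}
	return p, nil
}

func main() {
	fmt.Println(parsePhone("6281234567"))
	fmt.Println(parsePhone("628x234567"))
}
```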
How to return values
The generated parser is just a single function that runs a state machine and uses local variables.
These variables are saved in a union data structure:
Actions run Go code (i.e. everything inside the braces) when a rule matches. Dollar variables refer to values held by the parser: $$ is the value the rule returns, and $1, $2, and so on are the values of the symbols the rule matched.
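The sketch below shows, by hand, roughly what an action does with the value stack. It assumes a union with two hypothetical fields, `part` and `result`. The names `symType` and `reducePhone` are illustrative and are not what goyacc emits.

```go
package main

import "fmt"

// PhoneNumber is the final value the parser produces.
type PhoneNumber struct{ Area, First, Second string }

// symType plays the role of the struct goyacc builds from %union:
// one field per value type, only one of which is meaningful per symbol.
type symType struct {
	part   string
	result PhoneNumber
}

// reducePhone does by hand what an action such as
//
//	phone: area part1 part2 { $$ = PhoneNumber{$1, $2, $3} }
//
// amounts to: read the three right-hand-side values, build the result,
// and replace those three stack entries with it.
func reducePhone(stack []symType) []symType {
	n := len(stack)
	dollar := stack[n-3:] // dollar[0] is $1, dollar[1] is $2, dollar[2] is $3
	var val symType       // val is $$
	val.result = PhoneNumber{dollar[0].part, dollar[1].part, dollar[2].part}
	return append(stack[:n-3], val)
}

func main() {
	stack := []symType{{part: "628"}, {part: "123"}, {part: "4567"}}
	stack = reducePhone(stack)
	fmt.Printf("%+v\n", stack[0].result)
}
```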
Lexing
Two things are happening concurrently during lexing:
Sometimes the lexer can return the byte itself as an int. Yacc has built-in predetermined tokens: the first 127 byte values are reserved, so they can be returned without declaring them to the parser.
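A minimal lexer sketch along these lines is shown below. It assumes a single declared token `D` for digits and a hypothetical `lex` type. Goyacc numbers declared tokens above the byte range, and the constant value used here is an assumption. `yySymType` is declared by hand so the example runs without generated code.

```go
package main

import "fmt"

// D is a token constant. Its exact value is an assumption for
// illustration; goyacc assigns the real one in the generated file.
const D = 57346

// yySymType stands in for the struct goyacc generates from %union.
type yySymType struct{ ch byte }

// lex is a hypothetical lexer over an in-memory input.
type lex struct {
	input []byte
	pos   int
}

// Lex returns the next token type and stores its value in lval.
// Digits are reported as D; any other byte is returned as itself.
func (l *lex) Lex(lval *yySymType) int {
	if l.pos >= len(l.input) {
		return 0 // 0 tells the parser the input is exhausted
	}
	c := l.input[l.pos]
	l.pos++
	if c >= '0' && c <= '9' {
		lval.ch = c
		return D
	}
	return int(c)
}

// Error is called by the parser when it hits a syntax error.
func (l *lex) Error(s string) { fmt.Println("syntax error:", s) }

func main() {
	l := &lex{input: []byte("628-1234")}
	var lval yySymType
	for tok := l.Lex(&lval); tok != 0; tok = l.Lex(&lval) {
		if tok == D {
			fmt.Printf("D %c\n", lval.ch)
		} else {
			fmt.Printf("byte %q\n", rune(tok))
		}
	}
}
```

Returning the raw byte for characters such as `-` means the grammar can refer to them as `'-'` without a token declaration.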
How does it work?
Goyacc
Options for lexing
The lexer is not code that you live in. It is code you write once and then use for a long time, so it is OK if the code is not clean.
Future improvements
For complicated grammars (e.g. SQL), Goyacc can generate a big result structure that is expensive to pass around. Goyacc actually assigns this structure every time there is a state transition.
C (yacc) has a structure called a union, which packs the data structure efficiently. Go has no direct equivalent, but interfaces come very close.
Unlike a C union, a Go interface can be type asserted. One limitation is that the result of a type assertion is an rvalue, which means you can't assign to it.
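The limitation can be seen in a short sketch. `Select`, `setLimit`, and `setLimitPtr` are hypothetical names for illustration. The first function copies the value out and back. The second shows one possible workaround, keeping a pointer in the interface.

```go
package main

import "fmt"

// Select is a hypothetical AST node.
type Select struct{ Limit int }

// setLimit updates the Select stored in v.
// Writing v.(Select).Limit = n does not compile, because the result of a
// type assertion is not addressable. The value has to be copied out,
// modified, and stored back.
func setLimit(v interface{}, n int) interface{} {
	s := v.(Select) // panics if v does not hold a Select
	s.Limit = n
	return s
}

// setLimitPtr avoids the copy by keeping a pointer in the interface:
// the asserted pointer itself is still not assignable, but the struct
// it points to is.
func setLimitPtr(v interface{}, n int) {
	v.(*Select).Limit = n
}

func main() {
	var v interface{} = Select{Limit: 1}
	v = setLimit(v, 10)
	fmt.Println(v.(Select).Limit)

	p := &Select{Limit: 1}
	setLimitPtr(p, 20)
	fmt.Println(p.Limit)
}
```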
Switching Vitess to use an interface instead of a struct doubles performance, but it would be a backward-incompatible change to goyacc.