publicsuffix / list

The Public Suffix List
https://publicsuffix.org/
Mozilla Public License 2.0

tools/internal/parser: rewrite parser to output a syntax tree, not a list #2025

Closed danderson closed 1 week ago

danderson commented 1 week ago

This restructures the parser into a much more standard-looking recursive descent parser. It spends less effort on reporting egregiously invalid files, in exchange for more readable code in the more common case: files that are structurally valid but fail policy/lint checks.


There are a lot of changes, sorry :( Converting the output to a tree and using a recursive descent parser resulted in a lot of dependent changes to other parts of the code. The good news, I guess, is that the parser now looks much more traditional and the weird text helper stuff in text.go is gone. The result (IMO) is much easier to follow, makes it easier to experiment with format changes, and is approximately the same amount of code as before even though it does more now.
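For readers who have not looked at the diff, here is a rough sketch of the general shape: a recursive descent parser that returns a tree instead of a flat list. This is not the code in this PR. The `Node` type, the `Parse` function, the node kinds, and the simplified grammar (files contain sections and blank-line-separated blocks, blocks contain comments and suffixes) are all assumptions made up for illustration, and unlike the real format this sketch happens to allow nested sections.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical tree node: "file", "section", "block", "comment" or "suffix".
type Node struct {
	Kind     string
	Text     string
	Children []*Node
}

type parser struct {
	lines []string
	pos   int
}

// Parse turns PSL-like text into a tree rooted at a "file" node.
func Parse(src string) (*Node, error) {
	p := &parser{lines: strings.Split(src, "\n")}
	return p.parseGroup("file", "")
}

// peek returns the current line with surrounding whitespace removed, without consuming it.
func (p *parser) peek() (string, bool) {
	if p.pos >= len(p.lines) {
		return "", false
	}
	return strings.TrimSpace(p.lines[p.pos]), true
}

// marker reports whether line looks like "// ===BEGIN NAME===" (or END) and returns NAME.
func marker(line, kind string) (string, bool) {
	prefix := "// ===" + kind + " "
	if strings.HasPrefix(line, prefix) && strings.HasSuffix(line, "===") {
		return strings.TrimSuffix(strings.TrimPrefix(line, prefix), "==="), true
	}
	return "", false
}

// parseGroup parses the body of the file or of one section. A BEGIN marker
// recurses into a nested parseGroup call; the matching END marker returns.
func (p *parser) parseGroup(kind, name string) (*Node, error) {
	group := &Node{Kind: kind, Text: name}
	for {
		line, ok := p.peek()
		begin, isBegin := marker(line, "BEGIN")
		end, isEnd := marker(line, "END")
		switch {
		case !ok && kind == "file":
			return group, nil
		case !ok:
			return nil, fmt.Errorf("section %q is never closed", name)
		case line == "":
			p.pos++ // blank lines only separate blocks
		case isEnd && kind == "section" && end == name:
			p.pos++
			return group, nil
		case isEnd:
			return nil, fmt.Errorf("line %d: unexpected END marker for %q", p.pos+1, end)
		case isBegin:
			p.pos++
			sec, err := p.parseGroup("section", begin)
			if err != nil {
				return nil, err
			}
			group.Children = append(group.Children, sec)
		default:
			group.Children = append(group.Children, p.parseBlock())
		}
	}
}

// parseBlock consumes consecutive non-blank lines up to a blank line, a marker, or EOF.
func (p *parser) parseBlock() *Node {
	block := &Node{Kind: "block"}
	for {
		line, ok := p.peek()
		_, isBegin := marker(line, "BEGIN")
		_, isEnd := marker(line, "END")
		if !ok || line == "" || isBegin || isEnd {
			return block
		}
		p.pos++
		kind, text := "suffix", line
		if strings.HasPrefix(line, "//") {
			kind, text = "comment", strings.TrimSpace(strings.TrimPrefix(line, "//"))
		}
		block.Children = append(block.Children, &Node{Kind: kind, Text: text})
	}
}

func main() {
	src := "// header\n\n// ===BEGIN ICANN DOMAINS===\n// ac\nac\ncom.ac\n// ===END ICANN DOMAINS===\n"
	tree, err := Parse(src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("top-level nodes:", len(tree.Children))
}
```

The point of the shape is that each grammar rule gets one function, and structural errors (an unclosed or mismatched section) surface as a single early error, while per-suffix policy/lint checks can run later over the finished tree.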

I included code for a "debug format" output of the parse tree, which is verbose and not intended for external use. Just a demo of the structure of the output. The debug tree print for the current list is: https://gist.github.com/danderson/ab7a655e12c8daafe6a45e3aabc42f8d
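To give a feel for what a debug dump of a parse tree can look like, here is a minimal sketch of an indented tree printer. It does not reproduce the format in the gist above: the `Node` type, the `Dump` function, and the one-node-per-line layout are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical parse tree node, not the type used in this PR.
type Node struct {
	Kind     string
	Text     string
	Children []*Node
}

// Dump renders the tree one node per line, indenting two spaces per level.
func Dump(n *Node) string {
	var b strings.Builder
	dump(&b, n, 0)
	return b.String()
}

func dump(b *strings.Builder, n *Node, depth int) {
	b.WriteString(strings.Repeat("  ", depth))
	b.WriteString(n.Kind)
	if n.Text != "" {
		// %q quotes the text so stray whitespace is visible in the dump.
		fmt.Fprintf(b, " %q", n.Text)
	}
	b.WriteByte('\n')
	for _, c := range n.Children {
		dump(b, c, depth+1)
	}
}

func main() {
	tree := &Node{Kind: "file", Children: []*Node{
		{Kind: "section", Text: "ICANN DOMAINS", Children: []*Node{
			{Kind: "comment", Text: "ac"},
			{Kind: "suffix", Text: "ac"},
			{Kind: "suffix", Text: "com.ac"},
		}},
	}}
	fmt.Print(Dump(tree))
}
```

A dump like this is verbose on purpose: it is meant for eyeballing the tree structure while developing, not as a stable output format.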

I've started work on unparsing the PSL back to its normal form (with machine edits applied, e.g. block sorting), and experimenting with what a JSON output could look like. I didn't include those in this PR because, well, it's already huge :( But the next steps after this change are deleting all that sorting code and replacing it with pslfmt automatic reformatting, adding more output formats, etc.

There is also a regression in test coverage: I need to add more tests to cover the new codepaths and the suffix validation code. I plan to add those in followups, again just to keep the size of this change under control.