yhirose / cpp-peglib

A single file C++ header-only PEG (Parsing Expression Grammars) library
MIT License

Question on how to create auto-formatters from PEG grammar #146

Closed: no-more-secrets closed this issue 3 years ago

no-more-secrets commented 3 years ago

Hi, I am looking for some advice. I am using cpp-peglib and I have a PEG grammar for my DSL. The language supports line comments that start with #, as in a shell script. I implemented them with the following rules in my PEG grammar:

LINE_COMMENT   ←  '#' (!LINE_END .)* LINE_END
LINE_END       ←  '\r\n' / '\r' / '\n' / !.
%whitespace    ←  ([ \t\r\n]+ / LINE_COMMENT)*

and this works well, so I can use the grammar to compile my DSL.
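For reference, here is a hand-written C++ sketch of what the two rules above match, with no cpp-peglib involved (the function names are hypothetical). It spells out PEG's ordered choice: '\r\n' is tried before '\r', and `!.` succeeds at end of input without consuming anything, so a comment on the last line of a file still matches.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Length of a LINE_END match at `pos`, mirroring
//   LINE_END <- '\r\n' / '\r' / '\n' / !.
// Alternatives are tried in order; the first success wins.
std::optional<std::size_t> match_line_end(std::string_view s, std::size_t pos) {
    if (pos > s.size()) return std::nullopt;
    if (s.substr(pos, 2) == "\r\n") return 2;                            // '\r\n'
    if (pos < s.size() && (s[pos] == '\r' || s[pos] == '\n')) return 1;  // '\r' / '\n'
    if (pos == s.size()) return 0;  // !. : end of input, consumes nothing
    return std::nullopt;
}

// Length of a LINE_COMMENT match at `pos`, mirroring
//   LINE_COMMENT <- '#' (!LINE_END .)* LINE_END
std::optional<std::size_t> match_line_comment(std::string_view s, std::size_t pos) {
    if (pos >= s.size() || s[pos] != '#') return std::nullopt;
    std::size_t i = pos + 1;
    // (!LINE_END .)* : consume characters while no LINE_END starts here
    while (!match_line_end(s, i)) ++i;
    // The trailing LINE_END is part of the comment match.
    return (i - pos) + *match_line_end(s, i);
}
```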

But now, let's say that I want to build an auto-formatter (a.k.a. pretty printer) for my language. The formatter would read in some source code and output it nicely formatted with comments preserved (although the comments might be reshaped to do line wrapping for example). How do you recommend implementing this with cpp-peglib? Can the parser give me information on the comments in the source code so that I can retain them?

Any general advice on how to solve this problem?
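The comment reshaping mentioned above could be done with something like this greedy word-wrap sketch in plain C++. `rewrap_comment` and the fixed "# " prefix are assumptions for illustration, not part of cpp-peglib; the input is the comment text with the leading # already stripped (for example, several consecutive comment lines joined together).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Re-wrap comment text into lines of at most `width` characters, counting
// the "# " prefix. Greedy: each word goes on the current line if it fits.
// A single word longer than the available space gets a line of its own.
std::vector<std::string> rewrap_comment(const std::string& text, std::size_t width) {
    const std::string prefix = "# ";
    std::vector<std::string> lines;
    std::istringstream words(text);
    std::string word, line;
    while (words >> word) {
        // Flush the current line if appending " word" would exceed width.
        if (!line.empty() && prefix.size() + line.size() + 1 + word.size() > width) {
            lines.push_back(prefix + line);
            line.clear();
        }
        if (!line.empty()) line += ' ';
        line += word;
    }
    if (!line.empty()) lines.push_back(prefix + line);
    return lines;
}
```

For example, `rewrap_comment("the quick brown fox jumps", 12)` yields "# the quick", "# brown fox", "# jumps".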

yhirose commented 3 years ago

@dpacbach, for source code formatting, I think all parts of the source text, including whitespace and comments, are significant. So the grammar needs to preserve everything, so that action handlers or an AST can see them. This means we need to maintain two different grammars: one for code processing and another for text formatting.

Here is an example for text formatting. It doesn't use %whitespace, so no text elements that need to be reformatted later are lost.

[Image: example text-formatting grammar without %whitespace]
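A minimal sketch of the "preserve everything" idea, in plain C++ rather than cpp-peglib: whitespace and comments become tokens of their own instead of being skipped. The names `Kind`, `Token`, `lossless_tokenize` and `untokenize` are hypothetical, and the sketch assumes # always starts a comment (no string literals in this toy DSL).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

enum class Kind { Space, Comment, Code };

struct Token {
    Kind kind;
    std::string text;  // exact source text, never normalized
};

// Split `src` into Space, Comment and Code tokens without dropping anything.
std::vector<Token> lossless_tokenize(std::string_view src) {
    auto is_space = [](char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; };
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const std::size_t start = i;
        Kind kind;
        if (is_space(src[i])) {
            kind = Kind::Space;
            while (i < src.size() && is_space(src[i])) ++i;
        } else if (src[i] == '#') {
            kind = Kind::Comment;  // runs up to, but not including, the line end
            while (i < src.size() && src[i] != '\r' && src[i] != '\n') ++i;
        } else {
            kind = Kind::Code;
            while (i < src.size() && !is_space(src[i]) && src[i] != '#') ++i;
        }
        out.push_back({kind, std::string(src.substr(start, i - start))});
    }
    return out;
}

// Concatenate token texts; for unmodified tokens this reproduces the input.
std::string untokenize(const std::vector<Token>& toks) {
    std::string s;
    for (const auto& t : toks) s += t.text;
    return s;
}
```

Because nothing is discarded, `untokenize(lossless_tokenize(src)) == src` holds byte for byte. A formatter can then rewrite Space tokens and re-wrap Comment tokens while leaving Code tokens untouched.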

If we don't want to maintain two grammars, we could write an AST transformer that gets rid of the whitespace nodes to produce an AST for interpreters or compilers.
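A minimal sketch of such a transformer, using a hypothetical `Node` type defined here from scratch (it is not cpp-peglib's own AST type) and assuming the formatting grammar names its whitespace and comment rules `Space` and `Comment`. It returns a stripped copy, so the formatter can keep the original lossless tree while the interpreter walks the copy.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical AST node for illustration only.
struct Node {
    std::string name;                          // rule name, e.g. "Assign", "Space"
    std::string token;                         // matched text for leaf nodes
    std::vector<std::shared_ptr<Node>> nodes;  // children
};

// Return a deep copy of `node` in which every child subtree whose rule name
// is in `trivia` has been dropped. The input tree is left unchanged.
std::shared_ptr<Node> strip_trivia(const std::shared_ptr<Node>& node,
                                   const std::set<std::string>& trivia) {
    auto copy = std::make_shared<Node>();
    copy->name = node->name;
    copy->token = node->token;
    for (const auto& child : node->nodes) {
        if (trivia.count(child->name) == 0) {
            copy->nodes.push_back(strip_trivia(child, trivia));
        }
    }
    return copy;
}
```

Copying rather than mutating in place is a design choice: it lets both the formatter and the compiler work from a single parse.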

Does it make sense?

no-more-secrets commented 3 years ago

@yhirose Thanks for the response. Indeed, I was hoping to avoid maintaining two separate grammars, so I think I will try the AST transformer approach. Thanks!