zyedidia / sregx

A tool and library for using structural regular expressions.
MIT License

Structural PEGs #6

Open lobre opened 8 months ago

lobre commented 8 months ago

I'm loving all the explorations you've done so far around parsing and manipulating structured text! Thanks a lot for documenting and open-sourcing all of that.

I see in the "Future Work" section of the README that you mention "Structural PEGs". I have to say that I am intrigued, as I have recently been digging into "simple" text editors with great leverage (a small footprint and a restricted set of features that still gives a lot of power). I came across Kakoune and vis and discovered structural regular expressions. Then I also spent time discovering Rob Pike's editors acme and sam. By nature, they follow this philosophy of offering a lot while staying simple.

And I am currently exploring what a more interactive version of sam could be. Vis is a great take, but it "just" mixes two different concepts: vim-like operations (motions/text objects) and structural regex. I don't find them especially well integrated with each other. Ideally, there should be only one language for describing a selection/change in the structure of a file, for both small ad-hoc changes and more complex "sam-like" operations. Kakoune does a better job of unifying both interfaces.

I recently stumbled upon a comment on the kakoune forum from @Screwtapello where he describes what an editor could be if the concepts of selection/extraction were a little bit nicer than regex in terms of interface while staying minimalist and hackable.

> I definitely think the next killer feature of Vi-lineage editing will be structural knowledge, but I think it needs to be more like writing a regex or a Kakoune syntax highlighter than writing “real” code. I’d like to tell my editor that comment = "//" not("\n")* and then anywhere I write a regex I can write $comment to mean “text spans tagged as comments”.
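To make the quoted rule concrete, here is a minimal sketch of `comment = "//" not("\n")*` written with a few hand-rolled PEG-style combinators. The helper names (`lit`, `not_char`, `star`, `seq`, `spans`) are made up for illustration and do not come from Kakoune, sregx, or any PEG library. The scan is deliberately naive, so it would also tag `//` inside a string literal.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the end position of the match,
# or None when it does not match. All names here are hypothetical.
Parser = Callable[[str, int], Optional[int]]

def lit(s: str) -> Parser:
    # Matches the literal string s.
    return lambda text, i: i + len(s) if text.startswith(s, i) else None

def not_char(c: str) -> Parser:
    # Matches any single character other than c.
    return lambda text, i: i + 1 if i < len(text) and text[i] != c else None

def star(p: Parser) -> Parser:
    # Greedy repetition: apply p until it fails or stops consuming input.
    def run(text: str, i: int) -> Optional[int]:
        while True:
            j = p(text, i)
            if j is None or j == i:
                return i
            i = j
    return run

def seq(*ps: Parser) -> Parser:
    # Sequence: every parser must match, one after the other.
    def run(text: str, i: int) -> Optional[int]:
        for p in ps:
            i = p(text, i)
            if i is None:
                return None
        return i
    return run

# comment = "//" not("\n")*
comment = seq(lit("//"), star(not_char("\n")))

def spans(rule: Parser, text: str) -> list[tuple[int, int]]:
    # Collect every non-overlapping span where the rule matches.
    out, i = [], 0
    while i < len(text):
        j = rule(text, i)
        if j is not None and j > i:
            out.append((i, j))
            i = j
        else:
            i += 1
    return out

src = "int x; // counter\nint y; // other\n"
comment_spans = spans(comment, src)
comment_texts = [src[a:b] for a, b in comment_spans]
```

Under these assumptions, `$comment` in a search would simply stand for the list in `comment_spans`.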

Some people mention PEGs further down that thread.

I don't know what you have in mind with "structural PEGs", but it seems it could fit this philosophy. It would be great if you could describe what it means to you and how it could apply to a tool like sregx. Since you seem to have spent a lot of time in that space, I am sure that your take on what a structural PEG editor could be, compared to a structural regex editor, would be very valuable.

zyedidia commented 7 months ago

Thanks for the interest! I think the structural PEG idea was to apply the hierarchical approach of structural regex, where multiple regexes are repeatedly applied to perform some kind of edit or search, to PEGs. The underlying matching language (regex vs PEG) seems interchangeable with respect to the "structural-ness" (commands and repeated matching), so I don't think it would be too hard to create a structural PEG tool, given an existing PEG parser.
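As a rough illustration of that interchangeability, the sketch below defines two structural commands, `x` (extract) and `g` (guard), named after the sam commands, over an abstract "matcher" that only has to yield spans. One matcher is backed by Python's `re`; the other is a hand-written recursive matcher standing in for a PEG rule for balanced parentheses. Every name here is hypothetical and none of it is the sregx API.

```python
import re
from typing import Callable, Iterator

Span = tuple[int, int]
# A matcher yields the non-overlapping spans it finds inside text[start:end].
Matcher = Callable[[str, int, int], Iterator[Span]]

def regex_matcher(pattern: str) -> Matcher:
    rx = re.compile(pattern)
    def find(text, start, end):
        for m in rx.finditer(text, start, end):
            yield m.span()
    return find

def balanced_parens(text, start, end):
    """Stand-in for the PEG rule  parens <- "(" (parens / [^()])* ")"."""
    def parse(i):
        if i >= end or text[i] != "(":
            return None
        i += 1
        while i < end and text[i] != ")":
            if text[i] == "(":
                i = parse(i)          # nested group
                if i is None:
                    return None
            else:
                i += 1
        return i + 1 if i < end else None
    i = start
    while i < end:
        j = parse(i)
        if j is None:
            i += 1
        else:
            yield (i, j)
            i = j

def x(matcher, then):
    """Extract: run `then` on every span the matcher finds in the input span."""
    def cmd(text, start, end):
        for a, b in matcher(text, start, end):
            yield from then(text, a, b)
    return cmd

def g(matcher, then):
    """Guard: run `then` on the input span only if the matcher finds something."""
    def cmd(text, start, end):
        if next(matcher(text, start, end), None) is not None:
            yield from then(text, start, end)
    return cmd

def emit(text, start, end):
    yield text[start:end]

text = "f(a, g(b)) + h(c)\nk(d)\n"
lines = regex_matcher(r"[^\n]+")
names = regex_matcher(r"[a-z]+")
# For each line containing "+", take each balanced group, then each name in it.
cmd = x(lines, g(regex_matcher(r"\+"), x(balanced_parens, x(names, emit))))
result = list(cmd(text, 0, len(text)))
```

The commands never look inside the matcher, so swapping the regex for a grammar rule changes what can be matched (nesting, in this case) without changing how commands compose.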

I think PEGs are a great mechanism for expressing language structure for text editors, though I do think regexes also have their place, particularly for simple searches (regexes can be compiled to PEGs, though). Recently, I have been working on a new version of micro that integrates my incremental PEG parsing engine gpeg into the editor (currently for syntax highlighting). It uses simple PEG definitions of languages to do syntax highlighting (here's an example: https://github.com/zyedidia/flare/blob/master/languages/c.lang), but I would be interested in exploring a more general application of using PEGs to define textual movement/selection, auto-formatting, etc. for many languages (a PEG-based editor if you will). I'm hoping to officially announce this new version of micro in the coming months (with more details and other cool features), and in subsequent versions the PEG support can be enhanced beyond just syntax highlighting.
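On compiling regexes to PEGs: a direct, symbol-for-symbol translation does not preserve meaning, because PEG choice is ordered and never retried. The regex `(a|ab)c` matches `abc`, but the PEG `("a" / "ab") "c"` commits to `"a"` and then fails on `b`. One known fix is a continuation-style translation, where each alternative carries "everything that follows" with it, giving `"a" "c" / "a" "b" "c"`. The sketch below shows that idea on a tiny regex AST. It is an illustration under my own assumptions (the AST shape and all names are hypothetical), not how gpeg or any other library does it, and it assumes the body of a star never matches the empty string.

```python
from typing import Callable, Optional

Parser = Callable[[str, int], Optional[int]]  # returns end position or None

def accept(text: str, i: int) -> Optional[int]:
    return i  # the empty continuation: succeed without consuming input

def compile_regex(node, k: Parser = accept) -> Parser:
    """Translate a tiny regex AST into a PEG-style parser.

    node is ("chr", c) | ("cat", a, b) | ("alt", a, b) | ("star", a);
    k is the parser for everything that must follow node.
    """
    kind = node[0]
    if kind == "chr":
        c = node[1]
        return lambda text, i: k(text, i + 1) if text.startswith(c, i) else None
    if kind == "cat":
        return compile_regex(node[1], compile_regex(node[2], k))
    if kind == "alt":
        left, right = compile_regex(node[1], k), compile_regex(node[2], k)
        def choice(text, i):
            r = left(text, i)
            return r if r is not None else right(text, i)  # ordered choice
        return choice
    if kind == "star":
        # Recursive rule  A <- node A / k  standing for node*
        def rule(text, i):
            r = body(text, i)
            return r if r is not None else k(text, i)
        body = compile_regex(node[1], rule)
        return rule
    raise ValueError(kind)

def naive_a_or_ab_then_c(text: str) -> Optional[int]:
    # The direct PEG ("a" / "ab") "c": once "a" succeeds, the choice is final.
    i = 1 if text.startswith("a") else (2 if text.startswith("ab") else None)
    return i + 1 if i is not None and text.startswith("c", i) else None

a, b, c = ("chr", "a"), ("chr", "b"), ("chr", "c")
a_or_ab_then_c = compile_regex(("cat", ("alt", a, ("cat", a, b)), c))
star_a_then_ab = compile_regex(("cat", ("star", a), ("cat", a, b)))  # a*ab

naive_end = naive_a_or_ab_then_c("abc")   # fails: returns None
fixed_end = a_or_ab_then_c("abc", 0)      # matches all three characters
star_end = star_a_then_ab("aaab", 0)      # a* gives back one "a" for "ab"
```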

I think there are some interesting design questions in how all these characteristics should be defined (should they be in one monolithic PEG for each language, or should there be multiple separate grammars for syntax highlighting, formatting, and text objects), as well as how the textual motions/selection in particular should work. I would also be curious to learn more about how other editors use structural regexes, and whether there would be a benefit to structural PEGs over just normal PEGs. If you have some ideas, or have a design in mind, I would be interested in discussing further. I haven't researched existing approaches in detail (sam, acme, kakoune, vis), so I'd like to learn more about how they work or could be improved upon.

lobre commented 7 months ago

Thanks for taking the time to answer!

> I think the structural PEG idea was to apply the hierarchical approach of structural regex, where multiple regexes are repeatedly applied to perform some kind of edit or search, to PEGs.

I am still new to PEGs, so I struggle to understand exactly how they can be used in practice. I get that one can write a grammar for some text to give it meaning, and that this grammar is a set of "parsing rules". Given this grammar and a piece of text, one could easily check whether the text "matches" the grammar. But then, I suppose it is also possible to extract the text matching a sub-rule of the grammar? For instance, if I have a grammar that defines C code, including a sub-rule for code comments, I guess I can easily extract all the text that is considered a comment according to this rule.

Given the above is accurate, when I read "structural PEG", it raises questions because, to me, a PEG is already a concept that brings structure by definition. Indeed, a grammar defines a structure by combining rules. So when you refer to "structural PEG", do you mean combining/using multiple rules of the same grammar to search text? I could, for example, want to find/extract all the "keywords" that are part of "function arguments" (keywords between ( and )). My grammar could define those two rules, "keyword" and "function arguments".
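A small sketch of that last idea, assuming a single toy grammar whose named rules leave tagged nodes in a parse tree, so that "keywords inside function arguments" becomes a query over the tree. The grammar, the rule names `keyword` and `args`, and all helpers are hypothetical; terminals use Python's `re` purely for brevity, and the grammar covers only enough of a C-like syntax to run the example.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    rule: str
    start: int
    end: int
    children: list = field(default_factory=list)

# Each parser returns (end position, tagged nodes) or None on failure.
def tok(pattern):
    rx = re.compile(pattern)
    def p(text, i):
        m = rx.match(text, i)
        return (m.end(), []) if m else None
    return p

def seq(*ps):
    def p(text, i):
        nodes = []
        for q in ps:
            r = q(text, i)
            if r is None:
                return None
            i, sub = r
            nodes += sub
        return i, nodes
    return p

def choice(*ps):
    def p(text, i):
        for q in ps:            # ordered choice: first success wins
            r = q(text, i)
            if r is not None:
                return r
        return None
    return p

def star(q):
    def p(text, i):
        nodes = []
        while True:
            r = q(text, i)
            if r is None or r[0] == i:
                return i, nodes
            i, sub = r
            nodes += sub
    return p

def rule(name, q):
    # A named rule wraps whatever it matched in a tagged node.
    def p(text, i):
        r = q(text, i)
        if r is None:
            return None
        end, sub = r
        return end, [Node(name, i, end, sub)]
    return p

keyword = rule("keyword", tok(r"(?:int|char|return)\b"))
ident = tok(r"[A-Za-z_]\w*")
args = rule("args", seq(tok(r"\("), star(choice(keyword, ident, tok(r"[^()]"))), tok(r"\)")))
program = star(choice(args, keyword, ident, tok(r"(?s).")))

def walk(nodes):
    for n in nodes:
        yield n
        yield from walk(n.children)

def inside(nodes, outer, inner):
    # Every `inner` node that sits somewhere below an `outer` node.
    for n in walk(nodes):
        if n.rule == outer:
            yield from (d for d in walk(n.children) if d.rule == inner)

src = "int add(int a, char b) { return a; }"
end, tree = program(src, 0)
all_keywords = [src[n.start:n.end] for n in walk(tree) if n.rule == "keyword"]
arg_keywords = [src[n.start:n.end] for n in inside(tree, "args", "keyword")]
```

Here `arg_keywords` holds only the keywords between `(` and `)`, while `all_keywords` also includes the return type and `return`.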

Or maybe you mean constructing a grammar "on the fly/interactively" given a raw text (by defining correlated rules interactively one after another)? Or using a combination of multiple grammars (same as structural regex using a combination of multiple regex)?

> I would be interested in exploring a more general application of using PEGs to define textual movement/selection, auto-formatting, etc. for many languages (a PEG-based editor if you will).

I also have a lot of interest in that topic!

Glad that you can bring PEG to micro, that's awesome!