universal-ctags / ctags

A maintained ctags implementation
https://ctags.io
GNU General Public License v2.0
6.51k stars 622 forks source link

build-sys: utilize pegof #4026

Closed masatake closed 2 months ago

masatake commented 3 months ago

Pegof is a PEG grammar optimizer and formatter developed at https://github.com/dolik-rce/pegof.

This change is for utilizing pegof for optimizing our peg based parsers.

We assume pegof is installed at ${somewhere}/pegof.

$ ./autogen.sh $ ./configure ... --with-pegof=${somewhere}/pegof ... $ make

To verify the command line invoking pegof, pass V=1 option to make like:

$ make V=1

Makefile runs pegof to generate a .pego file from the given .peg file. Then, it runs packcc to generate .c and .h files from the .pego file. Via make command, you can generate a specified pego file directly. e.g.

$ make V=1 peg peg/varlink.pego

This may be helpful for debugging pegof.

If you successfully build ctags with pegof, you will see "pegof" in the output of ./ctags—-list-features.

TODO:

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 85.42%. Comparing base (d8ffdbd) to head (8778fd3).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #4026 +/- ## ======================================= Coverage 85.42% 85.42% ======================================= Files 235 235 Lines 56729 56729 ======================================= Hits 48462 48462 Misses 8267 8267 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

masatake commented 3 months ago

~See the commit log; -j1 is needed.~

~See https://github.com/dolik-rce/pegof/issues/12 .~

Fixed in https://github.com/dolik-rce/pegof/pull/13.

masatake commented 3 months ago

@dolik-rce, I got an interesting result.

When turning on Pegof, all cases testing the Thrift peg-based parser fail.

You can run the test locally:

bash autogen.sh
./configure --with-pegof=../build/pegof
make
make units LANGUAGES=Thrift

You can compare Units/parser-thrift.r/simple-thrift.d/expected.tags and Units/parser-thrift.r/simple-thrift.d/FILTERED.tmp.

peg file optimized by Pegof can be found in peg/thrift.pego.

dolik-rce commented 3 months ago

Most definitely a bug in pegof. I'll try to investigate, but it's probably going to take few days at least, I won't have much time available next week.

dolik-rce commented 3 months ago

I've been able to pinpoint the optimization that causes the problem (or maybe just first of the problems). It happens when Function rule is inlined:

--- peg/thrift.peg      2024-06-30 14:28:20.400662868 +0200
+++ peg/thrift.peg      2024-06-30 14:28:20.400662868 +0200
@@ -148,32 +148,33 @@
         tagEntryInfo *e = getEntryInCorkQueue (r);
         if (e)
             e->extensionFields.inheritance = eStrdup ($2); } __
-    )? __ "{" __ (Function __)* (
-        "}"
-        / EndOfServiceError
-    ) _ TypeAnnotations? EOS { POP_SCOPE (auxil); }
-
-EndOfServiceError <- .
-
-Function <-
-    ("oneway" __)? <
-    "void"
-    / FieldType
-> __ <Identifier> { int r = makeThriftTag (auxil, $2, $2s, K_FUNCTION, true);
+    )? __ "{" __ (
+        (
+            ("oneway" __)? <
+            "void"
+            / FieldType
+        > __ <Identifier> { int r = makeThriftTag (auxil, $4, $4s, K_FUNCTION, true);
     tagEntryInfo *e = getEntryInCorkQueue (r);
     if (e)
     {
         e->extensionFields.typeRef[0] = eStrdup ("typename");
-        e->extensionFields.typeRef[1] = eStrdup ($1);
+        e->extensionFields.typeRef[1] = eStrdup ($3);
     }
     PUSH_KIND (auxil, K_PARAMETER); } _ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
     tagEntryInfo *e = getEntryInCorkQueue (r);
     if (e)
-         e->extensionFields.signature = eStrdup ($3); } __ (
-        "throws" { PUSH_KIND (auxil, K_THROWSPARAM); } __ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
-    attachParserFieldToCorkEntry (r, ThriftFields[F_THROWS].ftype, $4);
+         e->extensionFields.signature = eStrdup ($5); } __ (
+                "throws" { PUSH_KIND (auxil, K_THROWSPARAM); } __ <"(" __ FieldList ")"> { int r = BASE_SCOPE (auxil);
+    attachParserFieldToCorkEntry (r, ThriftFields[F_THROWS].ftype, $6);
     POP_KIND (auxil, false); }
-    )? _ TypeAnnotations? ListSeparator? { POP_KIND (auxil, true); }
+            )? _ TypeAnnotations? ListSeparator? { POP_KIND (auxil, true); }
+        ) __
+    )* (
+        "}"
+        / EndOfServiceError
+    ) _ TypeAnnotations? EOS { POP_SCOPE (auxil); }
+
+EndOfServiceError <- .

 FieldType <-
     (

I'm still not sure why, the result looks good to me, even the references seem to be correctly renumbered.

masatake commented 3 months ago

The parser extract "a.thrift" from

include "a.thrift"

. However, it didn't.

dolik-rce commented 2 months ago

The parser extract "a.thrift" from include "a.thrift" However, it didn't.

The include paths must be specified, fot include statement to work correctly (unless it is in one of the default directories), similar to packcc. See https://github.com/dolik-rce/pegof?tab=readme-ov-file#inputoutput-options, option -I/--import, or specify PCC_IMPORT_PATH variable. This should work.

By the way: Pegof can be also used as drop-in replacement for packcc. When you run pegof -p -O all ... it optimizes the grammar and generates the header and source files for the optimized parser (via the embeded packcc code). It might be simpler to implement this into the ctags build system, as it requires no additional steps and preprocessed files. On the other hand, it might be more difficult to debug.

masatake commented 2 months ago

I didn't write about "include" or "import" feature of packcc or pegof. I wrote about the ability of the code optimized by pegof.

input.thrift:

include "a.thrift"

The original thrift parser can extract "a.thrift" as a reference tag. ~However, the optimized version cannot do it.~ My mistake. the optimized version can also extract it.

I was looking for smaller input with that the optimized version doesn't work well. Smaller input makes debugging easier.

$ ctags --list-features| grep pegof
$ cat /tmp/input.thrift            
include "a.thrift"
$ ctags -o - --extras=+r input.thrift
a.thrift    input.thrift    /^include "a.thrift"$/;"    T
$~/bin/ctags --list-features| grep pegof
pegof             makes peg based parser(s) optimized (experimental)
$ ~/bin/ctags -o - --extras=+r input.thrift
a.thrift    input.thrift    /^include "a.thrift"$/;"    T

I will report a smaller input reproducing the issue here.

dolik-rce commented 2 months ago

The problem should be fixed now (in https://github.com/dolik-rce/pegof/commit/2ba466796bf372a3151b7c84e2c1bba8f5225537). All unit tests in this branch are passing with pegof 0.4.2.

masatake commented 2 months ago

@dolik-rce, thank you very much.

I added a document about how to use pegof to build ctags.