zeek / spicy

C++ parser generator for dissecting protocols & files.
https://docs.zeek.org/projects/spicy
Other
243 stars 37 forks source link

Problem with %skip when parsing multiple sequential regex fields. #1777

Closed sethhall closed 1 month ago

sethhall commented 1 month ago

There seems to be a problem with %skip working if there are multiple sequential regex fields. I'm not 100% certain that the issue is the multiple regex fields that cause the problem but I've played with it a bit and this definitely seems to be the issue. I get this error...

  [error] terminating with uncaught exception of type spicy::rt::ParseError: failed to match regular expression (/Users/seth/spicy/tests/.tmp/spicy.types.unit.skip-with-multiple-regexp-fields/skip-with-multiple-regexp-fields.spicy:11:8-11:10)

I have a branch containing a test case that I'm going to push in just a minute too.

bbannier commented 1 month ago

FTR, the test case from the branch was

# @TEST-EXEC: ${SPICYC} %INPUT -j -o %INPUT.hlto
# @TEST-EXEC: printf "a b" | spicy-driver %INPUT.hlto >>output
# @TEST-EXEC: btest-diff output

module Mini;

public type A = unit {
    %skip = / /;

    x: /a/;
    y: /b/;

    on %done {
        print self;
    }
};

Making the field parsers consume literals instead makes this test pass,

    x: b"a";
    y: b"b";

Looking at the generated code it seems something goes wrong with interning the regexps

const regexp __re = / /;
const regexp __re_1 = /a/;
const regexp __re_2 = / /;
const regexp __re_3 = / /;

I suspect this is different internings stepping on each others toes, e.g., https://github.com/zeek/spicy/blob/90751715212c7268ebb86276d4365dfaedd94882/spicy/toolchain/src/compiler/codegen/parsers/literals.cc#L115 and https://github.com/zeek/spicy/blob/90751715212c7268ebb86276d4365dfaedd94882/spicy/toolchain/src/compiler/codegen/parser-builder.cc#L913-L922

sethhall commented 1 month ago

Thanks for fixing! I wish I had checked the hilti code. The bug is pretty glaring there.