zeek / spicy

C++ parser generator for dissecting protocols & files.
https://docs.zeek.org/projects/spicy
Other
248 stars 37 forks source link

Aggregate if statements #1839

Closed sethhall closed 1 month ago

sethhall commented 2 months ago

It would be great to be able to do aggregate if statements like this...

type X = unit {
    has_extra_fields: uint8;
    if ( self.has_extra_fields == 1 ) {
        field1: uint8;
        field2: uint8;
        field3: uint8;
    }
};

Currently this would be done with the if statement suffix that is currently supported but it's a bit harder to read, write, and comprehend so that same code would look like this...

type X = unit {
    has_extra_fields: uint8;
    field1: uint8 if ( self.has_extra_fields == 1 );
    field2: uint8 if ( self.has_extra_fields == 1 );
    field3: uint8 if ( self.has_extra_fields == 1 );
};
bbannier commented 2 months ago

FTR, a currently supported way to group fields under the same condition is a unit switch statement, in this case

type X = unit {
    has_extra_fields: uint8;
    switch (self.has_extra_fields) {
        1 -> {
            field1: uint8;
            field2: uint8;
            field3: uint8;
        }
        * -> : void;
    };
};

This requires explicitly handling not-matched cases so I added a case for * above.

rsmmr commented 2 months ago

Indeed, and I'm thinking to reuse the same machinery that we already have for the case-blocks to implement the requested feature. That's why I believe this shouldn't be difficult. I'll give that a try.

sethhall commented 2 months ago

FTR, a currently supported way to group fields under the same condition is a unit switch statement, in this case

Wow, I totally did not think of that at all! Thankfully the new block if syntax will be a lot cleaner looking. :)

sethhall commented 2 months ago

Switched my notepad-cache parser to aggregate if statements and seems to work fine... https://github.com/sethhall/spicy-parsers/blob/a9a72b4c0a84cfed1fe885bfcca84bf7d42dc721/notepad-cache/notepad-cache-parser.spicy#L107

sethhall commented 2 months ago

Haha, ok. Now I think I see how I'm going to end up abusing this feature and I almost wonder if it would be useful to create a minor additional feature. The other day I wanted to create a group of fields within a unit and apply a &max-size or something to the group of fields so I tried making a subscope (having no clue if this is a thing or would work)...

         {
             x: bytes &eod;
         } &size=10;

Long story short, it didn't.

Now with this branch I see that I'll be able to do this...

         if ( True ) {
             x: bytes &eod;
         } &size=10;

Sort of makes me wonder if it would make sense to add the scoping trick like I tried? Seems like it would be an incredibly superficial change to this branch.

sethhall commented 2 months ago

Eh, one more comment I wanted to throw in here... Could we get "if else" and else with this too? Does that architecturally fit?

rsmmr commented 2 months ago

I tried making a subscope

Funny, this has come up three times now while discussing this branch with people. I agree that that would be a nice extension, and it's indeed easy to implement on the backend side with the changes in #1841.

However, there's a problem: syntax. I tried making the straight-forward change to the Spicy Bison parser, and it turns out our syntax gets ambiguous due to hooks that may be attached to a previous field. Consider this unit field:

    a: uint32 
        { print "hi "; } 
        { print "there"; }

The parser allows this: multiple inline hooks for a single field. But that means that this would now be ambiguous:

    a: uint32 
        { print "hi "; } 
        { print "there"; }
    {
        x: bytes &eod;
    } &size=10;

The parser can't tell if that 3rd block is another hook or a new-style block of fields.

That means we need some different syntax. Any ideas?

rsmmr commented 2 months ago

Could we get "if else" and else with this too?

Yeah, I think so.

sethhall commented 2 months ago

it turns out our syntax gets ambitious due to hooks that may be attached to a previous field.

Ah, of course.

That means we need some different syntax. Any ideas?

Ehh, I'm not sure I care enough about subscopes to introduce a new syntax for them. Honestly I'm not totally sure how scoping rules would even look since I imagine that fields defined in a sub scope would just be put into the top level unit anyway.

awelzel commented 2 months ago

I agree that that would be a nice extension

I was wondering if this is just a short-cut for writing a full unit out (though IIUC it uses different infrastructure and might be more performant).

That means we need some different syntax. Any ideas?

a: uint32 {
  print "hi";
}

: unit {
  x: bytes &eod;
} &size=10;

# using struct
: struct {
  x: bytes &eod;
} &size=10;

# or even none of them?
: {
  x: bytes &eod;
}

# more magic anonymous looking.
x: _ { ... } 

With unit, likely unit hooks would be expected to work in the inner block and that might not fit with current ideas? Or, would %init just happen to work - question might also what is self within the block? Maybe struct is a way out, meaning to parse an inner struct? But possibly a user might expect to pass that struct around?

Maybe just completely anonymous if that disagrees like the last examples show.

For lifting or nesting fields, maybe naming the anonymous block would prevent lifting and add some structure?

inner : {
  x: bytes &eod;
} &size=10;

on %done {
  print inner.x;
}
rsmmr commented 2 months ago

I was wondering if this is just a short-cut for writing a full unit out

yeah, there's clearly a fuzzy line here, with the advantage that the short-cut could indeed be quite a bit more efficient.

inner : {
  x: bytes &eod;
} &size=10;

I had arrived at that same point as well! :-) This would also align with the syntax we already have for bitfields.

However, I'm wondering if we're now too far beyond that line between short-cut and full unit. I am kind of back to just supporting a plain if (maybe with attributes; maybe an if-else) at this point, and leave everything else to full unit type definitions. At that point we could actually also then pretty easily implement a long desired optimization: lifting simple units up into the parent (for simple cases); given this machinery, it'd just be AST transformation now, even if we didn't provide the syntax to directly code it that way.

sethhall commented 2 months ago

However, I'm wondering if we're now too far beyond that line between short-cut and full unit. I am kind of back to just supporting a plain if (maybe with attributes; maybe an if-else) at this point, and leave everything else to full unit type definitions.

Yep, that's where I am. I didn't like that the more I thought about scopes that all kinds of new questions starting popping up that would need to be answered. Just feels like we'd be making things hard on ourselves.