zeek / spicy

C++ parser generator for dissecting protocols & files.
https://docs.zeek.org/projects/spicy
Other
248 stars 37 forks source link

Resynchronizing on unit error when failure happens after the first field. #1847

Closed Mohan-Dhawan closed 1 month ago

Mohan-Dhawan commented 2 months ago

When chunked data is read into a sink, the analyzer throws a violation if a TCP ACKed unseen segment is present. The violation is of the form view starts before available range. A reproducer has been shared privately with @bbannier.

rsmmr commented 2 months ago

Sorry, but this isn't really useful as a ticket because the scope is completely unclear. Spicy doesn't even have a notion of TCP ACKing; if anything, that's Zeek-side. I'm closing this, but feel free to reopen once it has been scoped down to something actionable.

bbannier commented 2 months ago

Part of the issue here is that when we try to resychronize on an exhausted view we propagate up an incorrect error (view starts before available range instead of e.g., failed to synchronize: &size amount not consumed: ...). This regressed with 3199cc6bb which landed with 1.11.

For this snippet I would expect resynchronization to fail with the original error instead of running into an exception from an exhausted view when parsing checks for View::isComplete (exception via View::_ensureValid).

module foo;

public type X = unit {
    : (Y &synchronize)[];
};

type Y = unit {
    : b"FOO";
    : bytes &size=1;
};
$ printf FOO | spicy-driver -d foo.spicy
[error] terminating with uncaught exception of type hilti::rt::InvalidIterator: view starts before available range (foo.spicy:7:10-10:1)
sethhall commented 2 months ago

Sort of unrelated, but Mohan and I poked around synchronization yesterday and accidentally discovered where this bug actually lies. It looks like if the field failure happens in the first field of the structure, then synchronization happens correctly, but if it happens in a field after that there is an error and recovery doesn't occur. It turns out that all of the gap tests only test the failure occurring in the first field of the unit. The best part is that we have a short and easy to understand reproducer script!

This works fine...

module Test;

type Blah = unit {
    ab: /AB/;
    cd: /CD/;
};

public type X = unit {
    x: (Blah &synchronize)[];

    on %synced { print "Confirming with state: %s" % self; confirm; }
    on %done { print self; }
};

with this spicy-driver batch file...

!spicy-batch v2
@begin-flow id1 stream Test::X
@data id1 1
A
@gap id1 1
@data id1 6
CDABCD
@end-flow id1
❯ spicy-driver -F test7.dat test7.spicy
Confirming with state: [$x=[]]
[$x=[[$ab=b"AB", $cd=b"CD"]]]

but if we force the error to occur in the second field by letting the first field be fully populated (the first 'C' is missing) with this batch file we get the error...

!spicy-batch v2
@begin-flow id1 stream Test::X
@data id1 2
AB
@gap id1 1
@data id1 5
DABCD
@end-flow id1
❯ spicy-driver -F test7.dat test7.spicy
error for ID id1: view starts before available range (test7.spicy:9:5-9:29)

Hopefully this is boiled down enough to pinpoint where the problem is. I tested this in several different ways and as far as I can tell the failures seem to be centered around where in the unit the errors are happening. It leads me to believe that some state in the stream is being mutated and not reset once fields have been successfully parsed and an error happens later in the unit structure (I'm also crossing my fingers that this means this bug might be easily fixed!)

rsmmr commented 1 month ago

I've started to look into this, but no leads yet.

sethhall commented 1 month ago

According to git bisect I found the offending commit....

3199cc6bbd10e31ea7aa1fadd7589cb9b4c9797e is the first bad commit
commit 3199cc6bbd10e31ea7aa1fadd7589cb9b4c9797e
Author: Robin Sommer <robin@corelight.com>
Date:   Tue Jul 16 09:59:16 2024 +0200

    Fix illegal stream content access through invalid view.

    When a view's beginning ended up outside of the stream's valid range
    (e.g., because the stream has been trimmed in the meantime), we
    wouldn't catch that on access and attempt to access data that was no
    longer valid. This commit extends the safety checks accordingly.

    Closes #1795.

 hilti/runtime/include/types/stream.h               |  12 +-
 hilti/runtime/src/tests/stream.cc                  |  15 +-
 spicy/runtime/src/parser.cc                        |   5 +-
 .../spicy.types.unit.size-inside-size/output       |  32 ++++
 .../gap_between_matches.log                        |  22 +--
 .../gap_while_matching.log                         |  14 +-
 tests/Baseline/spicy.types.unit.synchronize/output | 210 ++++++++++-----------
 tests/Baseline/spicy.types.unit.trimming/.stderr   |  84 ++++-----
 tests/spicy/types/unit/size-inside-size.spicy      |  27 +++
 9 files changed, 247 insertions(+), 174 deletions(-)
 create mode 100644 tests/Baseline/spicy.types.unit.size-inside-size/output
 create mode 100644 tests/spicy/types/unit/size-inside-size.spicy
bisect found first bad commit
rsmmr commented 1 month ago

Right, see Benjamin's note above.

sethhall commented 1 month ago

Ugh, I totally missed that he pointed out the commit. Anyway, first time I've used git bisect and that was definitely worthwhile learning.

sethhall commented 1 month ago

I think this bug was in 1.11, would it justify a 1.11.2? This bug does break most practical uses of synchronize.

Mohan-Dhawan commented 1 month ago

Will it be backported to Zeek v7.0.1?

rsmmr commented 1 month ago

Yes, it will be in 1.11.2 and 7.0.2.