zeek / zeek

Zeek is a powerful network analysis framework that is much different from the typical IDS you may know.
https://www.zeek.org
Other
6.36k stars 1.21k forks source link

Broker::make_event BIF drops "holes" in vectors #3045

Open Neverlord opened 1 year ago

Neverlord commented 1 year ago

For context, the original issue was reported by @simeonmiteff here: https://github.com/zeek/broker/issues/345.

He found that Broker is dropping "holes" in vectors when doing something like this:

local v: vector of string = vector();
v[0] = "one";
# hole at v[1]
v[2] = "three";

However, I found that it's not actually Broker.

    local v: vector of string = vector();
    v[0] = "one";
    # hole at v[1]
    v[2] = "three";
    local f = Broker::make_event(foo, "some-values", v);
    print fmt("some-values: %s", f);

Prints some-values: [name=foo, args=[[data=broker::data{some-values}], [data=broker::data{(one, three)}]]].

So it seems like the nullopt/none entries are long gone at the time Broker ever retrieves the event. I did some digging and it seems like the BIF is ultimately calling val_to_data. This is an excerpt from Data.cc:

        case TYPE_VECTOR:
            {
            auto vec = v->AsVectorVal();
            broker::vector rval;
            rval.reserve(vec->Size());

            for ( auto i = 0u; i < vec->Size(); ++i )
                {
                auto item_val = vec->ValAt(i);

                if ( ! item_val )
                    continue;

                auto item = val_to_data(item_val.get());

                if ( ! item )
                    return broker::ec::invalid_data;

                rval.emplace_back(std::move(*item));
                }

            return {std::move(rval)};
            }

We call vec->ValAt(i) in the loop, which calls At(i). This function returns Val::nil if the vector does not contain a value (the internal optional at that index is nullopt). Val::nil seems to be just nullptr at the end of the day. Hence, we skip nil entries here (via continue).

Long story short: it looks to me like val_to_data is dropping holes in vectors instead of mapping them to broker::none. Is this intentional or should we update the implementation?

awelzel commented 1 year ago

Long story short: it looks to me like val_to_data is dropping holes in vectors instead of mapping them to broker::none. Is this intentional or should we update the implementation?

If we update the implementation, could we consider making it an error attempting to pass vectors with holes over broker instead of none? Given it's not working today, seems a valid route.

Mostly, I'm worried about the ripple effects of allowing it. Specifically for language bindings: E.g., a vector of string received over broker can't be mapped to a plain std::vector<std::string> anymore because potentially missing data. The user would be forced a more complicated interface because need to assume elements are optional. Dynamically typed languages may seem easier, but not handling optional data (nil, None, undefined) within a vector would technically not be correct.

Is there a use-case for holes in vectors over Broker? Even for Zeek scripting alone, using holes purposely seems a bit of a minefield.

simeonmiteff commented 1 year ago

@awelzel I have no use case for holes 😁

I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek. Vern pointed out that holes are such values, which is how I found this corner case.

I agree, in the absence of a use case, making it an error would be doing everyone a favour.

awelzel commented 1 year ago

I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek.

An optional but unset field in a record should be represented as null/none when transported via broker :crossed_fingers:

simeonmiteff commented 1 year ago

AFAIK there is no broker JSON encoding for records.

On Tue, 16 May 2023 at 17:09, Arne Welzel @.***> wrote:

I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek.

An optional but unset field in a record should be represented as null when transported via broker 🤞

— Reply to this email directly, view it on GitHub https://github.com/zeek/zeek/issues/3045#issuecomment-1549109775, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMAJUMNZYRQXNF6V3QUK5LXGMRZNANCNFSM6AAAAAAYARIAKU . You are receiving this because you were mentioned.Message ID: @.***>

awelzel commented 1 year ago

AFAIK there is no broker JSON encoding for records.

There is, it's just not explicit. I think you're in for a ride :-)

Christian started a discussion here: https://github.com/zeek/zeek/discussions/2879

Mapping Zeek's records to Broker's data model expects records as a tuple of values, with each member in its own Broker-model rendering. The vector must cover the full list of fields, including Nones for optional ones. As a result, any change to the record definition means you must adapt your Python implementation to reflect it. This is true for both the traditional wire format and the new WebSocket one.

Your input on that discussion would be most valuable.