Open Neverlord opened 1 year ago
Long story short: it looks to me like val_to_data is dropping holes in vectors instead of mapping them to broker::none. Is this intentional or should we update the implementation?
If we update the implementation, could we consider making it an error attempting to pass vectors with holes over broker instead of none
? Given it's not working today, seems a valid route.
Mostly, I'm worried about the ripple effects of allowing it. Specifically for language bindings: E.g., a vector of string
received over broker can't be mapped to a plain std::vector<std::string>
anymore because potentially missing data. The user would be forced a more complicated interface because need to assume elements are optional. Dynamically typed languages may seem easier, but not handling optional data (nil, None, undefined) within a vector would technically not be correct.
Is there a use-case for holes in vectors over Broker? Even for Zeek scripting alone, using holes purposely seems a bit of a minefield.
@awelzel I have no use case for holes 😁
I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek. Vern pointed out that holes are such values, which is how I found this corner case.
I agree, in the absence of a use case, making it an error would be doing everyone a favour.
I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek.
An optional but unset field in a record should be represented as null/none when transported via broker :crossed_fingers:
AFAIK there is no broker JSON encoding for records.
On Tue, 16 May 2023 at 17:09, Arne Welzel @.***> wrote:
I was writing a btest for https://github.com/corelight/go-zeek-broker-ws and wanted to exercise every type the broker JSON encoding supports, but couldn't figure out how to create a null value in zeek.
An optional but unset field in a record should be represented as null when transported via broker 🤞
— Reply to this email directly, view it on GitHub https://github.com/zeek/zeek/issues/3045#issuecomment-1549109775, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMAJUMNZYRQXNF6V3QUK5LXGMRZNANCNFSM6AAAAAAYARIAKU . You are receiving this because you were mentioned.Message ID: @.***>
AFAIK there is no broker JSON encoding for records.
There is, it's just not explicit. I think you're in for a ride :-)
Christian started a discussion here: https://github.com/zeek/zeek/discussions/2879
Mapping Zeek's records to Broker's data model expects records as a tuple of values, with each member in its own Broker-model rendering. The vector must cover the full list of fields, including Nones for optional ones. As a result, any change to the record definition means you must adapt your Python implementation to reflect it. This is true for both the traditional wire format and the new WebSocket one.
Your input on that discussion would be most valuable.
For context, the original issue was reported by @simeonmiteff here: https://github.com/zeek/broker/issues/345.
He found that Broker is dropping "holes" in vectors when doing something like this:
However, I found that it's not actually Broker.
Prints
some-values: [name=foo, args=[[data=broker::data{some-values}], [data=broker::data{(one, three)}]]]
.So it seems like the
nullopt
/none
entries are long gone at the time Broker ever retrieves the event. I did some digging and it seems like the BIF is ultimately callingval_to_data
. This is an excerpt fromData.cc
:We call
vec->ValAt(i)
in the loop, which callsAt(i)
. This function returnsVal::nil
if the vector does not contain a value (the internaloptional
at that index isnullopt
).Val::nil
seems to be justnullptr
at the end of the day. Hence, we skip nil entries here (viacontinue
).Long story short: it looks to me like
val_to_data
is dropping holes in vectors instead of mapping them tobroker::none
. Is this intentional or should we update the implementation?