google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0
22.56k stars 3.19k forks source link

(C++) Vector::LookupByKey doesn't work with some string_view keys #8200

Closed mpawlowski-eyeo closed 3 months ago

mpawlowski-eyeo commented 6 months ago

Using flatc version 23.5.26.

With a schema like this:

table Mapping {
  key: string(key);
  value: Data;
}

If you add to this table:

key: "key100"
data: {some data}

And then perform a lookup like so:

std::string key = "key100";
std::string_view truncated_key = key;
truncated_key.remove_suffix(2);   // truncated_key = "key1"

auto* result = mapping()->LookupByKey(truncated_key.data());

result will contain {some data}.

My intention was to search for "key1" but instead I got results for "key100".

This is because truncated_key's value is not null-terminated, and the algorithm effectively goes out-of-bounds with respect to the string_view's size().

LookupByKey is implemented with std::bsearch and its pointer-based API makes it impossible to fix the problem I think. std::bsearch can only compare a key pointer with a data pointer, without knowing the size of key - only the size of data can be specified upfront. Even a custom comparator cannot solve the problem because it's a function pointer that receives two void* arguments and cannot be bound with extra information (like the size of key).

Maybe LookupByKey can be implemented via std::binary_search instead - it allows arbitrary key values and comparators and should be just as efficient.

mpawlowski-eyeo commented 6 months ago

Another way this problem manifests is:

key: "key1"
data: {I want this}
std::string key = "key100"
std::string_view truncated_key = key;
truncated_key.remove_suffix(2);   // truncated_key = "key1"

auto* result = mapping()->LookupByKey(truncated_key.data());

result is empty - I don't find what I'm looking even though

LookupByKey quietly searches for "key100" instead.

mpawlowski-eyeo commented 6 months ago

Seems you can fix this while still using std::bsearch - PR submitted.