stephenberry / glaze

Extremely fast, in memory, JSON and interface library for modern C++
MIT License
1.1k stars 110 forks source link

Binary (BEVE) support lacking for object keys #1248

Open RazielXYZ opened 1 month ago

RazielXYZ commented 1 month ago

A bit of an odd one as it seems weirdly specific.

Given a relatively simple repro such as:

struct kTest {
    int a, b;
    bool operator==(const kTest&) const = default;
};

template <>
struct std::hash<kTest> {
    std::size_t operator()(const kTest& s) const noexcept {
        return std::hash<int>{}(s.a) ^ (std::hash<int>{}(s.b) << 1);
    }
};

int main() {
    const std::unordered_map<kTest, int> testmap {{kTest{3, 7}, 0}};
    std::string buf;
    const auto ec = glz::write_binary(testmap, buf);
    std::cout << buf << std::endl;
    return 0;
}

The code fails to build with glaze 3.1.9 (from conan) as well as latest 3.2.1 (manually added to the project) on MSVC latest (17.10.5), with /std:c++latest (or 20, but that's no longer supported by glaze officially anyway).

Write_json works fine on the same data.

The errors shown are the same in both versions:

1>glaze\binary\write.hpp(102,57): error C2903: 'no_header': symbol is neither a class template nor a function template
1>glaze\binary\write.hpp(102,88): error C2143: syntax error: missing ')' before '('
1>glaze\binary\write.hpp(102,78): error C2888: 'forward': symbol cannot be defined within namespace 'detail'
1>glaze\binary\write.hpp(102,102): error C2888: 'forward': symbol cannot be defined within namespace 'detail'
1>glaze\binary\write.hpp(103,78): error C2888: 'forward': symbol cannot be defined within namespace 'detail'
1>glaze\binary\write.hpp(103,98): error C2888: 'forward': symbol cannot be defined within namespace 'detail'
1>glaze\binary\write.hpp(103,113): error C2059: syntax error: ')'

A similar error happens on gcc 13.2 or 14.0 as well.

stephenberry commented 1 month ago

Thanks for reporting this bug! Will look into it soon.

stephenberry commented 1 month ago

So, the problem here is that your key type is an object (struct) and not a string (or integer). JSON requires string keys and because BEVE (binary) must map to JSON it has the same requirement (although it allows integers are these can be quoted in JSON).

Being able to switch between JSON and BEVE and represent the same data is very important.

For your use case, what do you think a good approach would be to solve your problem?

RazielXYZ commented 1 month ago

That would make sense, except replacing write_binary with write_json in that case appears to both build and work fine.

Would it maybe make sense to use the hash (as a string) as the key? I assume that would only work for hash-based containers though. Alternatively, perhaps representing the key as some sort of custom formatted string containing the string representation of every component of the key - not sure if this would have to be user defined (some sort of ToString method for any object used as a key) or generated by glaze.

RazielXYZ commented 1 month ago

This is the output from write_json on the same map: {"{\"a\":3,\"b\":7}":0} Which seems to be pretty much an automatically generated to string from all members on the key, almost like it's using the key's serialization to json and just wrapping it up in a string again - which makes sense to me! So, not sure why the same would not work for write_binary. It also works the same way for a normal (ordered) std::map as well (but write_binary still does not work on that).

stephenberry commented 1 month ago

Yes, for JSON we serialize the key type in JSON string form. But, it is a bad idea to do this with binary for a number of reasons:

Could you explain what your use case is? In this example you could just pack your ints into a uint64_t and the binary would work fine and you'd get better hashing performance. But, I expect you have more complex objects you want to use as keys.

RazielXYZ commented 1 month ago

Correct, the use case is just generally being able to serialize maps with more complex keys (that don't fit into 64 bits) to BEVE as well, primarily for better performance where needed. I assume I could use a custom read/write step for key objects where that is the case, to pack them in whatever way seems fit for the given object (which, most likely, will end up being a string for most of them, considering the requirement for a key to be a string or int), or I can adjust the way I structure the data to make it more usable for this. Or I can try to figure out why my performance with json isn't as good as I'd like, in the cases where it's an issue.

stephenberry commented 1 month ago

As I think about this more, I think serializing the object key to BEVE and treating it as a BEVE string and then using that as the key would make sense. The problem is that this wouldn't translate into a meaningful string in JSON.

I would recommend handling this key serialization to/from strings right now yourself, as you've described. As, I don't want to rush into a solution in the binary specification itself.

Thanks for bringing this up and sharing your thoughts. As you figure out a solution feel free to post thoughts here. I'll keep this issue open and just reword it to address the core issue.

RazielXYZ commented 1 month ago

That sounds good! And I agree, no need to rush. If I get any useful ideas I'll add them here.