google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0
23.29k stars 3.25k forks source link

C++ Proposal: Compiler-Agnostic table/struct identifiers #3826

Closed Nnamdi closed 5 years ago

Nnamdi commented 8 years ago

I have been investigating using FlatBuffers in a C++ project, and it seems pretty solid. It might be a lack of understanding, but I think the (relatively small) changes described below would allow developers to create cleaner interfaces using FlatBuffers. What do you think?

I can implement the changes and provide a pull request for review.

Motivation

When writing a library for sending/receiving messages with an unknown encoding scheme, you might write a class with a signature similar to this:

template <typename EncodingType>
struct MessageSocket
{
    template <typename T>
    void send(const T&) { }

    void poll() { }

    template <typename T>
    void RegisterHandler(std::function<void(T)>) { }
};

Here, the send function would write something similar to <length><message-type-id><serialized-data> to the socket. The poll function would use the <message-type-id> to dispatch to the appropriate handler.

This is possible when using libraries such as Protocol Buffers, because given an instance of the message to send, you can get a compiler-agnostic identifier, and access the serialized data. FlatBuffers, however, keeps the serialized data and type information separate - and is missing compiler-agnostic identifier.

Proposed Changes

Have flatc generate the following function for Tables/Structs:

struct MyTable : private flatbuffers::Table {
    static FLATBUFFERS_CONSTEXPR const char *GetId() {
        return "fully.qualified.name.MyTable";
    }
};

Change the convenience functions (e.g. CreateMyTable) to return the following new type, defined in flatbuffers.h:

template <typename T>
struct BuilderOffset {
    typedef T value_type;
    BuilderOffset(FlatBufferBuilder& b, Offset<T> o) : builder(&b), offset(o) { }
    operator Offset<T>() const { return offset; } // backwards compatibility with existing code.
    FlatBufferBuilder *builder;
    Offset<T> offset;
};

Usage Example

We can now implement the send function above in the following manner:

template <typename T>
void send(const T& message)
{
    message.builder->Finish(message.offset);

    const auto dataLen = message.builder->GetSize();
    const auto hash = constexpr_hash_function(T::value_type::GetId());
    const auto msgLen = dataLen + sizeof(dataLen) + sizeof(hash);

    socket.write(&msgLen, sizeof(msgLen));
    socket.write(&hash, sizeof(hash));
    socket.write(message.builder->GetBufferPointer(), dataLen);
}

With the usage:

flatbuffers::FlatBufferBuilder fbb;
send(CreateMyTable(fbb, fbb.CreateString("test"), 123));

Alternatives

It could be argued that the hash could be stored in FlatBuffer itself, using the Finish function, and just provide a getter - something like:

inline const char *GetIdentifier(const void *buf) {
    return reinterpret_cast<const char*>(buf) + sizeof(uoffset_t);
}

With usage:

// to send
enum id { MY_TABLE };
flatbuffers::FlatBufferBuilder fbb;
flatbuffers::Offset<MyTable> offset = CreateMyTable(fbb, fbb.CreateString("test"), 123);
std::uint32_t ident = id::MY_TABLE;
fbb.Finish(offset, reinterpret_cast<const char*>(&ident));
send(fbb);

// to receive
std::uint32_t ident = *reinterpret_cast<const std::uint32_t*>(flatbuffers::GetIdentifier(buf));
// ...

However, there are 2 significant drawbacks:

  1. Message creation is clunky.
  2. Potentially error prone, as the identifier is set independently, subsequent refactors could lead to the wrong value being set, or the user forgetting to pass the identifier to the Finish call.
ghost commented 8 years ago

We already have the file_identifier feature (which you can declare in a schema). They are however not related to the type, they generate a global function:

inline const char *MonsterIdentifier() { return "MONS"; }

That's perhaps an oversight, a static function in Monster would have been better, and would have allowed your use case.

These things are stored in the buffer, and can be checked against (in this case MonsterBufferHasIdentifier). The reason it is a check and not just returning the identifier, is that identifiers are not mandatory, and determinining if there's no identifier present is inexact.

Adding fully qualified names to the generated code is another option, though I'm not sure if most people would like to have this by default, as it adds more strings to their binary.

I'm not in favor of your BuilderOffset idea. This is only there to avoid passing a FlatBufferBuilder to send, and is unnecessary overhead for most use cases. I would even say that denoting what message to send by an offset into an unfinished buffer seems fragile and non-intuitive to me.

You could achieve something similar on your end by making a TypedBuffer template, that is instantiated with the root type as template parameter, and holds either a FlatBufferBuilder or maybe better, a const uint8_t * / size_t pair.

Nnamdi commented 8 years ago

Thanks for getting back to me.

I think I could do what I had planed quite neatly with just the GetId (or GetFullyQualifiedName?) function - would you be willing to accept a patch allowing the generation of those functions on tables/structs, given a flatc switch?

Thanks.

ghost commented 8 years ago

GetFullyQualifiedName would be better, to reduce the chances of conflicts. A switch would be nice, maybe --gen-name-strings or something, that way we can add further names to it in the future.

mikkelfj commented 8 years ago

Mentioned on google groups, proposing using FNV-1 hash little endian as a default pre-calculated identifier for a given type:

https://groups.google.com/forum/#!topic/flatbuffers/FsgsUOPj2Zg

ghost commented 8 years ago

Sounds good, see my response there.

mikkelfj commented 8 years ago

For reference, the following file implements the type hash, but normally it would just be a generated constant. https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_identifier.h

simgt commented 8 years ago

Would it be useful to have something similar when a table is included in a union ? I could make a PR.

For instance, multiplayer.fbs from pienoon contains union Data { PlayerAssignment, PlayerCommand, StartTurn, EndGame, PlayerStatus }, which would generate the following static methods (assuming scoped enums):

Data PlayerAssignment::GetDataUnionId() { return Data::PlayerAssignment; }
Data PlayerCommand::GetDataUnionId() { return Data::PlayerCommand; }
Data StartTurn::GetDataUnionId() { return Data::StartTurn; }
...

This would avoid a switch when reading unknown data.

ghost commented 8 years ago

@notsimon : a table can be part of multiple unions, so having a single id may be problematic. Also, I don't understand in what situation it simplifies code?

mikkelfj commented 8 years ago

I also doubt the usefulness, but an example usecase would be a handler function that accepts any number of table pointers and a hash type (the numeric equivalent to the type_identifier string). The table could originate from a union in a message envelope table, or directly as a root table, or by other means such as a UI event. In this case it is convenient to be able to map the union type to to a hash type similar to mapping a union type to an enum name. The solution would be a simple extension ot the union api: GetHashTypeFromUnionType, or mytable.myunion.HashType(), where 0 is NONE. Of course there is a risk of hash type conflicts, but it is managable.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.