jamescourtney / FlatSharp

Fast, idiomatic C# implementation of Flatbuffers
Apache License 2.0
511 stars 51 forks source link

FlatSharp Compiler parses file_identifier and file_extension incorrectly #39

Closed syntroniks closed 3 years ago

syntroniks commented 4 years ago

Antlr parser grammar needs a semicolon token to recognize these statements.

I didn't see any warnings or errors, which is disappointing. I had no idea why no code was being generated.

jamescourtney commented 4 years ago

Thanks for reaching out. I also appreciate you sending in a pull request. I can take a look at these sometime this week -- the change is not large. Before that though, I'd like to chat about what the behavior we should target is.

file_extension is probably simpler to talk about. FlatSharp doesn't write to files or read from files. If the consuming library wants to write to a file, then they should pick what extension they want at that time. I think the best thing for flatsharp to do here is to ignore file_extension. If the flatc grammar wants a semicolon, we can certainly accommodate that :)

file_identifier is more interesting. I actually have a couple of questions for you about what the behavior should be. If I have an FBS file like this:

table A { Value:int; }
table B { Value:int; }
file_identifier "ABCD";

Does Flatc attach "ABCD" to both A and B? What's the behavior if I attempt to deserialize a table with a mismatched file_identifier? Some sort of InvalidDataException?

syntroniks commented 4 years ago

file_extension

Yes, it could be attached somewhere, perhaps to the schema for when an application wants to access that information. It would allow the schema to act as a source of truth.

I'm not using this in my use case yet, so ignoring it at this point is totally fine.

file_identifier should not be table-dependent.

In https://google.github.io/flatbuffers/flatbuffers_guide_writing_schema.html at File identification and extension, the authors note that:

Identifiers must always be exactly 4 characters long. These 4 characters will end up as bytes at offsets 4-7 (inclusive) in the buffer.

I believe this is schema-dependent, not namespace dependent. Practically though, a decision will have to be made regarding where to place this identifier in generated code.

The authors mention that:

After loading a buffer, you can use a call like MonsterBufferHasIdentifier to check if the identifier is present.

So I would expect this to return buffer[3:6] == self.file_identifier

    if not SomeClassHasIdentifier(buffer):
        raise InvalidDataException()

If a file_identifier is not present in the schema, I would expect *HasIdentifier to throw a NotImplementedException, not be generated at all, or return False since this buffer has no identifier.

Looking at the google repository and their generated code, *HasIdentifier is optional to call, and is not checked when creating an object from a buffer

syntroniks commented 4 years ago

In your example

table A { Value:int; }
table B { Value:int; }
file_identifier "ABCD";

No *BufferHasIdentifier methods were generated.

When specifying a root_type A, ABufferHasIdentifier and AIdentifier were generated.

I suspect file_identifier requires a root type, so if it is attached to a table, it would be the root table.

jamescourtney commented 3 years ago

file_identifier and root_type have been added in master, and will be shipped in version 5 whenever that's all done. Probably later this month.

File_extension hasn't been added, and I don't have plans to do so (except, perhaps, by making the grammar flexible enough to just ignore it). FlatSharp just doesn't have much to do with files.