jamescourtney / FlatSharp

Fast, idiomatic C# implementation of Flatbuffers
Apache License 2.0
511 stars 51 forks source link

FlatSharp compiler support for key (SortedDictionary) #30

Closed thomas-t1 closed 4 years ago

thomas-t1 commented 4 years ago

FlatSharp looks like a very nice library for FB in C#, however when trying to test FS vs "plain" FB, I immediately ran into a showstopper with my existing and widely used schema; no support for key/sorted dictionaries in schema. I was unable to generate anything so I don't have an idea (yet) on what it would take to support it. Is supporting key something that is considered or planned?

I will remove (key) for testing probably, but this will break native usage which relies on binary search for lookup. Perhaps a manual sort will suffice?

jamescourtney commented 4 years ago

Hi Thomas -- thanks for reaching out. Can you share your schema? I'll take a look at this.

jamescourtney commented 4 years ago

After doing a bit of prototyping on this, I think a solution is pretty feasible. I have a prototype version of support in the 'copyConstructors' branch. There are precisely two test methods, but I'm curious to see your thoughts.

C# attribute based sorted vectors FBS based sorted vectors

The main difference from FlatBuffers is that you need to tag the table property that contains the vector with SortedVector in the metadata. This is because FlatSharp assembles the object for you, so you need to make the decision about sorting at schema time.

Here's a set of sample packages with this prototype change: https://github.com/jamescourtney/FlatSharp/tree/beta

thomas-t1 commented 4 years ago

Hi James, thanks a lot for looking into this so quickly! I've tried your implementation and it seems to work very well! I quickly looked at the branch as well. I verified that the lists are sorted automatically on serialization i.e. that the order is correct after parse.

However I also ran into a few small issues, my schema is already in use to generate code for C++ and C (via flatc and flatcc) and a few other languages and different platforms including embedded platforms. I.e. I would like to use the same schema file for C++/C and C# via FlatSharp and I discovered the following;

  1. I cannot get user defined attributes to work both with flatc and the FS compiler, since flatc requires the attribute string value to be quoted, but FS does not support this syntax, see below
  2. The FB attribute (nested_flatbuffer); causes an error, I think the FS compiler could safely ignore this attribute, possibly with a warning, and not do anything special with the byte array.

This following example is extracted from the schema files I'm working with. The schema works in flatc but not in FS compiler, and if I unquote the values for the attributes the situation reverses :)

flatc vs FS compiler syntax; (VectorType: "IReadOnlyList") vs (VectorType: IReadOnlyList)

Example

attribute "VectorType";
attribute "SortedVector";

[SNIP]

table SignalFrame
{
    SequenceNumber: uint;
    Timestamp: ulong;
    Tags: [uint]; // FNV1a hash

    // Systemic/Algorithmic parameter changes
    StateChanges: [ubyte]; // (nested_flatbuffer: "ParameterSet");

    Signals: [Signal] (VectorType: "IReadOnlyList", SortedVector);
}

Thanks again for your support and a very cool project!

-t1

jamescourtney commented 4 years ago

Agree that nested_flatbuffer can be safely ignored A vector is a vector. For some of the other default attributes FS doesn't support, I wanted to explicitly fail rather than give the impression that things were working.

Thanks for pointing out the quoting issue. That was just a dumb bug.

New beta packages are available in the beta branch. LMK if these work better. I need to write some more tests before I'm comfortable releasing these, and also need to verify interop with the official FlatBuffer library, so it might be a little bit. My day job has also been very busy recently.

thomas-t1 commented 4 years ago

Hi James,

I tried out the new FS version (beta branch) on my schema and the attribute quote issue is fixed and the FS compiler now ignores the nested_flatbuffer attribute. So everything is working as expected... almost :-)

However when I tried the deserializer/parser on some of my existing data I noticed a difference between FS and FB (flatc) generated code wrt enums, values in FS code seem to be one off; I tried the C# generated by flatc and compared against FS C# generated code; I noticed that I was getting the wrong enum values shown in the FS generated code.

My enum looks like this;

enum VariantDataType : ushort
{
    Bool,
    UInt32,
    Int32,
    Float,
    Double,
    ComplexFloat,
    Binary,       // Record/blob size can be found implicitly from shape and size of data i.e. blob_size = value_size / PI(shape)
    String,
    UInt16,
    Int16
}

The FB (flatc) generated code looks like this;

public enum VariantDataType : ushort
{
 Bool = 0,
 UInt32 = 1,
 Int32 = 2,
 Float = 3,
 Double = 4,
 ComplexFloat = 5,
 Binary = 6,
 String = 7,
 UInt16 = 8,
 Int16 = 9,
};

However the FS compiler generated code looks like this;

        [FlatBufferEnum(typeof(ushort))]
        [System.Runtime.CompilerServices.CompilerGenerated]
        public enum VariantDataType : ushort
        {
            Bool = (ushort)1,
            UInt32 = (ushort)2,
            Int32 = (ushort)3,
            Float = (ushort)4,
            Double = (ushort)5,
            ComplexFloat = (ushort)6,
            Binary = (ushort)7,
            String = (ushort)8,
            UInt16 = (ushort)9,
            Int16 = (ushort)10,
        }
jamescourtney commented 4 years ago

Yup, that's embarrassing. I think I was thinking of protobuf there, which does start at 1. This will necessitate revving the version to 3.0. Thanks for catching it.

jamescourtney commented 4 years ago

The enum is fixed. New packages for version 3.0.0 are in the beta branch. I'd like to reiterate how much I appreciate your input and interest :)

Changes...

I've reworked how SortedTables express their key. Basically, I wanted to provide a BinarySearch extension method for a table of Array/IList/IReadOnlyList, but this was tricky when the key was only specified via an attribute. I ended up doing a lot of duplicated reflection, and it was kind of nasty.

So instead, there's an interface:

public interface IKeyedTable<TKey>
{
   TKey Key { get; }
}

The compiler implements this interface explicitly based on the property you specify as the key. For Runtime based stuff, you'll need to implement the interface yourself.

The upshot of all this is that it allows a binary search extension with the following signature:

public static TTable BinarySearchByFlatBufferKey<TTable, TKey>(this IList<TTable> sortedVector, TKey key) 
    where TTable : IKeyedTable<TKey>
{
}

The change from what I gave you before is that sorting was broken initially. FlatBuffers does sorting based on byte comparisons of the UTF8 strings. This led to some inconsistencies with .NET's native sort, so I had to build an equivalent IComparer implementation that first converts to UTF8. It's not the fastest thing, but it is consistent with FlatBuffers in all of my testing.

Test cases for sorted vectors have been expanded as well.

thomas-t1 commented 4 years ago

Hi James, I finally got back to testing the FS 3.0 beta version, however I ran into a problem;

My schema is divided into several .fbs files and use include and apparently something broke in FS compiler between 2.2 and 3.0 versions;

I.e. if I downgrade to FS compiler 2.2 it works as expected, however in 3.0 I get FS compiler errors;

flatsharp.compiler\3.0.0\build\netcoreapp2.1\FlatSharp.Compiler.targets(15,5): error : Message='Unable to resolve type 'ParameterTransaction' as a built in or defined type', Scope=$..Novelda.Serialization.FlowParameter

I was able to verify that the enum issue was fixed however, by merging the schema files into a single schema file. I haven't had the chance to run the parser on existing data yet, and try out the binary search extension, it it looks nice and I will try it out soon.

jamescourtney commented 4 years ago

I've updated the packages in the beta branch. It should support includes now. I've reverted back to using the attribute based way of doing things, instead of forcing an interface declaration. It was a tricky decision, but you shouldn't care as long as you're using the FBS files.

The Binary search method is now BinarySearchByFlatBufferKey.

This set of packages is reasonably well tested by unit tests, and is the first I've given you that I'm comfortable publishing. I'd like to get your feedback on that first, though.

Sorted Vectors example: https://github.com/jamescourtney/FlatSharp/tree/copyCtors-attributeSortedVector/samples/Example8-SortedVectors

Includes example: https://github.com/jamescourtney/FlatSharp/tree/copyCtors-attributeSortedVector/samples/Example7-Includes

jamescourtney commented 4 years ago

This should be fixed in version 3.0, which has been published just now. Feel free to reactivate if I've broken something else :)