jamescourtney commented 3 years ago

With the changes in 5.7 to allow value-type structs, it now seems important to enable writethrough for these scenarios. Currently, WriteThrough is restricted to reference type structs. This issue tracks supporting writethrough at the table level for struct members:

table SomeTable (fs_writeThrough) // not allowed -- writethrough in tables need to be case-by-case.
{
    RefStruct : SomeReferenceStruct (fs_writeThrough); // OK
    ValueStruct : SomeValueStruct (fs_writeThrough); // also OK
    Str : string (fs_writeThrough); // Not OK -- strings are by-reference.
    Integer : int (fs_writeThrough); // OK
    OtherTable : OtherTable (fs_writeThrough); // Not OK -- tables are by-reference.

    // might be hard. Won't work for arrays of value structs. Only supported for List vectors.
    StructVector : [ValueStruct] (fs_writeThrough); 
}

There are some open questions:

WriteThrough may fail if the item in the table is null. Should writeThrough in tables imply fs_forceWrite and required? This would guarantee that the writethrough element always had a spot in the buffer so writes could succeed.'
Should we support setting the value of a struct that exists to null. This would require vtable updates and would make things odd:
- Assert(table.Value is not null);
- table.Value = null; // clears vtable and orphans a chunk of the buffer
- table.Value = new Struct(); // no spot in the buffer for this now -- we don't know about that orphaned chunk any longer

jamescourtney commented 3 years ago

Alternatively, it may be possible to generate a "wrapper" struct that supports the writethrough metadata. This allows the "real" struct to continue to have the correct memory layout so it can get the benefits of MemoryMarshal serialization and deserialization, as well as support P/Invoke interop.

Imagine Vec3:

struct Vec3 (fs_writeThrough)
{
     X : float;
     Y : float;
     Z : float;
}

FlatSharp could generate this as:

[FlatBufferStruct, StructLayout(LayoutKind.Explicit)]
public struct Vec3
{
     // First 12 bytes are reserved for the FlatBuffer memory layout.
     [FieldOffset(0)] public float X;
     [FieldOffset(4)] public float Y;
     [FieldOffset(8)] public float Z;
}

public struct Vec3_WriteThrough
{
    private IInputBuffer? buffer;
    private int offset;

    public Vec3 Vec3 { get; }

    public int X
    {
        get => this.Vec3.X;
        set
        {
             this.Vec3.X = value;
             if (this.buffer is not null)
             {
                    default(SpanWriter).WriteFloat(value, offset, buffer.GetMemory());
              }
        }
    }
}

This approach is more flexible insofar as it doesn't require a bunch of special rules anywhere outside the struct code. Tables and vectors no longer need any special logic.

TYoungSL commented 3 years ago

WriteThrough may fail if the item in the table is null. Should writeThrough in tables imply fs_forceWrite and required? This would guarantee that the writethrough element always had a spot in the buffer so writes could succeed.'

Could issue a warning that without fs_forceWrite or required also specified, the field may throw a NullReferenceException (or a FlatSharp specific exception) if it is absent.

If the table's vtable struct field offset is zero or trimmed, manually throw instead of returning. In the case it's defaulted, that's still a problem (fs_forceWrite would solve).

Should be less work to throw. In most cases (like Vec3) the field should simply be required, though an example of usage we've come across would be an optional Matrix4x4. We ended up marking it required anyway but there are possibilities that it could be absent or defaulted. All of our Vector3-alikes tend to be in vectors themselves.

You can't actually imply required because another fbs compiler wouldn't have the same rule, you'd have to just yell at the user to add it.

jamescourtney commented 3 years ago

Good thoughts, thanks @TYoungSL!

I have some shower thoughts of my own. Writing them down before bed. Pardon the stream of consciousness here.

Quick review of how writethrough works with reference structs:

// You define this. Either via FBS file or via C#
[FlatBufferStruct]
public class SomeStruct
{
    [FlatBufferItem(0, WriteThrough = true)] public virtual int Foo { get; set; }
}

// Flatsharp builds this
public sealed class StructParser_abcd<TInputBuffer> : SomeStruct
    where TInputBuffer : IInputBuffer
{
      private IInputBuffer inputBuffer;
      private int offset;

      public override int Foo
      {
            get => this._foo;
            set
            {
                  this._foo = value;
                  WriteInt(value, inputBuffer, offset);
            }
       }
}

This is a useful design that works because subclassing allows FlatSharp to add extra information to the deserialized object to enable writethrough semantics that are transparent to the user. The user never sees the StructParser_abcd class. The other nice aspect of the design is that tables/vectors/unions don't treat a struct any differently if it is writethrough-enabled. It is self-contained and handles its own semantics. The reason I present this here is just to illustrate why we have to be creative with value structs.

Challenges/constraints:

No inheritance
Structs with LayoutKind.Explicit cannot have reference types embedded in them, which means no IInputBuffer. They also cannot be generic.
LayoutKind.Explicit buys a lot in terms of performance, especially with larger structs and struct vectors. We need to keep it.
Keep semantics of containers not caring if a value struct is writethrough enabled. I want to avoid turning the code into soup.

I think the model I want to use is something approaching this:

struct Foo
{
    A : int;
}

Should generate:

[FlatBufferVirtualStruct(typeof(Foo.Foo_Data))] // New attribute: Virtual structs are just convenience wrappers around the real data.
public struct Foo
{
      private IInputBuffer? inputBuffer;
      private int offset;

      // virtual structs require this constructor. Compiler will generate this and some others.
      public Foo(Foo_Data data, IInputBuffer? buffer, int offset)
      {
             ...
      }

      // Allow access to underlying data in case P/Invoke or something fancy is required. Writethrough of course won't
      // work for this, but it's a value type, so hopefuly user doesn't have any expectation that it would work.
      public Foo_Data UnderlyingData { get; }

      public int A
      {
            get => this.Data.A;
            set
            {
                this.Data.A = value;
                WriteA();
            }
      }

      [FlatBufferStruct]
      [StructLayout(LayoutKind.Explicit)]
      public struct Foo_Data
      {
            [FieldOffset(0)] public int A;
      }
}

I think this keeps the broad strokes of what I like about value structs.

There's a lot of detail still to iron out here, such as...

How do nested structs work? What about nested structs with writethrough?
How does this play with various serialization modes? Today, reference writethrough only works with Lazy and VectorCacheMutable, because it can guarantee that this code works:

var someTable = FlatBufferSerializer.Default.Parse<T>(buffer);
var structA = someTable.A;
var alsoStructA = someTable.A;

structA.Foo = 3;

Debug.Assert(alsoStructA.Foo == 3);

This works in Lazy, because the Foo accessor reads from the underlying buffer each time, so you may have a bunch of different objects floating around, but they are all on top of the same buffer. With VectorCacheMutable, we get the guarantee that a single spot in the buffer is deserialized at most once, so structA and alsoStructA are the same reference, so it works there too.

These guarantees don't hold up with value structs, since they are copied by value. So even if FlatSharp only parses it once, each time it is accessed a different copy is made, which means that if one copy changes the value, there is no "notification" mechanism to the other copies. Even worse, future reads from the containing table/vector/union would return stale data, since (again) -- the copy is the only one with the changed data:

var someTable = FlatBufferSerializer.Default.Parse<T>(buffer);
var structA = someTable.A; 
var alsoStructA = someTable.A;

structA.Foo = 3; // saved to my struct and back to the buffer.

Debug.Assert(alsoStructA.Foo == 3); // will fail. Value type is copied.

var thirdStructA = someTable.A;
Debug.Assert(thirdStructA.Foo == 3); // Even worse, this will fail too, since the "A" instance may be cached in the table.

Of course, this is already possible with value structs being mutable. But they have to be because LayoutKind.Explicit requires fields. One thing I really strive for in FlatSharp is to obey the principle of least astonishment. As a user (especially one who might not be familiar with the semantics of value types in C#), I would be very surprised at this behavior.

jamescourtney commented 3 years ago

Scratch that whole post. Had a much better idea that I think will work without being too difficult to implement.

jamescourtney commented 3 years ago

To update on this, the "much better idea" referenced above is not so. I was originally thinking about buffer-backed structs:

public struct Foo
{
     private Memory<byte> buffer;

     public int A
     {
         get => BinaryPrimitives.ReadInt32LittleEndian(buffer.Span.Slice(0, 4));
     }
}

However, this approach is simply not performant. It's also been mentioned that people find a lot of value in the LayoutKind.Explicit implementation that currently exists.

I've run some benchmarks where writethrough was previously necessary to increase performance. Here's the original voxel bench, courtesy of @Astn:

FlatSharp 5.3-ish, before WriteThrough:	Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2
SendRegionToClient	1.568 ms	0.0171 ms	0.0009 ms	121.0938	-	-	1.95 MB
SendVisibleRegionsToClient	596.928 ms	13.4940 ms	0.7397 ms	48000.0000	-	-	781.33 MB
SendVisibleMeshesToClient	646.808 ms	33.7545 ms	1.8502 ms	48000.0000	-	-	781.33 MB
SendMeshToClient	1.347 ms	0.0434 ms	0.0024 ms	232.4219	-	-	3.71 MB
ModifyMeshAndSendToClients	29.824 ms	6.7302 ms	0.3689 ms	1093.7500	656.2500	250.0000	19.93 MB

This is what it looked like after modifying the benchmark to use writethrough for reference structs:	Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2
SendRegionToClient	4.633 us	0.3981 us	0.0218 us	0.0076	-	-	152 B
SendVisibleRegionsToClient	1,698.057 us	5.1669 us	0.2832 us	1.9531	-	-	60800 B
SendVisibleMeshesToClient	1,702.729 us	1,153.4816 us	63.2262 us	1.9531	-	-	60800 B
SendMeshToClient	112.733 us	0.3960 us	0.0217 us	-	-	-	176 B
ModifyMeshAndSendToClients	11,960.689 us	259.8482 us	14.2432 us	1234.3750	-	-	20675816 B

And here's what it looks like with Value Structs:	Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2
SendRegionToClient	3.972 us	0.1471 us	0.0081 us	0.0076	-	-	168 B
SendVisibleRegionsToClient	1,566.960 us	41.4993 us	2.2747 us	3.9063	-	-	67201 B
SendVisibleMeshesToClient	2,099.516 us	57.5996 us	3.1572 us	3.9063	-	-	67201 B
SendMeshToClient	17.133 us	0.9626 us	0.0528 us	-	-	-	168 B
ModifyMeshAndSendToClients	6,885.991 us	766.1981 us	41.9979 us	226.5625	23.4375	15.6250	7896804 B

This is a long-winded way of saying: I'm not sure how necessary WriteThrough is for value structs.

TYoungSL commented 3 years ago

I haven't looked through the benchmark... is write through with value structs on a new branch?

I could maybe add a scenario that might benefit from write through to the benchmark, but also not too sure if it'd be any more sufficiently beneficial to use write through of individual structs, just vectors of structs or large fixed-length arrays in structs.

TYoungSL commented 3 years ago

Is this where these benchmark results came from? https://github.com/jamescourtney/FlatSharp/commits/voxelBenchRedux

jamescourtney commented 3 years ago

Is this where these benchmark results came from?

Yep!

Sure -- let's do a proof of concept for it. How I usually do things in FlatSharp is:

Think of something (we are here)
Write the code that I want FlatSharp to create. This is often done by taking something that FlatSharp already created and just extending it with some hand-tuned code.
Test that code, make sure it works and is performant
Do the work to generate that code, think through all permutations, etc
Test and ship

The only viable idea I have for this at the moment is an interface along the lines of:

// A better name on this one...
public interface ITableWithValueStruct : IFlatBufferDeserializedObject
{
       // Maybe this?
       (int offset, int length)? PositionOfStruct(string propertyName, int? index = null);

       // Or this, to help the JIT elide some string comparisons.
       (int offset, int length)? PositionOfStruct<TStruct>(string propertyName, int? index = null) where TStruct : struct;

       // Or this
       void SaveStruct<TStruct>(string propertyName, TStruct value, int? index = null) where TStruct : struct;
}

The reason I am thinking along these lines is that...

It's kind of specialized as an advanced API, off it its own corner. FlatSharp tries to be very friendly and unsurprising in its default configuration. But I also don't want to prevent people who need things like this from getting them.
It doesn't require changing the LayoutKind.Explicit approach from current value structs, or tunneling any extra metadata around inside of them
Tables will continue to be reference types only, so adding an interface and a type check there is actually really cheap. Flatsharp can then generate the code necessary to implement those methods. IFlatBufferDeserializedObject already exists today, and is only implemented by objects that FlatSharp has deserialized (ie, subclasses of the "primary" types).

There are also a lot of open questions, such as:

Might this be better as a more thorough reflection API? That would sure make the FlatSharp unit tests easier :)
What about Unions? (Or vectors of unions)
Others will occur to me no doubt as we go along.

jamescourtney commented 3 years ago

Some more rambling thoughts on this rolling around in my head. This is starting to smell a bit like another design limitation FlatSharp has. I'm wondering if it's time to tackle it.

FlatSharp's codegen is based on the idea that there is only one way to serialize or parse a given Type. This is why we ended up with SharedString as a separate type, because that was the best way to differentiate the intent on the serialization and parse paths. There's one method to write a shared string and another to write a regular string. Not too bad.

Write through is a similar problem. I might want something that looks like this:

table Table
{  
     // Expectation is that I should be able to assign to the vector have those changes flow through.
      V1 : [Struct] (fs_writeThrough);
      V2 : [Struct];
}

FlatSharp can't support this today, because the types are the same, and FlatSharp can't distinguish. But now that I'm working on 6.0, I'm open to fundamental changes in this area if there's a good way to model it.

Right now, serializers in FlatSharp have roughly this signature:

private static void WriteInlineValueOf<TSpanWriter>(
    TSpanWriter spanWriter,
    Span<byte> span,
    Table value,
    int offset
    , SerializationContext context) where TSpanWriter : ISpanWriter

Though sometimes the SerializationContext argument is omitted. One thing we can think of doing is adding a ContextualOptions flag:

[Flags]
public enum ContextualTableFieldOptions : byte
{
       None = 0,
       SharedString = 1,
       WriteThrough = 2,
}

This could be another optional argument that is only used when the target Type is potentially usable with one of those scenarios. In this case, the cases would be:

Writing a string
Writing a vector of strings (so the vector method can tunnel that down to the string method)
Reading a vector of structs (for writethrough).

These contexts wouldn't need to flow through the whole object graph -- only until you hit another table, which will reset the context. The maximum "depth" you can have between tables is really one level of indirection. Vectors and Unions are the only "open generics" in the schema.

An altogether different option would be to specify writethrough at the serializer level, but that doesn't solve the Shared Strings problem.

jamescourtney commented 3 years ago

224

jamescourtney commented 3 years ago

The shared strings prototype is complete, and we now have contextual options that can "flow" through the serialization calls until they hit another table. PR: #225

That will be the foundation for write-through value structs. We'll support write through for structs on tables provided the item is marked as required and the serializer is mutable.

In vectors, writethrough will be supported provided that the serializer is mutable and the item is fixed size (though it might be disallowed for sorted vectors?)

The syntax will look something like:


table Foo
{
      I : MutableInt (fs_writeThrough, required);
      V : [ MutableInt ] (fs_writeThrough);
}

struct MutableInt (fs_valueStruct)
{
        Value : Int;
}

jamescourtney commented 3 years ago

PR is here #228. It's pretty rough at the moment and I need to go through and write tons of tests. But it works! There's even one test case!

attribute "fs_serializer";
attribute "fs_valueStruct";
attribute "fs_writeThrough";

namespace FlatSharpEndToEndTests.ValueStructs;
table WriteThroughTable (fs_serializer:"Lazy")
{
    Points : [ Vec3 ] (fs_writeThrough);
}

struct Vec3 (fs_valueStruct)
{
    X : float;
    Y : float;
    Z : float;
}

This test passes:

[Fact]
public void WriteThrough_ValueStruct_InVector()
{
    WriteThroughTable t = new WriteThroughTable
    {
        Points = new Vec3[]
        {
            new() { X = 1, Y = 2, Z = 3 },
            new() { X = 4, Y = 5, Z = 6 },
        }
    };

    byte[] data = new byte[1024];
    WriteThroughTable.Serializer.Write(data, t);

    var parsed = WriteThroughTable.Serializer.Parse(data);
    var parsed2 = WriteThroughTable.Serializer.Parse(data);

    Assert.Equal(1f, parsed2.Points[0].X);

    parsed.Points[0] = new Vec3 { X = -1, Y = -1, Z = -1 }; // triggers writethrough

    Assert.Equal(-1f, parsed2.Points[0].X);
}

The nuance here is that since the serializer is Lazy and that both parsed and parsed2 are sitting on top of the same byte array, we can observe the writethrough from parsed in parsed2 since it re-reads from the source buffer each time. Things become a little murky if you use Progressive, which causes cache-on-read behavior (though it's only a problem if you have two objects on the same array).

jamescourtney commented 3 years ago

Preview here under artifacts: https://github.com/jamescourtney/FlatSharp/actions/runs/1227101064

jamescourtney commented 3 years ago

Checked into flatSharp6 branch. #228

jamescourtney / FlatSharp

Support WriteThrough on Value Struct #201

224