Closed: jamescourtney closed this issue 3 years ago
Alternatively, it may be possible to generate a "wrapper" struct that supports the writethrough metadata. This allows the "real" struct to continue to have the correct memory layout so it can get the benefits of MemoryMarshal
serialization and deserialization, as well as support P/Invoke interop.
Imagine Vec3:
struct Vec3 (fs_writeThrough)
{
X : float;
Y : float;
Z : float;
}
FlatSharp could generate this as:
[FlatBufferStruct, StructLayout(LayoutKind.Explicit)]
public struct Vec3
{
// This 12-byte layout matches the FlatBuffer struct layout exactly (X at 0, Y at 4, Z at 8).
[FieldOffset(0)] public float X;
[FieldOffset(4)] public float Y;
[FieldOffset(8)] public float Z;
}
public struct Vec3_WriteThrough
{
    private readonly IInputBuffer? buffer;
    private readonly int offset;
    private Vec3 vec3;

    // Expose a copy of the underlying struct.
    public Vec3 Vec3 => this.vec3;

    public float X
    {
        get => this.vec3.X;
        set
        {
            this.vec3.X = value;
            if (this.buffer is not null)
            {
                default(SpanWriter).WriteFloat(value, this.offset, this.buffer.GetMemory());
            }
        }
    }
}
This approach is more flexible insofar as it doesn't require a bunch of special rules anywhere outside the struct code. Tables and vectors no longer need any special logic.
WriteThrough may fail if the item in the table is null. Should writeThrough in tables imply fs_forceWrite and required? This would guarantee that the writethrough element always has a spot in the buffer, so writes can succeed. Could issue a warning that, without fs_forceWrite or required also specified, the field may throw a NullReferenceException (or a FlatSharp-specific exception) if it is absent.
If the table's vtable struct field offset is zero or trimmed, manually throw instead of returning. In the case the field is defaulted, that's still a problem (which fs_forceWrite would solve). It should be less work to throw.
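To make the throw-instead-of-return idea concrete, here is a minimal, self-contained sketch, not FlatSharp's real internals: the flattened vtable array and `ResolveFieldOffset` helper are illustrative only. It shows a getter-side check that throws when the vtable slot for a writethrough field is zero (i.e., the field is absent or trimmed), rather than silently returning a default value that writethrough could never persist.

```csharp
using System;

// Minimal sketch (illustrative, not FlatSharp code): a "vtable" where entry i
// holds the field's offset, or 0 if the field is absent from the buffer.
public static class VTableSketch
{
    public static int ResolveFieldOffset(int[] vtable, int fieldIndex)
    {
        int offset = fieldIndex < vtable.Length ? vtable[fieldIndex] : 0;
        if (offset == 0)
        {
            // Field absent: writethrough has nowhere to write, so throw
            // instead of returning a default value.
            throw new InvalidOperationException(
                "Writethrough field is absent from the buffer; mark it 'required' or 'fs_forceWrite'.");
        }

        return offset;
    }

    public static void Main()
    {
        int[] vtable = { 8, 0 }; // field 0 present at offset 8; field 1 absent

        Console.WriteLine(ResolveFieldOffset(vtable, 0)); // prints 8

        try
        {
            ResolveFieldOffset(vtable, 1);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("threw for absent field");
        }
    }
}
```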
In most cases (like Vec3) the field should simply be required, though one example of usage we've come across is an optional Matrix4x4. We ended up marking it required anyway, but there are possibilities that it could be absent or defaulted. All of our Vector3-alikes tend to be in vectors themselves.
You can't actually imply required, because another fbs compiler wouldn't have the same rule; you'd have to just yell at the user to add it.
Good thoughts, thanks @TYoungSL!
I have some shower thoughts of my own. Writing them down before bed. Pardon the stream of consciousness here.
Quick review of how writethrough works with reference structs:
// You define this. Either via FBS file or via C#
[FlatBufferStruct]
public class SomeStruct
{
[FlatBufferItem(0, WriteThrough = true)] public virtual int Foo { get; set; }
}
// FlatSharp builds this
public sealed class StructParser_abcd<TInputBuffer> : SomeStruct
    where TInputBuffer : IInputBuffer
{
    private TInputBuffer inputBuffer;
    private int offset;

    public override int Foo
    {
        get => base.Foo;
        set
        {
            base.Foo = value;
            WriteInt(value, this.inputBuffer, this.offset);
        }
    }
}
This is a useful design that works because subclassing allows FlatSharp to add extra information to the deserialized object to enable writethrough semantics that are transparent to the user. The user never sees the StructParser_abcd
class. The other nice aspect of the design is that tables/vectors/unions don't treat a struct any differently if it is writethrough-enabled. It is self-contained and handles its own semantics. The reason I present this here is just to illustrate why we have to be creative with value structs.
Challenges/constraints:

- Structs using LayoutKind.Explicit cannot have reference types embedded in them, which means no IInputBuffer. They also cannot be generic.
- LayoutKind.Explicit buys a lot in terms of performance, especially with larger structs and struct vectors. We need to keep it.

I think the model I want to use is something approaching this:
struct Foo
{
A : int;
}
Should generate:
[FlatBufferVirtualStruct(typeof(Foo.Foo_Data))] // New attribute: virtual structs are just convenience wrappers around the real data.
public struct Foo
{
    private IInputBuffer? inputBuffer;
    private int offset;
    private Foo_Data data;

    // Virtual structs require this constructor. The compiler will generate this and some others.
    public Foo(Foo_Data data, IInputBuffer? buffer, int offset)
    {
        ...
    }

    // Allow access to the underlying data in case P/Invoke or something fancy is required. Writethrough of course won't
    // work for this, but it's a value type, so hopefully the user doesn't have any expectation that it would work.
    public Foo_Data UnderlyingData => this.data;

    public int A
    {
        get => this.data.A;
        set
        {
            this.data.A = value;
            WriteA(); // sketch: writes A back through to the buffer when inputBuffer is non-null
        }
    }

    [FlatBufferStruct]
    [StructLayout(LayoutKind.Explicit)]
    public struct Foo_Data
    {
        [FieldOffset(0)] public int A;
    }
}
I think this keeps the broad strokes of what I like about value structs.
There's a lot of detail still to iron out here. One nice property of the reference-struct model is that it works with both Lazy and VectorCacheMutable, because it can guarantee that this code works:
var someTable = FlatBufferSerializer.Default.Parse<T>(buffer);
var structA = someTable.A;
var alsoStructA = someTable.A;
structA.Foo = 3;
Debug.Assert(alsoStructA.Foo == 3);
This works in Lazy, because the Foo accessor reads from the underlying buffer each time, so you may have a bunch of different objects floating around, but they are all on top of the same buffer. With VectorCacheMutable, we get the guarantee that a single spot in the buffer is deserialized at most once, so structA and alsoStructA are the same reference, so it works there too.
These guarantees don't hold up with value structs, since they are copied by value. So even if FlatSharp only parses it once, each time it is accessed a different copy is made, which means that if one copy changes the value, there is no "notification" mechanism to the other copies. Even worse, future reads from the containing table/vector/union would return stale data, since (again) -- the copy is the only one with the changed data:
var someTable = FlatBufferSerializer.Default.Parse<T>(buffer);
var structA = someTable.A;
var alsoStructA = someTable.A;
structA.Foo = 3; // saved to my struct and back to the buffer.
Debug.Assert(alsoStructA.Foo == 3); // will fail. Value type is copied.
var thirdStructA = someTable.A;
Debug.Assert(thirdStructA.Foo == 3); // Even worse, this will fail too, since the "A" instance may be cached in the table.
Of course, this is already possible with value structs being mutable. But they have to be, because LayoutKind.Explicit requires fields. One thing I really strive for in FlatSharp is to obey the principle of least astonishment. As a user (especially one who might not be familiar with the semantics of value types in C#), I would be very surprised at this behavior.
Scratch that whole post. Had a much better idea that I think will work without being too difficult to implement.
To update on this, the "much better idea" referenced above is not so. I was originally thinking about buffer-backed structs:
public struct Foo
{
private Memory<byte> buffer;
public int A
{
get => BinaryPrimitives.ReadInt32LittleEndian(buffer.Span.Slice(0, 4));
}
}
However, this approach is simply not performant. It's also been mentioned that people find a lot of value in the LayoutKind.Explicit implementation that currently exists.
I've run some benchmarks where writethrough was previously necessary to increase performance. Here's the original voxel bench, courtesy of @Astn:
FlatSharp 5.3-ish, before WriteThrough:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| SendRegionToClient | 1.568 ms | 0.0171 ms | 0.0009 ms | 121.0938 | - | - | 1.95 MB |
| SendVisibleRegionsToClient | 596.928 ms | 13.4940 ms | 0.7397 ms | 48000.0000 | - | - | 781.33 MB |
| SendVisibleMeshesToClient | 646.808 ms | 33.7545 ms | 1.8502 ms | 48000.0000 | - | - | 781.33 MB |
| SendMeshToClient | 1.347 ms | 0.0434 ms | 0.0024 ms | 232.4219 | - | - | 3.71 MB |
| ModifyMeshAndSendToClients | 29.824 ms | 6.7302 ms | 0.3689 ms | 1093.7500 | 656.2500 | 250.0000 | 19.93 MB |

This is what it looked like after modifying the benchmark to use writethrough for reference structs:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| SendRegionToClient | 4.633 us | 0.3981 us | 0.0218 us | 0.0076 | - | - | 152 B |
| SendVisibleRegionsToClient | 1,698.057 us | 5.1669 us | 0.2832 us | 1.9531 | - | - | 60800 B |
| SendVisibleMeshesToClient | 1,702.729 us | 1,153.4816 us | 63.2262 us | 1.9531 | - | - | 60800 B |
| SendMeshToClient | 112.733 us | 0.3960 us | 0.0217 us | - | - | - | 176 B |
| ModifyMeshAndSendToClients | 11,960.689 us | 259.8482 us | 14.2432 us | 1234.3750 | - | - | 20675816 B |

And here's what it looks like with Value Structs:

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| SendRegionToClient | 3.972 us | 0.1471 us | 0.0081 us | 0.0076 | - | - | 168 B |
| SendVisibleRegionsToClient | 1,566.960 us | 41.4993 us | 2.2747 us | 3.9063 | - | - | 67201 B |
| SendVisibleMeshesToClient | 2,099.516 us | 57.5996 us | 3.1572 us | 3.9063 | - | - | 67201 B |
| SendMeshToClient | 17.133 us | 0.9626 us | 0.0528 us | - | - | - | 168 B |
| ModifyMeshAndSendToClients | 6,885.991 us | 766.1981 us | 41.9979 us | 226.5625 | 23.4375 | 15.6250 | 7896804 B |
This is a long-winded way of saying: I'm not sure how necessary WriteThrough is for value structs.
I haven't looked through the benchmark... is write through with value structs on a new branch?
I could maybe add a scenario to the benchmark that might benefit from write through, but I'm not sure whether write through of individual structs would be sufficiently beneficial compared to just vectors of structs or large fixed-length arrays in structs.
Is this where these benchmark results came from? https://github.com/jamescourtney/FlatSharp/commits/voxelBenchRedux
> Is this where these benchmark results came from?

Yep!
Sure -- let's do a proof of concept for it. How I usually do things in FlatSharp is:
The only viable idea I have for this at the moment is an interface along the lines of:
// A better name on this one...
public interface ITableWithValueStruct : IFlatBufferDeserializedObject
{
// Maybe this?
(int offset, int length)? PositionOfStruct(string propertyName, int? index = null);
// Or this, to help the JIT elide some string comparisons.
(int offset, int length)? PositionOfStruct<TStruct>(string propertyName, int? index = null) where TStruct : struct;
// Or this
void SaveStruct<TStruct>(string propertyName, TStruct value, int? index = null) where TStruct : struct;
}
The reason I am thinking along these lines is that:

- It doesn't involve changing the LayoutKind.Explicit approach from current value structs, or tunneling any extra metadata around inside of them.
- IFlatBufferDeserializedObject already exists today, and is only implemented by objects that FlatSharp has deserialized (ie, subclasses of the "primary" types).

There are also a lot of open questions, such as:
Some more rambling thoughts on this rolling around in my head. This is starting to smell a bit like another design limitation FlatSharp has. I'm wondering if it's time to tackle it.
FlatSharp's codegen is based on the idea that there is only one way to serialize or parse a given Type. This is why we ended up with SharedString as a separate type: that was the best way to differentiate the intent on the serialization and parse paths. There's one method to write a shared string and another to write a regular string. Not too bad.
Write through is a similar problem. I might want something that looks like this:
table Table
{
// Expectation is that I should be able to assign to the vector and have those changes flow through.
V1 : [Struct] (fs_writeThrough);
V2 : [Struct];
}
FlatSharp can't support this today, because the types are the same, and FlatSharp can't distinguish. But now that I'm working on 6.0, I'm open to fundamental changes in this area if there's a good way to model it.
Right now, serializers in FlatSharp have roughly this signature:
private static void WriteInlineValueOf<TSpanWriter>(
    TSpanWriter spanWriter,
    Span<byte> span,
    Table value,
    int offset,
    SerializationContext context) where TSpanWriter : ISpanWriter
Though sometimes the SerializationContext argument is omitted. One thing we can think of doing is adding a ContextualOptions flag:
[Flags]
public enum ContextualTableFieldOptions : byte
{
None = 0,
SharedString = 1,
WriteThrough = 2,
}
This could be another optional argument that is only used when the target Type is potentially usable with one of those scenarios. In this case, the cases would be:
These contexts wouldn't need to flow through the whole object graph -- only until you hit another table, which will reset the context. The maximum "depth" you can have between tables is really one level of indirection. Vectors and Unions are the only "open generics" in the schema.
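The "context resets at every table" rule above can be sketched with a small, self-contained model. This is not FlatSharp's generated code; the helper names (`EnterTableField`, `EnterVectorElement`) are purely illustrative. The point is that vectors and unions pass the flags through unchanged, while a table boundary recomputes them from that table's own schema attributes.

```csharp
using System;

[Flags]
public enum ContextualTableFieldOptions : byte
{
    None = 0,
    SharedString = 1,
    WriteThrough = 2,
}

// Illustrative model (not FlatSharp code) of how contextual options flow.
public static class ContextFlowSketch
{
    // A table resets context for its children: each field starts from None
    // plus whatever that field's own schema attributes specify.
    public static ContextualTableFieldOptions EnterTableField(
        ContextualTableFieldOptions inherited, // ignored: tables reset context
        bool fieldHasWriteThrough)
    {
        var context = ContextualTableFieldOptions.None;
        if (fieldHasWriteThrough)
        {
            context |= ContextualTableFieldOptions.WriteThrough;
        }

        return context;
    }

    // Non-table containers (vectors, unions) pass the context through unchanged.
    public static ContextualTableFieldOptions EnterVectorElement(
        ContextualTableFieldOptions inherited) => inherited;

    public static void Main()
    {
        // table -> writethrough vector field -> struct element: flag flows through.
        var v = EnterVectorElement(EnterTableField(default, fieldHasWriteThrough: true));
        Console.WriteLine(v); // WriteThrough

        // ...but a nested table resets it for its own (non-writethrough) fields.
        var t = EnterTableField(v, fieldHasWriteThrough: false);
        Console.WriteLine(t); // None
    }
}
```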
An altogether different option would be to specify writethrough at the serializer level, but that doesn't solve the Shared Strings problem.
The shared strings prototype is complete, and we now have contextual options that can "flow" through the serialization calls until they hit another table. PR: #225
That will be the foundation for write-through value structs. We'll support write through for structs on tables provided the item is marked as required and the serializer is mutable.
In vectors, writethrough will be supported provided that the serializer is mutable and the item is fixed size (though it might be disallowed for sorted vectors?)
The syntax will look something like:
table Foo
{
I : MutableInt (fs_writeThrough, required);
V : [ MutableInt ] (fs_writeThrough);
}
struct MutableInt (fs_valueStruct)
{
Value : Int;
}
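The fixed-size requirement for vector writethrough comes down to arithmetic: the byte position of element i is base + i * elementSize, so an assignment can overwrite exactly one slot in place without shifting any neighbors. This standalone example (not FlatSharp code; all names here are illustrative) demonstrates that in-place overwrite:

```csharp
using System;
using System.Buffers.Binary;

// Sketch of why vector writethrough needs fixed-size elements: element i
// lives at a computable position, so writing it touches only its own slot.
public static class VectorWriteThroughSketch
{
    public static void WriteElement(
        Span<byte> buffer, int vectorBase, int elementSize, int index, ReadOnlySpan<byte> element)
    {
        if (element.Length != elementSize)
        {
            throw new ArgumentException("element must be exactly elementSize bytes");
        }

        // Overwrite exactly one slot in place; neighbors are untouched.
        element.CopyTo(buffer.Slice(vectorBase + index * elementSize, elementSize));
    }

    public static void Main()
    {
        // A "vector" of three 4-byte ints starting at offset 8.
        byte[] buffer = new byte[8 + 3 * 4];

        Span<byte> newValue = stackalloc byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(newValue, 42);

        WriteElement(buffer, vectorBase: 8, elementSize: 4, index: 1, newValue);

        int readBack = BinaryPrimitives.ReadInt32LittleEndian(buffer.AsSpan(12, 4));
        Console.WriteLine(readBack); // prints 42
    }
}
```

This is also a plausible reason sorted vectors are tricky: overwriting an element in place could invalidate the sort order that binary search relies on.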
PR is here #228. It's pretty rough at the moment and I need to go through and write tons of tests. But it works! There's even one test case!
attribute "fs_serializer";
attribute "fs_valueStruct";
attribute "fs_writeThrough";
namespace FlatSharpEndToEndTests.ValueStructs;
table WriteThroughTable (fs_serializer:"Lazy")
{
Points : [ Vec3 ] (fs_writeThrough);
}
struct Vec3 (fs_valueStruct)
{
X : float;
Y : float;
Z : float;
}
This test passes:
[Fact]
public void WriteThrough_ValueStruct_InVector()
{
WriteThroughTable t = new WriteThroughTable
{
Points = new Vec3[]
{
new() { X = 1, Y = 2, Z = 3 },
new() { X = 4, Y = 5, Z = 6 },
}
};
byte[] data = new byte[1024];
WriteThroughTable.Serializer.Write(data, t);
var parsed = WriteThroughTable.Serializer.Parse(data);
var parsed2 = WriteThroughTable.Serializer.Parse(data);
Assert.Equal(1f, parsed2.Points[0].X);
parsed.Points[0] = new Vec3 { X = -1, Y = -1, Z = -1 }; // triggers writethrough
Assert.Equal(-1f, parsed2.Points[0].X);
}
The nuance here is that since the serializer is Lazy, and both parsed and parsed2 are sitting on top of the same byte array, we can observe the writethrough from parsed in parsed2, since it re-reads from the source buffer each time. Things become a little murky if you use Progressive, which causes cache-on-read behavior (though it's only a problem if you have two objects on the same array).
Preview here under artifacts: https://github.com/jamescourtney/FlatSharp/actions/runs/1227101064
Checked into flatSharp6 branch. #228
With the changes in 5.7 to allow value-type structs, it now seems important to enable writethrough for these scenarios. Currently, WriteThrough is restricted to reference type structs. This issue tracks supporting writethrough at the table level for struct members:
There are some open questions:
- WriteThrough may fail if the item in the table is null. Should writeThrough in tables imply fs_forceWrite and required? This would guarantee that the writethrough element always has a spot in the buffer, so writes can succeed.
- Should we support setting the value of a struct that exists to null? This would require vtable updates and would make things odd:

Assert(table.Value is not null);
table.Value = null;         // clears vtable and orphans a chunk of the buffer
table.Value = new Struct(); // no spot in the buffer for this now -- we don't know about that orphaned chunk any longer