jamescourtney / FlatSharp

Fast, idiomatic C# implementation of Flatbuffers
Apache License 2.0
510 stars 51 forks source link

Offset Model (8-bit, 16-bit, 64-bit offset sizes, file identifier sizes) #207

Closed TYoungSL closed 2 years ago

TYoungSL commented 3 years ago

Implements a new component for type models, OffsetModel to allow varying offset and file identifier sizes.

Bumps Microsoft.CodeAnalysis.CSharp.Workspaces and Microsoft.Net.Compilers versions from 3.8.0 to 3.11.0.

VTable offset sizes are not parameterized yet.

Messages can be extremely short when 16 and 8-bit offset sizes are specified.

Messages can (obviously) contain a lot more data when 64-bit offset sizes are specified.

Implemented by compiler options, --offset-size, --file-identifier-size and --strict-file-identifier-size.

This will be tested for parity against flatcc built with non-standard offset sizes. (https://github.com/dvidelabs/flatcc/pull/206)

StirlingLabs have a fork of Google's FlatBuffers called BigBuffers which also aims for parity with flatcc with 64-bit offset sizes.

codecov[bot] commented 3 years ago

Codecov Report

Merging #207 (5646f1a) into master (51e0f13) will decrease coverage by 0.00%. The diff coverage is 96.26%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
- Coverage   94.68%   94.68%   -0.01%     
==========================================
  Files         104      105       +1     
  Lines        7584     7635      +51     
  Branches      677      681       +4     
==========================================
+ Hits         7181     7229      +48     
- Misses        300      303       +3     
  Partials      103      103              
Impacted Files Coverage Δ
src/FlatSharp/OffsetModel.cs 91.66% <91.66%> (ø)
src/FlatSharp/TypeModel/TypeModelContainer.cs 86.20% <91.66%> (+1.20%) :arrow_up:
src/FlatSharp.Compiler/FlatSharpCompiler.cs 95.20% <100.00%> (+0.40%) :arrow_up:
...ompiler/TypeDefinitions/TableOrStructDefinition.cs 94.53% <100.00%> (+0.02%) :arrow_up:
src/FlatSharp/FlatBufferSerializer.cs 91.95% <100.00%> (ø)
src/FlatSharp/FlatBufferSerializerOptions.cs 85.71% <100.00%> (+1.71%) :arrow_up:
src/FlatSharp/TypeModel/EnumTypeModel.cs 91.95% <100.00%> (ø)
src/FlatSharp/TypeModel/ItemMemberModel.cs 90.54% <100.00%> (+0.12%) :arrow_up:
src/FlatSharp/TypeModel/NullableTypeModel.cs 98.50% <100.00%> (ø)
src/FlatSharp/TypeModel/RuntimeTypeModel.cs 84.00% <100.00%> (+3.04%) :arrow_up:
... and 12 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 51e0f13...5646f1a. Read the comment docs.

jamescourtney commented 3 years ago

Hey -- Now that I have v6 done, I'm interested in taking a look at this. Thanks for the suggestions here. There is a lot to think about with this change.

There a few things that stick out to me initially:

I've seen your BigSpan<T> repo, and while it seems useful, it's also a lot of unsafe code that hasn't been audited. If you look at the readme for this repo, the very first goal is: "Full safety (no unsafe code or IL generation -- more on that below)." Obviously, the CLR team and .NET runtime folks use a lot of unsafe code upon which FlatSharp depends, but that's table stakes for a C# project. That's not to say I'm opposed to using BigSpan, but it must be optional (ie, FlatSharp.Runtime cannot depend on it). The first thing that comes to my mind is defining some IOutputBuffer struct, but I don't think that will work since ref structs can't implement interfaces (to avoid boxing).

So, I'm totally game for figuring out what the bigger picture here looks like, but this will be a pretty fundamental change and I want to make sure we do due diligence up front.

jamescourtney commented 3 years ago

And finally -- what does this look like if/when flatc starts supporting big offsets in a year and they look different than what we prototype?

You might also be interested in this: https://github.com/dotnet/apireviews/tree/main/2016/11-04-SpanOfT#spant-and-64-bit

TYoungSL commented 3 years ago

BigSpans is an implementation of option 3 in that API review.

It's the .NET Span runtime library code made to work with (u)longs.

There's no good way around not using it (or a reimplementation of it) if you need to address a 64-bit address range.

If flatc implements something different at the binary formatting, whether the offsets stay 32-bit, become 64-bit, or are implemented as variable-sized ints, unless it becomes just many 32-bit chunks, the rearchitecting to support 64-bit data will need be done regardless.

making vectors support > 2^32 items

Yeah, we'll have some of that.

Should the setting be per-table or per-compilation?

Per-compilation is easier. Per-table can come later as an extension or otherwise.

All of the places I've implicitly assumed that a uoffset / soffset is 4 bytes, or that a vtable offset is 2 bytes will need to be tracked down. Hopefully it isn't too many, but it's something to be careful about.

That's what's holding us up with porting flatc in BigBuffers but it seems to be working, we're testing output versus flatcc which somewhat already supported variable sized offsets. There's still plenty of 2 * vtable offset size assumption in the codegen, this PR only addresses offset sizes. The tests need to all be updated to specify a range of offset sizes, as in StructVectorTests.cs.

jamescourtney commented 3 years ago

Thanks for the thoughts. I've been mulling this over today. Based on what I've seen, the flatc folks don't seem terribly eager to head down this path, but it looks like the flatcc devs are open to it.

It seems like there's an opportunity here to do more than just "64 bit flatbuffers". I think if we're going to diverge from the binary format of FlatBuffers, we shouldn't do it halfway. There's quite a bit of mindshare and expertise in this area between you, me, and the folks from flatcc. My suggestion is to take this opportunity to actually fork the FlatBuffer format and use it as the basis for some new, non-backwards-compatible format that looks a bit like FlatBuffers. The bonus is that we're not just bolting on big offsets to the current format, but we can actually take some time and address some shortcomings of it:

This is, obviously, a big endeavor, but much of the code from FlatSharp, flatcc, flatc will be reusable. I'm interested in collaborating on a whitepaper for some new specification, and seeing if something compelling pops out of that discussion.

jamescourtney commented 3 years ago

Given the foundational changes we're discussing here, I think I want to do this outside of the core FlatSharp repo. FlatSharp is at this point pretty mature and stable, and I'd like it to remain FlatBuffer focused and have no dependencies outside of Roslyn / .NET standard library. I also don't know what the future looks like for FlatSharp; there aren't any immediate "things I want to add" on my list (which has happened before). That's not to say I consider it "finished" or have plans to abandon it. It's more an expression of the fact that FlatSharp is at feature parity with flatc and I don't think there is much performance left to squeeze out of it, given that I've spent the last few years turning that crank (though I would be delighted to be proven wrong!)

Anyway, my thoughts in summary are that...

Rjvs commented 3 years ago

This all sounds reasonable. Happy to change the name of BigBuffers to describe the project better.

Rjvs commented 3 years ago

You might be interested in our changelist at https://github.com/StirlingLabs/BigBuffers/blob/main/docs/source/DifferencesFromFlat.md

@TYoungSL made alignment optional without harming compatibility against FlatBuffers but it would be better to have optional alignment of tables in the spec. We actually need alignment in many of our tables for SIMD but some users wouldn't have that requirement.