Closed TYoungSL closed 2 years ago
Merging #207 (5646f1a) into master (51e0f13) will decrease coverage by
0.00%
. The diff coverage is96.26%
.
@@ Coverage Diff @@
## master #207 +/- ##
==========================================
- Coverage 94.68% 94.68% -0.01%
==========================================
Files 104 105 +1
Lines 7584 7635 +51
Branches 677 681 +4
==========================================
+ Hits 7181 7229 +48
- Misses 300 303 +3
Partials 103 103
Impacted Files | Coverage Δ | |
---|---|---|
src/FlatSharp/OffsetModel.cs | 91.66% <91.66%> (ø) |
|
src/FlatSharp/TypeModel/TypeModelContainer.cs | 86.20% <91.66%> (+1.20%) |
:arrow_up: |
src/FlatSharp.Compiler/FlatSharpCompiler.cs | 95.20% <100.00%> (+0.40%) |
:arrow_up: |
...ompiler/TypeDefinitions/TableOrStructDefinition.cs | 94.53% <100.00%> (+0.02%) |
:arrow_up: |
src/FlatSharp/FlatBufferSerializer.cs | 91.95% <100.00%> (ø) |
|
src/FlatSharp/FlatBufferSerializerOptions.cs | 85.71% <100.00%> (+1.71%) |
:arrow_up: |
src/FlatSharp/TypeModel/EnumTypeModel.cs | 91.95% <100.00%> (ø) |
|
src/FlatSharp/TypeModel/ItemMemberModel.cs | 90.54% <100.00%> (+0.12%) |
:arrow_up: |
src/FlatSharp/TypeModel/NullableTypeModel.cs | 98.50% <100.00%> (ø) |
|
src/FlatSharp/TypeModel/RuntimeTypeModel.cs | 84.00% <100.00%> (+3.04%) |
:arrow_up: |
... and 12 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 51e0f13...5646f1a. Read the comment docs.
Hey -- Now that I have v6 done, I'm interested in taking a look at this. Thanks for the suggestions here. There is a lot to think about with this change.
There a few things that stick out to me initially:
uoffset
/ soffset
is 4 bytes, or that a vtable offset is 2 bytes will need to be tracked down. Hopefully it isn't too many, but it's something to be careful about.IList<T>
, which only has an int32
indexer defined.I've seen your BigSpan<T>
repo, and while it seems useful, it's also a lot of unsafe code that hasn't been audited. If you look at the readme for this repo, the very first goal is: "Full safety (no unsafe code or IL generation -- more on that below)." Obviously, the CLR team and .NET runtime folks use a lot of unsafe code upon which FlatSharp depends, but that's table stakes for a C# project. That's not to say I'm opposed to using BigSpan
, but it must be optional (ie, FlatSharp.Runtime
cannot depend on it). The first thing that comes to my mind is defining some IOutputBuffer
struct, but I don't think that will work since ref struct
s can't implement interfaces (to avoid boxing).
So, I'm totally game for figuring out what the bigger picture here looks like, but this will be a pretty fundamental change and I want to make sure we do due diligence up front.
And finally -- what does this look like if/when flatc
starts supporting big offsets in a year and they look different than what we prototype?
You might also be interested in this: https://github.com/dotnet/apireviews/tree/main/2016/11-04-SpanOfT#spant-and-64-bit
BigSpans is an implementation of option 3 in that API review.
It's the .NET Span runtime library code made to work with (u)longs.
There's no good way around not using it (or a reimplementation of it) if you need to address a 64-bit address range.
If flatc
implements something different at the binary formatting, whether the offsets stay 32-bit, become 64-bit, or are implemented as variable-sized ints, unless it becomes just many 32-bit chunks, the rearchitecting to support 64-bit data will need be done regardless.
making vectors support > 2^32 items
Yeah, we'll have some of that.
Should the setting be per-table or per-compilation?
Per-compilation is easier. Per-table can come later as an extension or otherwise.
All of the places I've implicitly assumed that a uoffset / soffset is 4 bytes, or that a vtable offset is 2 bytes will need to be tracked down. Hopefully it isn't too many, but it's something to be careful about.
That's what's holding us up with porting flatc
in BigBuffers but it seems to be working, we're testing output versus flatcc which somewhat already supported variable sized offsets. There's still plenty of 2 *
vtable offset size assumption in the codegen, this PR only addresses offset sizes. The tests need to all be updated to specify a range of offset sizes, as in StructVectorTests.cs
.
Thanks for the thoughts. I've been mulling this over today. Based on what I've seen, the flatc
folks don't seem terribly eager to head down this path, but it looks like the flatcc
devs are open to it.
It seems like there's an opportunity here to do more than just "64 bit flatbuffers". I think if we're going to diverge from the binary format of FlatBuffers, we shouldn't do it halfway. There's quite a bit of mindshare and expertise in this area between you, me, and the folks from flatcc
. My suggestion is to take this opportunity to actually fork the FlatBuffer format and use it as the basis for some new, non-backwards-compatible format that looks a bit like FlatBuffers. The bonus is that we're not just bolting on big offsets to the current format, but we can actually take some time and address some shortcomings of it:
varint
s? This is, obviously, a big endeavor, but much of the code from FlatSharp, flatcc, flatc will be reusable. I'm interested in collaborating on a whitepaper for some new specification, and seeing if something compelling pops out of that discussion.
Given the foundational changes we're discussing here, I think I want to do this outside of the core FlatSharp repo. FlatSharp is at this point pretty mature and stable, and I'd like it to remain FlatBuffer focused and have no dependencies outside of Roslyn / .NET standard library. I also don't know what the future looks like for FlatSharp; there aren't any immediate "things I want to add" on my list (which has happened before). That's not to say I consider it "finished" or have plans to abandon it. It's more an expression of the fact that FlatSharp is at feature parity with flatc
and I don't think there is much performance left to squeeze out of it, given that I've spent the last few years turning that crank (though I would be delighted to be proven wrong!)
Anyway, my thoughts in summary are that...
BigBuffers
should be considered a new format. We may be able to get away with using .fbs
as the IDL.BigBuffers
(can we find a better name?) as an excuse to address some of the shortcomings in FlatBuffersFlat*
and apply the changes there.This all sounds reasonable. Happy to change the name of BigBuffers to describe the project better.
You might be interested in our changelist at https://github.com/StirlingLabs/BigBuffers/blob/main/docs/source/DifferencesFromFlat.md
@TYoungSL made alignment optional without harming compatibility against FlatBuffers but it would be better to have optional alignment of tables in the spec. We actually need alignment in many of our tables for SIMD but some users wouldn't have that requirement.
Implements a new component for type models,
OffsetModel
to allow varying offset and file identifier sizes.Bumps Microsoft.CodeAnalysis.CSharp.Workspaces and Microsoft.Net.Compilers versions from 3.8.0 to 3.11.0.
VTable offset sizes are not parameterized yet.
Messages can be extremely short when 16 and 8-bit offset sizes are specified.
Messages can (obviously) contain a lot more data when 64-bit offset sizes are specified.
Implemented by compiler options,
--offset-size
,--file-identifier-size
and--strict-file-identifier-size
.This will be tested for parity against flatcc built with non-standard offset sizes. (https://github.com/dvidelabs/flatcc/pull/206)
StirlingLabs have a fork of Google's FlatBuffers called BigBuffers which also aims for parity with flatcc with 64-bit offset sizes.