jetersen closed this issue 1 year ago.
You're correct that FlatSharp doesn't have a streaming API. The official FlatBuffers library doesn't have one either. Streaming is difficult with FlatBuffers for a few reasons:

- Reading a FlatBuffer requires jumping around in the buffer, so the source has to support seeking. Some streams do (`FileStream`, `MemoryStream`), but compression streams do not, last I checked. In that vein, writing FlatBuffers also requires jumping around in the buffer.
- FlatSharp parses from a `Span<byte>` obtained from the input source (array/memory/etc). You could maybe implement `IInputBuffer` yourself using a stream of your choice (assuming it supports seeking), but you would end up doing lots of seeking and would need to worry about concurrency, since multiple threads could be fighting over the stream's `Position` property.

If streaming is a dealbreaker, my best advice is to use a serialization format that always goes left-to-right. I believe Protobuf, MsgPack, and likely lots of others will fill this requirement for you.
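To make the random-access point concrete, here is a minimal Python sketch (Python purely for illustration; this is not FlatSharp's API). It assumes a toy, FlatBuffers-like layout in which a uint32 offset at the start of the buffer points to a length-prefixed string, and a hypothetical `SeekingReader` class standing in for a stream-backed input buffer. Reading even one string takes two jumps, and because every read is a seek plus a read on one shared stream position, the reader has to hold a lock around each pair.

```python
import io
import struct
import threading


def build_toy_buffer(text: str) -> bytes:
    """Toy layout loosely modeled on FlatBuffers (NOT the real wire format).

    offset 0: uint32 offset to the root, relative to offset 0
    at root:  uint32 byte length, UTF-8 bytes, NUL terminator
    """
    payload = text.encode("utf-8")
    root_pos = 8  # leave a gap so the reader has to jump ahead
    header = struct.pack("<I", root_pos)
    padding = b"\x00" * (root_pos - len(header))
    body = struct.pack("<I", len(payload)) + payload + b"\x00"
    return header + padding + body


class SeekingReader:
    """Hypothetical random-access reader over a seekable stream.

    Every read is a seek followed by a read on one shared stream, so the
    pair must be atomic when several threads parse at the same time.
    """

    def __init__(self, stream: io.IOBase):
        if not stream.seekable():
            raise ValueError("stream must support seeking")
        self._stream = stream
        self._lock = threading.Lock()

    def read_at(self, offset: int, size: int) -> bytes:
        with self._lock:  # otherwise another thread could move the position
            self._stream.seek(offset)
            data = self._stream.read(size)
        if len(data) != size:
            raise EOFError(f"wanted {size} bytes at {offset}, got {len(data)}")
        return data

    def read_u32(self, offset: int) -> int:
        return struct.unpack("<I", self.read_at(offset, 4))[0]


def read_root_string(reader: SeekingReader) -> str:
    root = reader.read_u32(0)       # jump 1: follow the root offset
    length = reader.read_u32(root)  # jump 2: read the length prefix
    return reader.read_at(root + 4, length).decode("utf-8")
```

Usage: `read_root_string(SeekingReader(io.BytesIO(build_toy_buffer("hello"))))` returns `"hello"`. The lock serializes every read, which is the concurrency cost mentioned above; on POSIX systems, positional reads such as `os.pread` avoid sharing a position at all.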
This doesn't answer your compression question, but FlatBuffers lends itself very well to memory-mapped files. I know this isn't streaming, but if all you need is file I/O, memory-mapped files offer many of the advantages of streaming, since the OS manages how much of the file is actually kept in memory at a time.
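A minimal Python sketch of the memory-mapped approach, assuming a hypothetical file layout where a uint32 at offset 0 points to a uint64 value placed 1 MiB into the file. The reader jumps straight to the target through the mapping instead of reading the file front to back, and only the pages it actually touches need to be loaded by the OS. The helper names are illustrative only.

```python
import mmap
import os
import struct
import tempfile


def write_sample_file(path: str) -> None:
    # Hypothetical layout: a uint32 at offset 0 points to a uint64 value
    # 1 MiB into the file, so the reader must jump far ahead.
    target = 1 << 20
    with open(path, "wb") as f:
        f.write(struct.pack("<I", target))
        f.seek(target)
        f.write(struct.pack("<Q", 42))


def read_via_mmap(path: str) -> int:
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Random access straight into the mapping; no sequential read.
            (target,) = struct.unpack_from("<I", mm, 0)
            (value,) = struct.unpack_from("<Q", mm, target)
            return value


def demo() -> int:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        write_sample_file(path)
        return read_via_mmap(path)
```

Calling `demo()` writes the sample file, maps it read-only, and returns the value found by following the offset.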
I'm not sure if you have duplicate strings in your data or not, but FlatSharp does support string deduplication (see the shared strings sample). If you are encoding the same string multiple times, FlatSharp can deduplicate those in the output for you. The way this works is that strings are referenced by pointer (one of those random access cases), so for shared strings, FlatSharp will track all the places that need to point to a given string and write all those pointers at once.
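The pointer-patching idea can be sketched in Python with a toy format (not FlatSharp's output format; `serialize_strings` and `deserialize_strings` are hypothetical helpers). The writer reserves a pointer slot for every string reference, remembers which slots want which string, writes each distinct string once, and then patches all of that string's slots together.

```python
import struct
from collections import defaultdict
from typing import List


def serialize_strings(values: List[str], dedupe: bool) -> bytes:
    """Toy format: uint32 count, `count` uint32 absolute offsets,
    then string data (uint32 length + UTF-8 bytes)."""
    buf = bytearray(struct.pack("<I", len(values)))
    # Reserve one pointer slot per value and remember where each slot lives.
    pending = defaultdict(list)  # key -> list of slot positions
    for i, s in enumerate(values):
        key = s if dedupe else (i, s)  # without dedupe every value is unique
        pending[key].append(len(buf))
        buf += b"\x00\x00\x00\x00"
    # Write each distinct string once, then patch every slot pointing at it.
    for key, slots in pending.items():
        text = key if dedupe else key[1]
        offset = len(buf)
        data = text.encode("utf-8")
        buf += struct.pack("<I", len(data)) + data
        for slot in slots:
            struct.pack_into("<I", buf, slot, offset)
    return bytes(buf)


def deserialize_strings(buf: bytes) -> List[str]:
    (count,) = struct.unpack_from("<I", buf, 0)
    out = []
    for i in range(count):
        (offset,) = struct.unpack_from("<I", buf, 4 + 4 * i)
        (length,) = struct.unpack_from("<I", buf, offset)
        out.append(buf[offset + 4 : offset + 4 + length].decode("utf-8"))
    return out
```

For `["alpha", "beta", "alpha", "alpha"]` the sketch produces 55 bytes without deduplication and 37 bytes with it, since the two extra copies of `"alpha"` (9 bytes each) disappear while both outputs decode to the same list.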
You won't get close to the results that you might with gzip, but for cases where there are repeated strings, it can make a big difference in the output size.
@jamescourtney thanks for the detailed explanation. That helped me understand the use case for FlatBuffers better, so much appreciated! Hopefully others looking for a streaming API can use this as a reference. It also helped me understand the different deserialization options. You're right, we are trying to compare various approaches to deserialization and their benefits. In our benchmarks I was using Progressive, which proved to be unfair for the comparison, though it might be a good fit for the application itself. Greedy deserialization was the right choice for the benchmark comparison.
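The difference between the two modes can be sketched with a toy Python model (hypothetical classes over a simple uint32 vector; not FlatSharp's generated code). `GreedyVector` decodes every element at construction, while `ProgressiveVector` decodes an element on first access and caches it. A benchmark that touches only a few elements therefore measures far less work for the progressive version, which is why it is not an apples-to-apples comparison against formats that parse everything up front.

```python
import struct


def encode_u32_vector(values):
    # Toy layout: uint32 count followed by the uint32 elements.
    return struct.pack(f"<I{len(values)}I", len(values), *values)


class GreedyVector:
    """Decodes every element up front; the buffer can be discarded afterward."""

    def __init__(self, buf: bytes):
        (count,) = struct.unpack_from("<I", buf, 0)
        self.items = list(struct.unpack_from(f"<{count}I", buf, 4))
        self.reads = count  # every element was decoded eagerly

    def __getitem__(self, i):
        return self.items[i]


class ProgressiveVector:
    """Decodes an element on first access and caches it; keeps the buffer alive."""

    def __init__(self, buf: bytes):
        self._buf = buf
        (self._count,) = struct.unpack_from("<I", buf, 0)
        self._cache = {}
        self.reads = 0

    def __getitem__(self, i):
        if not 0 <= i < self._count:
            raise IndexError(i)
        if i not in self._cache:
            self._cache[i] = struct.unpack_from("<I", self._buf, 4 + 4 * i)[0]
            self.reads += 1
        return self._cache[i]
```

With a 1,000-element vector, `GreedyVector` performs 1,000 decodes up front, while `ProgressiveVector` has performed exactly one after `p[10]` is read twice.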
The input does not contain duplicates so deduplication would not help.
I'll close the issue. Thanks for answering my question and providing excellent answers.
Maybe I am using it wrong, but it feels like FlatSharp is missing a streaming API for parsing.
I have some data that is highly compressible due to UTF-8 strings and would like to gzip it.
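Because FlatBuffers parsing needs random access and a gzip stream is decoded front to back, one possible approach (an assumption here, not a FlatSharp feature) is to decompress the whole payload into memory and then parse the resulting bytes. The Python sketch below uses a hypothetical offset-based layout and hypothetical helpers (`encode_strings`, `read_string`, `load_compressed`); the trade-off is that the entire decompressed buffer must fit in memory.

```python
import gzip
import struct
from typing import List


def encode_strings(values: List[str]) -> bytes:
    """Toy offset-based layout (not the FlatBuffers format): uint32 count,
    uint32 absolute offsets, then uint32-length-prefixed UTF-8 strings."""
    header = bytearray(struct.pack("<I", len(values)))
    data = bytearray()
    base = 4 + 4 * len(values)  # string data starts after the offset table
    for s in values:
        header += struct.pack("<I", base + len(data))
        raw = s.encode("utf-8")
        data += struct.pack("<I", len(raw)) + raw
    return bytes(header + data)


def read_string(buf: bytes, index: int) -> str:
    # Random access: jump to the index-th offset, then to the string itself.
    (offset,) = struct.unpack_from("<I", buf, 4 + 4 * index)
    (length,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + 4 + length].decode("utf-8")


def load_compressed(blob: bytes) -> bytes:
    # gzip data can only be inflated front to back, so decompress all of it
    # first and hand the in-memory bytes to the random-access reader.
    return gzip.decompress(blob)
```

Usage: compress `encode_strings(values)` with `gzip.compress`, and later call `read_string(load_compressed(blob), i)` to read any element directly.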