google / flatbuffers

FlatBuffers: Memory Efficient Serialization Library
https://flatbuffers.dev/
Apache License 2.0
23.32k stars 3.25k forks source link

Multithreaded reading of Flatbuffer data causing data corruption [c#, flatc v1.10.0 & 1.11.0, Win10] #5745

Closed james-vanrossum-sonardyne closed 4 years ago

james-vanrossum-sonardyne commented 4 years ago

I have a C# application that is designed to read in (synchronously) log files containing various flatbuffer types, to then output this data in different formats to different output files.

As previously mentioned the read is synchronous, which subsequently populates a threadsafe primary buffer (BlockingCollection) per output file with the flatbuffer objects. Per output file, there is a (formatting) thread that formats the data from multiple properties of the flatbuffer object, and populates another threadsafe buffer with the formatted data, and a (output) thread that reads the secondary buffer and outputs it to a file.

In this specific example the formatting is taking 6 doubles from the flatbuffer and putting them in a double[].

The Flatbuffer objects in the primary buffer are held by reference. When I have two output files (and as such two "formatting threads" and two "output threads"), and these outputs are both using the same Flatbuffer objects (of the same specific type) as their source data, there are two "primary buffers" both containing references to each flatbuffer object - i.e. two references to each single flatbuffer object instance. When these "formatting threads" are run in parallel, processing the data as it arrives in the buffer, occasionally (100's of times in ~220,000 flatbuffer objects) one of the 6 fields of the flatbuffer being put into the double[] gets duplicated into one of the incorrect indexes of the target double[]. During debugging, inspecting the individual values of the flatbuffer return the correct values, however at runtime when not stepping through the code, this issue occurs.

The below image shows the issue on the left when the formatting threads are used, and the data output on the right with no erroneous data is when the synchronous reader thread is used to perform the formatting. (this is obviously far slower in processing time, and is the reason I'm trying to multithread).

image

I've seen in the docs that c++ flatbuffers are thread safe for reading here. Is this the case for C#? I believe that they are not safe from what I'm observing, and this is crippling the speed I can work on Flatbuffers if they are not threadsafe for reading. I'm afraid it is beyond my understanding as to what may be changing within the flatbuffer (tingling feeling around the bytebuffer position property), but it definitely doesn't appear thread safe.

Any help with the issue would be much appreciated. Would happily walk someone through the issue remotely.

Many thanks in advance,

James

aardappel commented 4 years ago

A FlatBuffer is in theory "thread-safe" since it is just a buffer of read only bytes. The code that accesses this buffer does not do any thread synchronization itself, so will only be thread-safe if doesn't mutate the accessors while multiple threads access it. Every Table accessor refers to a ByteBuffer, which has a Position that if it were to be modified could be thread-unsafe, but from what I can see it only ever gets set by the FlatBufferBuilder, not the reading code.

There are two ways I can see to guarantee this is thread safe: 1) Audit Table and related accessors to ensure they are not mutating.. I don't think they do, but we could be missing something. 2) Don't share accessors between threads. Make each thread refer to its own ByteBuffer and accessors, and only share the underlying byte data.

james-vanrossum-sonardyne commented 4 years ago

Sorry for the delay in reply. I have tried your second option, by sharing the underlying bytes from the ByteBuffer, and then creating a ByteBuffer and it's accessors from that for each accessing thread. This does solve the issue of corrupted data, however the overhead required in recreating the ByteBuffers is too expensive and is slower now than a singlethreaded implementation. I'm not sure how to perform the auditing you mention above in realtime (as the issue doesn't present itself when stepping through/rewinding and stepping through an observed occurrance of the data corruption). Any suggestions would be appreciated (Can also LiveShare if feasible?)

Edit: In additional testing, I found that simply locking the Flatbuffer struct that each thread is accessing during it's read resolves the data corruption issue. I'm therefore almost 100% certain these aren't thread safe for reading. Is there anything we can do to address this?

vglavnyy commented 4 years ago

Can you show your schema or generated code and reading code? Also values of definitions UNSAFE_BYTEBUFFER, BYTEBUFFER_NO_BOUNDS_CHECK, ENABLE_SPAN_T.

The ByteBuffer has four preallocated arrays: https://github.com/google/flatbuffers/blob/6d44cede700eb04727572bd0b63aeca58c815a0a/net/FlatBuffers/ByteBuffer.cs#L291-L297

The doublehelper and ulonghelper are used by GetDouble(): https://github.com/google/flatbuffers/blob/6d44cede700eb04727572bd0b63aeca58c815a0a/net/FlatBuffers/ByteBuffer.cs#L791-L798

plop44 commented 4 years ago

Hi,

I have been facing a bug that really looks like the one @pejaii described. FlatBuffer has been compiled with no compilation options.

My schema:

table MyFlatBufferObject 
{
                Value1:double;
                Value2:double;
}

root_type MyFlatBufferObject;

My code to reproduce:

private int _success;
private int _comparisons;

[Test]
public void WhenWeTestFlatBufferThreadSafety()
{
                MyFlatBufferObject myFlatBufferObject = GetMyFlatBufferObject();

                // reliable values (as read in one thread) expected in code below
                var value1 = myFlatBufferObject.Value1;
                var value2 = myFlatBufferObject.Value2;

                bool run = true;
                void KeepComparing()
                {
                                while (run)
                                {
                                                Interlocked.Add(ref _comparisons, 1);
                                                var c1 = myFlatBufferObject.Value1 == value1;
                                                var c2 = myFlatBufferObject.Value2 == value2;

                                                if (c1 && c2)
                                                                Interlocked.Add(ref _success, 1);
                                }
                }

                for (int i = 0; i < 10; i++)
                {
                                Task.Run(KeepComparing);
                }

                Thread.Sleep(TimeSpan.FromSeconds(10));
                run = false;
                // My apologise for the sleep synchronisation
                Thread.Sleep(100);

                Console.WriteLine($"{_comparisons:N0} comparisons");
                Console.WriteLine($"{_success:N0} successes");
                Console.WriteLine($"{_comparisons-_success:N0} failures");
                Console.WriteLine($"{(_comparisons-_success)*1.0/_comparisons:P} failures");
}

private MyFlatBufferObject GetMyFlatBufferObject()
{
    var flatBufferBuilder = new FlatBufferBuilder(1);

                MyFlatBufferObject.StartMyFlatBufferObject(flatBufferBuilder);
                MyFlatBufferObject.AddValue1(flatBufferBuilder, 123456.789);
                MyFlatBufferObject.AddValue2(flatBufferBuilder, 987654.321);

                var theEnd = MyFlatBufferObject.EndMyFlatBufferObject(flatBufferBuilder);
                MyFlatBufferObject.FinishMyFlatBufferObjectBuffer(flatBufferBuilder, theEnd);

                return MyFlatBufferObject.GetRootAsMyFlatBufferObject(flatBufferBuilder.DataBuffer);
}

My output:

99,107,133 comparisons 65,063,049 successes 34,044,084 failures 34.35% failures

An interesting point is when I compile code using option UNSAFE_BYTEBUFFER all the comparisons succeed. If I keep only one property (Value1 for example) and no compilation options, it works fine as well. I suspect ByteBuffer.doublehelper and co as mentioned by @vglavnyy

In my mind that is a bug and thread safety should be guaranteed on multiple read operations.
I hope that will help to solve it. This issue seems to be addressing the problem, but it has never been through.

Thanks

james-vanrossum-sonardyne commented 4 years ago

@plop44 If it helps, I've gotten around this by a use of simple locks around each object when reading them. Obviously this does slow down asynchronous operations, but only where these "errors" are occurring, so not all the time (~35% by your records, mine were much less at ~0.1%).

My team are not yet ready to move up to version 1.12 flatc compiler, however the object api support for C# within this may also help. Though for my scenario, the increased memory usage that undoubtedly comes with this is unappealing, given the millions of FB objects that would be created.

vglavnyy commented 4 years ago

It is obvious that GetDouble has data-race conditions. I don't know why it was implemented in this way. As for me, the condition #if !UNSAFE_BYTEBUFFER looks like #if !SAFE_BYTEBUFFER.

//      UNSAFE_BYTEBUFFER 
//          This will use unsafe code to manipulate the underlying byte array. This
//          can yield a reasonable performance increase.

The condition #if !UNSAFE_BYTEBUFFER was added by #4445 (08cf50c54a704bd76a30602a5efc65ccb7985695)

@robert-schmidtke or @aardappel may know more