migueldeicaza / TensorFlowSharp

TensorFlow API for .NET languages
MIT License
3.14k stars 578 forks source link

AccessViolationException during TFTensor.GetValue() with Optimize Code build #419

Open alexwpitt opened 5 years ago

alexwpitt commented 5 years ago

Describe the bug The current implementation of the Dipose/Finalization pattern for TFDisposable based classes can lead to memory access violations when compiling with 'Optimize Code'. The garbage collector in C# can be quite aggressive and can dispose of an object even before finishing a call to a member function of that object (it becomes eligible for collection immediately after the access of member data).

Specific example: https://github.com/migueldeicaza/TensorFlowSharp/blob/master/TensorFlowSharp/Tensor.cs#L1391 At the point that FetchMultiDimensionalArray is called, there are no further references to TFTensor members. If the external caller of GetValue doesn't ensure the lifetime of the TFTensor extends beyond the call, then the TFTensor can be cleaned up (leading to disposal and invalidation of the the native Tensor handle) while FetchMultiDimensionalArray is still running.

To Reproduce In any example code with fairly a large output tensor (to increase likelihood of garbage collection kicking in during memory copy from native to managed code), use this code to run the graph and access the output tensor data and ensure there are no further references to the returned TFTensor: `

        var runner = someSession.GetRunner();

        runner.AddInput(someInput, someInputValue);

        var outputTensor = runner.Run(someOp);

        var outputValue = (float[,,])outputTensor.GetValue();

        // workaround: GC.KeepAlive(outputTensor)

` Then compile with 'Optimize Code' enabled.

Expected behavior This should work fine. Instead, I reliably observe AccessViolationException occurring at random points during the call to GetValue (as the collection kicks in during Copy calls within FetchMultiDimensionalArray).

Other context Easy fix would be adding GC.KeepAlive(this) to the end of GetValue (and any methods that use native resources the same way) as per: https://docs.microsoft.com/en-us/visualstudio/code-quality/ca2115-call-gc-keepalive-when-using-native-resources

alexwpitt commented 5 years ago

I suspect the same bug is present in ML.Net which interfaces with Tensorflow using code based on this library.