mono / SkiaSharp

SkiaSharp is a cross-platform 2D graphics API for .NET platforms based on Google's Skia Graphics Library. It provides a comprehensive 2D API that can be used across mobile, server and desktop models to render images.
MIT License
4.56k stars 543 forks source link

[BUG] Random unit test crashes in dev/update-skia? #1227

Open ziriax opened 4 years ago

ziriax commented 4 years ago

I am trying to run the units tests in the dev/update-skia branch, so I can continue working on PRs, but my dotnet test run always fails randomly, with native exceptions (access violations). Sometimes a dozens of tests succeed before crashing, sometimes upto abour 100, so this smells like a multi threading issues. I have 32 hyperthreads (AMD 3950x) all running at the same time, so this might reveal race conditions.

Is this a known issue?

I have build both the native and managed code successfully, on Windows 10, using the latest version of VS2019.

ziriax commented 4 years ago

If I re-run only the failed tests, they eventually all pass.

So clearly some shared state is getting corrupted here.

mattleibow commented 4 years ago

Yeah, I notice that as well. I have been working in master to fix tests so they all run locally. I will merge into update soon.

But that won't fix them all. In some cases, the rules have changed. For example, in one case the base types have changed in the native code (SkColorSpace went from virtual to non-virtual). In another case, the method return values change from a copy to a reference and vice versa. Only the native comments show this.

So I just have to run various combos until I find a repro crash and then try and figure it out.

What was crashing? What was the native pinvoke?

Another thing I have noticed that the this object sometimes gets collected while a method is running. The one I saw last night was:

class SKTextBlobBuilder {
    public SKTextBlob Build () =>
        GetObject<SKTextBlob> (SkiaApi.sk_textblob_builder_make (Handle));
}

SkiaApi.sk_textblob_builder_make crashes because the object is collected and disposed while the native member is running.

There are 2 fixes for this type:

ziriax commented 4 years ago

Just to be sure, the dev/update-skia branch is the current beta that will become the next release right?

I can reproduce one problem at least by putting a for loop around part of the test.

For example:

        [SkippableFact]
        public void StreamLosesOwnershipToCodecButIsNotForgotten()
        {
            var bytes = File.ReadAllBytes(Path.Combine(PathToImages, "color-wheel.png"));
            var stream = new SKMemoryStream(bytes);
            var handle = stream.Handle;

            Assert.True(stream.OwnsHandle);
            Assert.True(SKObject.GetInstance<SKMemoryStream>(handle, out _));

            var codec = SKCodec.Create(stream);
            Assert.False(stream.OwnsHandle);

            stream.Dispose();
            Assert.True(SKObject.GetInstance<SKMemoryStream>(handle, out _));

            for (int i = 0; i < 1000; ++i)
            {
                Assert.Equal(SKCodecResult.Success, codec.GetPixels(out var pixels));
                Assert.NotEmpty(pixels);
            }
        }

If I run this test with

dotnet test -p:ParallelizeTestCollections=false --logger "console;verbosity=detailed" --filter DisplayName~StreamLosesOwnershipToCodecButIsNotForgotten

it eventually fails with

[xUnit.net 00:00:00.50]       System.InvalidOperationException : About to unreference an object that has no references. H: 1bf6db1a700 Type: SkiaSharp.SKColorSpace+SKColorSpaceStatic
[xUnit.net 00:00:00.50]       Stack Trace:
[xUnit.net 00:00:00.51]         C:\dev\SkiaSharp\binding\Binding\HandleDictionary.cs(66,0): at SkiaSharp.HandleDictionary.GetObject[TSkiaObject,TSkiaImplementation](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew)
[xUnit.net 00:00:00.51]         C:\dev\SkiaSharp\binding\Binding\SKObject.cs(90,0): at SkiaSharp.SKObject.GetObject[TSkiaObject](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew)
[xUnit.net 00:00:00.51]         C:\dev\SkiaSharp\binding\Binding\SKImageInfo.cs(26,0): at SkiaSharp.SKImageInfoNative.ToManaged(SKImageInfoNative& native)
[xUnit.net 00:00:00.51]         C:\dev\SkiaSharp\binding\Binding\SKCodec.cs(31,0): at SkiaSharp.SKCodec.get_Info()
[xUnit.net 00:00:00.51]         C:\dev\SkiaSharp\binding\Binding\SKCodec.cs(99,0): at SkiaSharp.SKCodec.GetPixels(Byte[]& pixels)
[xUnit.net 00:00:00.51]         c:\dev\SkiaSharp\tests\Tests\SKCodecTest.cs(61,0): at SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten()
[xUnit.net 00:00:00.51]   Finished:    SkiaSharp.Tests
  V SkiaSharp.Tests.SKManagedStreamTest.StreamLosesOwnershipToCodecButIsNotForgotten [19ms]
  X SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten [20ms]
  Error Message:
   System.InvalidOperationException : About to unreference an object that has no references. H: 1bf6db1a700 Type: SkiaSharp.SKColorSpace+SKColorSpaceStatic
  Stack Trace:
     at SkiaSharp.HandleDictionary.GetObject[TSkiaObject,TSkiaImplementation](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew) in C:\dev\SkiaSharp\binding\Binding\HandleDictionary.cs:line 66
   at SkiaSharp.SKObject.GetObject[TSkiaObject](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew) in C:\dev\SkiaSharp\binding\Binding\SKObject.cs:line 90
   at SkiaSharp.SKImageInfoNative.ToManaged(SKImageInfoNative& native) in C:\dev\SkiaSharp\binding\Binding\SKImageInfo.cs:line 26
   at SkiaSharp.SKCodec.get_Info() in C:\dev\SkiaSharp\binding\Binding\SKCodec.cs:line 31
   at SkiaSharp.SKCodec.GetPixels(Byte[]& pixels) in C:\dev\SkiaSharp\binding\Binding\SKCodec.cs:line 99
   at SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten() in c:\dev\SkiaSharp\tests\Tests\SKCodecTest.cs:line 61

Many tests fail for the same reason I guess

ziriax commented 4 years ago

This one seems related to SKColorSpaceStatic loosing reference counts while it is a global static object. Digging deeper, good way for me to learn the internals.

ziriax commented 4 years ago

Okay, that one was easily fixed by not changing the reference count in the SKColorSpaceStatic class.

After this fix I don't get crashes anymore, although now the tests hang (and some are failing, but consistent)

ziriax commented 4 years ago

Visual Studio is now able to run all tests, although some are failing, but not crashing at first sight. It seems to depend from run to run, so surely not complete fixed, but an improvement nevertheless.

My patch was really simple, I'll make a PR so we can discuss code.

diff --git a/binding/Binding/HandleDictionary.cs b/binding/Binding/HandleDictionary.cs
index 274e4100ad74..ec66a649d795 100644
--- a/binding/Binding/HandleDictionary.cs
+++ b/binding/Binding/HandleDictionary.cs
@@ -58,7 +58,8 @@ namespace SkiaSharp
                    // but managed code just has the same reference
                    if (unrefExisting && instance is ISKReferenceCounted refcnt) {
 #if THROW_OBJECT_EXCEPTIONS
-                       if (refcnt.GetReferenceCount () == 1)
+                       var count = refcnt.GetReferenceCount ();
+                       if (count <= 1)
                            throw new InvalidOperationException (
                                $"About to unreference an object that has no references. " +
                                $"H: {handle.ToString ("x")} Type: {instance.GetType ()}");
diff --git a/binding/Binding/SKColorSpace.cs b/binding/Binding/SKColorSpace.cs
index 7aff28283e2b..08c1ba562b31 100644
--- a/binding/Binding/SKColorSpace.cs
+++ b/binding/Binding/SKColorSpace.cs
@@ -26,11 +26,11 @@ namespace SkiaSharp
        {
        }

-       void ISKNonVirtualReferenceCounted.ReferenceNative () =>
-           SkiaApi.sk_colorspace_ref (Handle);
+       void ISKNonVirtualReferenceCounted.ReferenceNative () => OnReferenceNative();
+       void ISKNonVirtualReferenceCounted.UnreferenceNative () => OnUnreferencenative ();

-       void ISKNonVirtualReferenceCounted.UnreferenceNative () =>
-           SkiaApi.sk_colorspace_unref (Handle);
+       protected virtual void OnReferenceNative () => SkiaApi.sk_colorspace_ref (Handle);
+       protected virtual void OnUnreferencenative() => SkiaApi.sk_colorspace_unref (Handle);

        protected override void Dispose (bool disposing) =>
            base.Dispose (disposing);
@@ -267,6 +267,16 @@ namespace SkiaSharp
                IgnorePublicDispose = true;
            }

+           protected override void OnReferenceNative ()
+           {
+               // do not increment global reference
+           }
+
+           protected override void OnUnreferencenative ()
+           {
+               // do not release global reference
+           }
+
            protected override void Dispose (bool disposing)
            {
                // do not dispose
ziriax commented 4 years ago

Although my above patch seems to work, I have some questions about the following signature in HandleDictionary:

internal static TSkiaObject GetObject<TSkiaObject, TSkiaImplementation> (IntPtr handle, bool owns = true, bool unrefExisting = true, bool refNew = false)

Shouldn't unrefExisting be ignored if the existing instance didn't own its handle?

And shouldn't refNew be ignored when the new instance being created isn't the owner of the handle?

That would also fix the problem, since SKColorSpaceStatic doesn't own its handle.

ziriax commented 4 years ago

I've traced the random deadlock to a call to the WglContext constructor.

It appears that when multiple threads are calling wglChoosePixelFormatARB at the same time, my PC deadlocks. A bit like this very old issue

This might be a driver bug, but putting a global lock around this call fixes my deadlocks

Gillibald commented 4 years ago

Another thing I have noticed that the this object sometimes gets collected while a method is running. The one I saw last night was:

How is it possible that it crashes when the builder is collected? When builder->make returns the ownership is transferred to the caller so there should be no relationship between builder and blob.

Not holding a reference to an owning object is just an error on the developer's side. SkiaSharp keeps adding bookkeeping mechanisms that degrade performance.

If SKSurface produces a canvas it should be clear to me that I have to keep a reference to the surface until I am finished drawing stuff. Do we have any numbers on reusing existing instances?

SKFontManger, for example, could hold all references to created instances of SKTypeface until it gets collected. There is no need to cache objects globally. If you consume Skia you are always responsible to keep track of object ownership.

In general, most objects will have a refcount == 1 and they are most likely never requested again from the global instance's dictionary. So instead of just creating a new instance of an object that is only used once we have that extra overhead of looking up the dictionary for an existing instance. This might involve locking. Destruction of objects has the same overhead.

So I suggest to not cache objects globally by default and only cache them when it is explicitly requested. SKFontManager will need this cache. SKString doesn't etc.

mattleibow commented 4 years ago

@Ziriax (https://github.com/mono/SkiaSharp/pull/1237#pullrequestreview-395763222)

The code && instance.OwnsHandle specifically was needed to avoid the singleton objects like SKColorSpaceStatic from being deleted, but I noticed that you changed the constructor of these in 2191818177af4bfb4461859feda2ae6542767d54, so that these singletons now do own their native handle?

If that fixes the issue, great, but maybe you could add a Debug.Assert so that unrefExisting is not called on instances for which OwnsHandle is false? Maybe I misinterpreted the meaning of OwnsHandle.

mattleibow commented 4 years ago

@mattleibow

I am looking into the cause of the crash, I think they changed something between this and the next version. Your test case crashes every time on the m80, but not at all on the m68. Having a look at the native API now, but I do see a difference:

m68 (https://github.com/google/skia/blob/chrome/m68/include/codec/SkCodec.h#L174):

const SkImageInfo& getInfo() const;

m80 (https://github.com/google/skia/blob/chrome/m80/include/codec/SkCodec.h#L194):

SkImageInfo getInfo() const;

And, under the hood, there is a copy I think.

Owns handle is just whether the object is responsible for deconstructing the reference/object. In the case of the statics, the initial reference is incremented, therefore we have to also decrement when it is collected.

But, you were almost on the exact right track. The object is reference counted, but in the case of the particular call of getInfo, I think it is not incremented. In that case, we need to let the managed side know that is should not decrement (using the unrefExisting param):

https://github.com/mono/SkiaSharp/blob/dev/update-skia/binding/Binding/SKImageInfo.cs#L27

But, I am just going to do some tests quick and confirm.

mattleibow commented 4 years ago

@Ziriax

Owns handle is just whether the object is responsible for deconstructing the reference/object. In the case of the statics, the initial reference is incremented, therefore we have to also decrement when it is collected.

But, you were almost on the exact right track. The object is reference counted, but in the case of the particular call of getInfo, I think it is not incremented. In that case, we need to let the managed side know that is should not decrement (using the unrefExisting param):

https://github.com/mono/SkiaSharp/blob/dev/update-skia/binding/Binding/SKImageInfo.cs#L27

But, I am just going to do some tests quick and confirm.

Oh nice catch. So this is way more complex than I thought.

A look at the implementation shows that a new copy is always returned by SkCodec::getInfo():

  */
    SkImageInfo makeImageInfo() const {
        auto ct =  kGray_Color == fColor ? kGray_8_SkColorType   :
                 kXAlpha_Color == fColor ? kAlpha_8_SkColorType  :
                    k565_Color == fColor ? kRGB_565_SkColorType  :
                                           kN32_SkColorType      ;
        auto alpha = kOpaque_Alpha == fAlpha ? kOpaque_SkAlphaType
                                             : kUnpremul_SkAlphaType;
        sk_sp<SkColorSpace> cs = fProfile ? SkColorSpace::Make(*fProfile->profile())
                                          : nullptr;
        if (!cs) {
            cs = SkColorSpace::MakeSRGB();
        }
        return SkImageInfo::Make(fWidth, fHeight, ct, alpha, std::move(cs));
    }
mattleibow commented 4 years ago

Looking at the SKCodec, there is an issue somewhere there. The actual call to getInfo increments, and then we decrement. ~But, somehow, after the codec is collected, the colorspace is again decremented...~ Looking into that.

The GetPixels call seems to decrement it... Investigating...

ziriax commented 4 years ago

SkCodec::getInfo() returns a struct, and also in C# this is a struct, so that can't cause any refcount problems IMO

You say GetPixels is misbehaving, on what class?

mattleibow commented 4 years ago

I think I found it...

m68: (https://github.com/mono/skia/blob/xamarin-mobile-bindings/src/c/sk_types_priv.h#L193) ref on the way in

sk_ref_sp(AsColorSpace(info->colorspace)));

m80: (https://github.com/mono/skia/blob/dev/update-m80/src/c/sk_types_priv.h#L210) NO REF on the way in

sk_sp<SkColorSpace>(AsColorSpace(info->colorspace))); 

Let me fix that. What is happening is that when I get the struct from native code, the cs is incremented. When it reaches managed code, it is decremented. So no problem. However, when it goes back into the native code, it is not incremented, so when the native code cleans up, it gets decremented.

mattleibow commented 4 years ago

Fixed in 1590d7e1e81581e1c4ab1252b7d3dcb563d1a8e1 and https://github.com/mono/skia/commit/47de7965be076311e824589b0464a4e1e08d54eb