Open ziriax opened 4 years ago
If I re-run only the failed tests, they eventually all pass.
So clearly some shared state is getting corrupted here.
Yeah, I notice that as well. I have been working in master to fix tests so they all run locally. I will merge into update soon.
But that won't fix them all. In some cases, the rules have changed. For example, in one case the base types have changed in the native code (SkColorSpace
went from virtual to non-virtual). In another case, the method return values change from a copy to a reference and vice versa. Only the native comments show this.
So I just have to run various combos until I find a repro crash and then try and figure it out.
What was crashing? What was the native pinvoke?
Another thing I have noticed that the this
object sometimes gets collected while a method is running. The one I saw last night was:
class SKTextBlobBuilder {
public SKTextBlob Build () =>
GetObject<SKTextBlob> (SkiaApi.sk_textblob_builder_make (Handle));
}
SkiaApi.sk_textblob_builder_make
crashes because the object is collected and disposed while the native member is running.
There are 2 fixes for this type:
GC.KeepAlive(this)
at the end of the method
SafeHandle
Just to be sure, the dev/update-skia
branch is the current beta that will become the next release right?
I can reproduce one problem at least by putting a for loop around part of the test.
For example:
[SkippableFact]
public void StreamLosesOwnershipToCodecButIsNotForgotten()
{
var bytes = File.ReadAllBytes(Path.Combine(PathToImages, "color-wheel.png"));
var stream = new SKMemoryStream(bytes);
var handle = stream.Handle;
Assert.True(stream.OwnsHandle);
Assert.True(SKObject.GetInstance<SKMemoryStream>(handle, out _));
var codec = SKCodec.Create(stream);
Assert.False(stream.OwnsHandle);
stream.Dispose();
Assert.True(SKObject.GetInstance<SKMemoryStream>(handle, out _));
for (int i = 0; i < 1000; ++i)
{
Assert.Equal(SKCodecResult.Success, codec.GetPixels(out var pixels));
Assert.NotEmpty(pixels);
}
}
If I run this test with
dotnet test -p:ParallelizeTestCollections=false --logger "console;verbosity=detailed" --filter DisplayName~StreamLosesOwnershipToCodecButIsNotForgotten
it eventually fails with
[xUnit.net 00:00:00.50] System.InvalidOperationException : About to unreference an object that has no references. H: 1bf6db1a700 Type: SkiaSharp.SKColorSpace+SKColorSpaceStatic
[xUnit.net 00:00:00.50] Stack Trace:
[xUnit.net 00:00:00.51] C:\dev\SkiaSharp\binding\Binding\HandleDictionary.cs(66,0): at SkiaSharp.HandleDictionary.GetObject[TSkiaObject,TSkiaImplementation](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew)
[xUnit.net 00:00:00.51] C:\dev\SkiaSharp\binding\Binding\SKObject.cs(90,0): at SkiaSharp.SKObject.GetObject[TSkiaObject](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew)
[xUnit.net 00:00:00.51] C:\dev\SkiaSharp\binding\Binding\SKImageInfo.cs(26,0): at SkiaSharp.SKImageInfoNative.ToManaged(SKImageInfoNative& native)
[xUnit.net 00:00:00.51] C:\dev\SkiaSharp\binding\Binding\SKCodec.cs(31,0): at SkiaSharp.SKCodec.get_Info()
[xUnit.net 00:00:00.51] C:\dev\SkiaSharp\binding\Binding\SKCodec.cs(99,0): at SkiaSharp.SKCodec.GetPixels(Byte[]& pixels)
[xUnit.net 00:00:00.51] c:\dev\SkiaSharp\tests\Tests\SKCodecTest.cs(61,0): at SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten()
[xUnit.net 00:00:00.51] Finished: SkiaSharp.Tests
V SkiaSharp.Tests.SKManagedStreamTest.StreamLosesOwnershipToCodecButIsNotForgotten [19ms]
X SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten [20ms]
Error Message:
System.InvalidOperationException : About to unreference an object that has no references. H: 1bf6db1a700 Type: SkiaSharp.SKColorSpace+SKColorSpaceStatic
Stack Trace:
at SkiaSharp.HandleDictionary.GetObject[TSkiaObject,TSkiaImplementation](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew) in C:\dev\SkiaSharp\binding\Binding\HandleDictionary.cs:line 66
at SkiaSharp.SKObject.GetObject[TSkiaObject](IntPtr handle, Boolean owns, Boolean unrefExisting, Boolean refNew) in C:\dev\SkiaSharp\binding\Binding\SKObject.cs:line 90
at SkiaSharp.SKImageInfoNative.ToManaged(SKImageInfoNative& native) in C:\dev\SkiaSharp\binding\Binding\SKImageInfo.cs:line 26
at SkiaSharp.SKCodec.get_Info() in C:\dev\SkiaSharp\binding\Binding\SKCodec.cs:line 31
at SkiaSharp.SKCodec.GetPixels(Byte[]& pixels) in C:\dev\SkiaSharp\binding\Binding\SKCodec.cs:line 99
at SkiaSharp.Tests.SKCodecTest.StreamLosesOwnershipToCodecButIsNotForgotten() in c:\dev\SkiaSharp\tests\Tests\SKCodecTest.cs:line 61
Many tests fail for the same reason I guess
This one seems related to SKColorSpaceStatic
loosing reference counts while it is a global static object. Digging deeper, good way for me to learn the internals.
Okay, that one was easily fixed by not changing the reference count in the SKColorSpaceStatic
class.
After this fix I don't get crashes anymore, although now the tests hang (and some are failing, but consistent)
Visual Studio is now able to run all tests, although some are failing, but not crashing at first sight. It seems to depend from run to run, so surely not complete fixed, but an improvement nevertheless.
My patch was really simple, I'll make a PR so we can discuss code.
diff --git a/binding/Binding/HandleDictionary.cs b/binding/Binding/HandleDictionary.cs
index 274e4100ad74..ec66a649d795 100644
--- a/binding/Binding/HandleDictionary.cs
+++ b/binding/Binding/HandleDictionary.cs
@@ -58,7 +58,8 @@ namespace SkiaSharp
// but managed code just has the same reference
if (unrefExisting && instance is ISKReferenceCounted refcnt) {
#if THROW_OBJECT_EXCEPTIONS
- if (refcnt.GetReferenceCount () == 1)
+ var count = refcnt.GetReferenceCount ();
+ if (count <= 1)
throw new InvalidOperationException (
$"About to unreference an object that has no references. " +
$"H: {handle.ToString ("x")} Type: {instance.GetType ()}");
diff --git a/binding/Binding/SKColorSpace.cs b/binding/Binding/SKColorSpace.cs
index 7aff28283e2b..08c1ba562b31 100644
--- a/binding/Binding/SKColorSpace.cs
+++ b/binding/Binding/SKColorSpace.cs
@@ -26,11 +26,11 @@ namespace SkiaSharp
{
}
- void ISKNonVirtualReferenceCounted.ReferenceNative () =>
- SkiaApi.sk_colorspace_ref (Handle);
+ void ISKNonVirtualReferenceCounted.ReferenceNative () => OnReferenceNative();
+ void ISKNonVirtualReferenceCounted.UnreferenceNative () => OnUnreferencenative ();
- void ISKNonVirtualReferenceCounted.UnreferenceNative () =>
- SkiaApi.sk_colorspace_unref (Handle);
+ protected virtual void OnReferenceNative () => SkiaApi.sk_colorspace_ref (Handle);
+ protected virtual void OnUnreferencenative() => SkiaApi.sk_colorspace_unref (Handle);
protected override void Dispose (bool disposing) =>
base.Dispose (disposing);
@@ -267,6 +267,16 @@ namespace SkiaSharp
IgnorePublicDispose = true;
}
+ protected override void OnReferenceNative ()
+ {
+ // do not increment global reference
+ }
+
+ protected override void OnUnreferencenative ()
+ {
+ // do not release global reference
+ }
+
protected override void Dispose (bool disposing)
{
// do not dispose
Although my above patch seems to work, I have some questions about the following signature in HandleDictionary
:
internal static TSkiaObject GetObject<TSkiaObject, TSkiaImplementation> (IntPtr handle, bool owns = true, bool unrefExisting = true, bool refNew = false)
Shouldn't unrefExisting
be ignored if the existing instance didn't own its handle?
And shouldn't refNew
be ignored when the new instance being created isn't the owner of the handle?
That would also fix the problem, since SKColorSpaceStatic
doesn't own its handle.
I've traced the random deadlock to a call to the WglContext
constructor.
It appears that when multiple threads are calling wglChoosePixelFormatARB
at the same time, my PC deadlocks. A bit like this very old issue
This might be a driver bug, but putting a global lock around this call fixes my deadlocks
Another thing I have noticed that the this object sometimes gets collected while a method is running. The one I saw last night was:
How is it possible that it crashes when the builder is collected? When builder->make returns the ownership is transferred to the caller so there should be no relationship between builder and blob.
Not holding a reference to an owning object is just an error on the developer's side. SkiaSharp keeps adding bookkeeping mechanisms that degrade performance.
If SKSurface produces a canvas it should be clear to me that I have to keep a reference to the surface until I am finished drawing stuff. Do we have any numbers on reusing existing instances?
SKFontManger, for example, could hold all references to created instances of SKTypeface until it gets collected. There is no need to cache objects globally. If you consume Skia you are always responsible to keep track of object ownership.
In general, most objects will have a refcount == 1 and they are most likely never requested again from the global instance's dictionary. So instead of just creating a new instance of an object that is only used once we have that extra overhead of looking up the dictionary for an existing instance. This might involve locking. Destruction of objects has the same overhead.
So I suggest to not cache objects globally by default and only cache them when it is explicitly requested. SKFontManager will need this cache. SKString doesn't etc.
@Ziriax (https://github.com/mono/SkiaSharp/pull/1237#pullrequestreview-395763222)
The code
&& instance.OwnsHandle
specifically was needed to avoid the singleton objects likeSKColorSpaceStatic
from being deleted, but I noticed that you changed the constructor of these in 2191818177af4bfb4461859feda2ae6542767d54, so that these singletons now do own their native handle?If that fixes the issue, great, but maybe you could add a
Debug.Assert
so thatunrefExisting
is not called on instances for whichOwnsHandle
is false? Maybe I misinterpreted the meaning ofOwnsHandle
.
@mattleibow
I am looking into the cause of the crash, I think they changed something between this and the next version. Your test case crashes every time on the m80, but not at all on the m68. Having a look at the native API now, but I do see a difference:
m68 (https://github.com/google/skia/blob/chrome/m68/include/codec/SkCodec.h#L174):
const SkImageInfo& getInfo() const;
m80 (https://github.com/google/skia/blob/chrome/m80/include/codec/SkCodec.h#L194):
SkImageInfo getInfo() const;
And, under the hood, there is a copy I think.
Owns handle is just whether the object is responsible for deconstructing the reference/object. In the case of the statics, the initial reference is incremented, therefore we have to also decrement when it is collected.
But, you were almost on the exact right track. The object is reference counted, but in the case of the particular call of
getInfo
, I think it is not incremented. In that case, we need to let the managed side know that is should not decrement (using theunrefExisting
param):https://github.com/mono/SkiaSharp/blob/dev/update-skia/binding/Binding/SKImageInfo.cs#L27
But, I am just going to do some tests quick and confirm.
@Ziriax
Owns handle is just whether the object is responsible for deconstructing the reference/object. In the case of the statics, the initial reference is incremented, therefore we have to also decrement when it is collected.
But, you were almost on the exact right track. The object is reference counted, but in the case of the particular call of
getInfo
, I think it is not incremented. In that case, we need to let the managed side know that is should not decrement (using theunrefExisting
param):https://github.com/mono/SkiaSharp/blob/dev/update-skia/binding/Binding/SKImageInfo.cs#L27
But, I am just going to do some tests quick and confirm.
Oh nice catch. So this is way more complex than I thought.
A look at the implementation shows that a new copy is always returned by
SkCodec::getInfo()
:*/ SkImageInfo makeImageInfo() const { auto ct = kGray_Color == fColor ? kGray_8_SkColorType : kXAlpha_Color == fColor ? kAlpha_8_SkColorType : k565_Color == fColor ? kRGB_565_SkColorType : kN32_SkColorType ; auto alpha = kOpaque_Alpha == fAlpha ? kOpaque_SkAlphaType : kUnpremul_SkAlphaType; sk_sp<SkColorSpace> cs = fProfile ? SkColorSpace::Make(*fProfile->profile()) : nullptr; if (!cs) { cs = SkColorSpace::MakeSRGB(); } return SkImageInfo::Make(fWidth, fHeight, ct, alpha, std::move(cs)); }
Looking at the SKCodec
, there is an issue somewhere there. The actual call to getInfo
increments, and then we decrement. ~But, somehow, after the codec is collected, the colorspace is again decremented...~ Looking into that.
The GetPixels
call seems to decrement it... Investigating...
SkCodec::getInfo()
returns a struct, and also in C# this is a struct, so that can't cause any refcount problems IMO
You say GetPixels
is misbehaving, on what class?
I think I found it...
m68: (https://github.com/mono/skia/blob/xamarin-mobile-bindings/src/c/sk_types_priv.h#L193) ref on the way in
sk_ref_sp(AsColorSpace(info->colorspace)));
m80: (https://github.com/mono/skia/blob/dev/update-m80/src/c/sk_types_priv.h#L210) NO REF on the way in
sk_sp<SkColorSpace>(AsColorSpace(info->colorspace)));
Let me fix that. What is happening is that when I get the struct from native code, the cs is incremented. When it reaches managed code, it is decremented. So no problem. However, when it goes back into the native code, it is not incremented, so when the native code cleans up, it gets decremented.
Fixed in 1590d7e1e81581e1c4ab1252b7d3dcb563d1a8e1 and https://github.com/mono/skia/commit/47de7965be076311e824589b0464a4e1e08d54eb
I am trying to run the units tests in the
dev/update-skia
branch, so I can continue working on PRs, but mydotnet test
run always fails randomly, with native exceptions (access violations). Sometimes a dozens of tests succeed before crashing, sometimes upto abour 100, so this smells like a multi threading issues. I have 32 hyperthreads (AMD 3950x) all running at the same time, so this might reveal race conditions.Is this a known issue?
I have build both the native and managed code successfully, on Windows 10, using the latest version of VS2019.