oleg-st / ZstdSharp

Port of zstd compression library to c#
MIT License
217 stars 29 forks source link

Heap corruption in release build when decompressing thousands of times #13

Closed xPaw closed 1 year ago

xPaw commented 1 year ago

See https://github.com/SteamDatabase/ValveResourceFormat/issues/454 for context.

I took Sandbox in this repo and put this code in:

        static void Main(string[] args)
        {
            var src = File.ReadAllBytes("test.zst");
            var i = 0;

            while (true)
            {
                var decompressor = new Decompressor();

                var outdec = new Span<byte>(new byte[1555065]);

                decompressor.TryUnwrap(src, outdec, out var written);
                //var uncompressed = decompressor.Unwrap(src);

                if (i++ % 1000 == 0)
                {
                    Console.WriteLine($"{i}");
                }
            }
        }

File: https://s.xpaw.me/zstdsharp_crash.zst

Some runs it takes just 1000 loops to crash, others it takes 300000 loops, it's not deterministic.

I don't think the file particularly matters because we've been getting crashes non deterministically. I just took one random file out.

On my machine decompressor.Unwrap(src); only crashes when multithreading, but on other users machine it crashes in a single thread too.

In event viewer I see the following:

Faulting application name: Sandbox.exe, version: 1.0.0.0, time stamp: 0x63558578
Faulting module name: ntdll.dll, version: 10.0.22621.755, time stamp: 0x8a328c67
Exception code: 0xc0000374
Fault offset: 0x000000000010c249
Faulting process id: 0x0x4F80
Faulting application start time: 0x0x1D8F78F09C1359F
Faulting application path: D:\GitHub\ZstdSharp\src\Sandbox\bin\Release\net6.0\Sandbox.exe
Faulting module path: C:\WINDOWS\SYSTEM32\ntdll.dll
Report Id: c2b3fde0-4654-40e8-995a-1c179d60bda3
Faulting package full name: 
Faulting package-relative application ID: 
Application: Sandbox.exe
CoreCLR Version: 6.0.1122.52304
.NET Version: 6.0.11
Description: The process was terminated due to an internal error in the .NET Runtime at IP 00007FFA4FD6237B (00007FFA4FB90000) with exit code c0000005.

Sometimes it throws an access violation in random parts of decompress code.

image

I have had other users run that code on their machines, and they get crashes too. Both AMD and Intel cpus crash.

I tested master, 0.6.4 and 0.3.0 versions.

kristiker commented 1 year ago

Can confirm, release only. image

oleg-st commented 1 year ago

@xPaw I think it's related to the garbage collector. It's looks as use after free. You can add decompressor.Dispose() to your cycle or reuse the same decompressor. I'll take a deeper look into this case.

xPaw commented 1 year ago

I think it's related to the garbage collector.

It's plausible because in my code adding GC.Collect did seem to fix the crash. Also indeed my mistake not disposing the decompressor (VS didn't want me for me for it).

oleg-st commented 1 year ago

Decompressor's finalizer is called during decompression. If decompression is the last method called and no other access to decompressor's object. We need to protect the decompressor object from being garbage collected.

xPaw commented 1 year ago

Decompressor's finalizer is called during decompression.

You think it's being called on the current object? I don't understand why that would happen.

oleg-st commented 1 year ago

@xPaw Yeah, I reproduced it with this sample:

works in release only

internal unsafe class TestClass : IDisposable
{
    private readonly int* _mem;

    public TestClass()
    {
        _mem = (int*) Marshal.AllocHGlobal(sizeof(int));
        *_mem = 0;
    }

    public bool Method()
    {
        Calc(_mem);
        return true;
    }

    // work with memory
    private void Calc(int* mem)
    {
        for (int i = 0; i < 1000; i++)
        {
            if (*mem == -1)
            {
                // mem has been freed during work
                Console.WriteLine("Freed");
            }
        }
    }

    private void ReleaseUnmanagedResources()
    {
        *_mem = -1;
        Marshal.FreeHGlobal((IntPtr) _mem);
    }

    public void Dispose()
    {
        ReleaseUnmanagedResources();
        GC.SuppressFinalize(this);
    }

    ~TestClass()
    {
        ReleaseUnmanagedResources();
    }
}
class Program
{
    static void Main(string[] args)
    {
        var i = 0;

        while (true)
        {
            var testClass = new TestClass();
            // alloc some memory
            var buffer = new Span<byte>(new byte[1555065]);
            testClass.Method();

            if (i++ % 1000 == 0)
            {
                Console.WriteLine($"{i}");
            }
        }
    }
}
xPaw commented 1 year ago

Is there a good reason that requires finalizer, shouldn't dispose be enough here?

Doesn't this sound like a .NET issue?

EDIT: I can confirm your example prints Freed in release build.

oleg-st commented 1 year ago

Without finalizer there is be a memory leak without dispose called.

.NET have some solutions to manage with issues like this (for P/Invoke) for IntPtr's such as SafeHandle, HandleRef etc. But I still don't understand how to use them correctly here.

xPaw commented 1 year ago

Perhaps NativeMemory in net6+ could be interesting? https://learn.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.nativememory

oleg-st commented 1 year ago

seems like GC.KeepAlive can do the job it makes fake access to the object to make it live longer.

like this:

    public bool Method()
    {
        Calc(_mem);
        GC.KeepAlive(this);
        return true;
    }
xPaw commented 1 year ago

That should do the trick. For reference: https://devblogs.microsoft.com/oldnewthing/20100813-00/?p=13153

oleg-st commented 1 year ago

Fixed in 0.6.5 Thank you @xPaw

xPaw commented 1 year ago

Thanks!