microsoft / FFmpegInterop

This is a code sample to make it easier to use FFmpeg in Windows applications.
Apache License 2.0

Some ideas for improving 4K video software decoding performance. #218

Open MouriNaruto opened 6 years ago

MouriNaruto commented 6 years ago

I think I should open a new issue instead of replying to an existing one, so I did.

I tested @lukasf's work three days ago, but found that its performance is the same as the master branch's. So I tried to optimize it myself and found a couple of things that work.

First, the BufferTime property of MediaStreamSource should be set to 0 to disable buffering. This reduces memory usage dramatically (from 10GB+ to 700MB with my 4K 120fps AVC 8-bit video).

Second, I used the performance profiler in Visual Studio and found that the DataWriter object in the UncompressedVideoSampleProvider class accounts for 10% of my CPU usage. So I wrote a class that exposes a raw pointer as an IBuffer directly.
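As a rough sanity check on those memory figures, the sketch below estimates how much memory queued uncompressed frames can occupy. The 3-second buffering window and the pixel formats are assumptions made for illustration (the thread states neither), and `FrameBytes` and `BufferedBytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for one uncompressed frame.
// bitsPerPixel: 12 for 8-bit YUV 4:2:0 (e.g. NV12), 32 for BGRA8.
constexpr std::uint64_t FrameBytes(
    std::uint64_t width, std::uint64_t height, std::uint64_t bitsPerPixel)
{
    return width * height * bitsPerPixel / 8;
}

// Bytes held when `bufferSeconds` worth of decoded frames are queued.
// ASSUMPTION: every queued frame is a full uncompressed copy.
constexpr std::uint64_t BufferedBytes(
    std::uint64_t frameBytes, std::uint64_t fps, std::uint64_t bufferSeconds)
{
    return frameBytes * fps * bufferSeconds;
}

// 4K (3840x2160) at 120 fps with an assumed 3-second buffering window.
constexpr std::uint64_t kNv12Frame = FrameBytes(3840, 2160, 12);
constexpr std::uint64_t kBgraFrame = FrameBytes(3840, 2160, 32);
constexpr std::uint64_t kNv12Buffered = BufferedBytes(kNv12Frame, 120, 3);
constexpr std::uint64_t kBgraBuffered = BufferedBytes(kBgraFrame, 120, 3);
```

Under these assumptions a few seconds of queued 4K 120fps frames already amounts to several gigabytes, which is the same order of magnitude as the 10GB+ reported above.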

class BufferReference : public RuntimeClass<
    RuntimeClassFlags<RuntimeClassType::WinRtClassicComMix>,
    abi_IBuffer,
    abi_IBufferByteAccess>
{
private:
    UINT32 m_Capacity;
    UINT32 m_Length;
    byte* m_Pointer;

public:
    virtual ~BufferReference() throw()
    {
    }

    STDMETHODIMP RuntimeClassInitialize(
        byte* Pointer, UINT32 Capacity) throw()
    {
        m_Capacity = Capacity;
        m_Length = Capacity;
        m_Pointer = Pointer;
        return S_OK;
    }

    // IBufferByteAccess::Buffer
    STDMETHODIMP Buffer(byte **value) throw()
    {
        *value = m_Pointer;
        return S_OK;
    }

    // IBuffer::get_Capacity
    STDMETHODIMP get_Capacity(UINT32 *value) throw()
    {
        *value = m_Capacity;
        return S_OK;
    }

    // IBuffer::get_Length
    STDMETHODIMP get_Length(UINT32 *value) throw()
    {
        *value = m_Length;
        return S_OK;
    }

    // IBuffer::put_Length
    STDMETHODIMP put_Length(UINT32 value) throw()
    {
        if (value > m_Capacity)
            return E_INVALIDARG;
        m_Length = value;
        return S_OK;
    }
};

// The M2MakeIBuffer function wraps the provided raw pointer in an IBuffer
// object without copying the data.
//
// Parameters:
//
// Pointer
//     The raw pointer to the memory block the IBuffer object should expose.
// Capacity
//     The size, in bytes, of the memory block that Pointer refers to.
//
// Return value:
//
// If the function succeeds, the return value is the IBuffer object from the 
// provided raw pointer. If the function fails, the return value is nullptr.
//
// Warning: 
// The lifetime of the returned IBuffer object is controlled by the lifetime of
// the raw pointer that's passed to this method. When the raw pointer has been 
// released, the IBuffer object becomes invalid and must not be used.
IBuffer^ M2MakeIBuffer(byte* Pointer, UINT32 Capacity) throw()
{
    IBuffer^ buffer = nullptr;

    ComPtr<BufferReference> bufferReference;
    if (SUCCEEDED(MakeAndInitialize<BufferReference>(
        &bufferReference, Pointer, Capacity)))
    {
        buffer = reinterpret_cast<IBuffer^>(bufferReference.Get());
    }

    return buffer;
}

With these changes, the maximum fps rose from 70 to 100. (The target is 120fps, which PotPlayer reaches on my machine.)

I am sharing this in the hope that someone can find a better solution.

I hope it helps.

Mouri

brabebhin commented 6 years ago

Nice. I suppose this converter class can also be used for audio buffers. I will test ASAP.

lukasf commented 6 years ago

@MouriNaruto Thank you. But we have already identified and addressed that problem. Please check my branch lukasf/direct-buffer-clean. It contains NativeBuffer classes which work similarly to what you posted here, with the addition that they also automatically free the referenced native buffer when the IBuffer gets destroyed. This has greatly improved decoding performance for 4K files.

@mcosmin222 I am already using that approach for audio decoding as well. It requires the refactoring we have been talking about. I have done it and now audio decoding goes zero-copy as well for most formats.
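The auto-freeing behavior described above can be illustrated with a portable sketch. This is not the NativeBuffer code from the branch and it uses no WinRT types: `OwningBufferView` and `CountFreesAfterScopeExit` are hypothetical names, and the free callback is a stand-in for whatever releases the native buffer (with FFmpeg that might be the place to drop a frame or buffer reference, which is an assumption here).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical wrapper: exposes existing memory without copying it and runs a
// caller-supplied callback when the wrapper is destroyed.
class OwningBufferView
{
public:
    using FreeFn = void (*)(void* context);

    OwningBufferView(std::uint8_t* data, std::size_t size,
                     FreeFn freeFn, void* context) noexcept
        : m_data(data), m_size(size), m_free(freeFn), m_context(context) {}

    // Non-copyable so the callback runs exactly once.
    OwningBufferView(const OwningBufferView&) = delete;
    OwningBufferView& operator=(const OwningBufferView&) = delete;

    // Moving transfers the responsibility to run the callback.
    OwningBufferView(OwningBufferView&& other) noexcept
        : m_data(std::exchange(other.m_data, nullptr)),
          m_size(std::exchange(other.m_size, 0)),
          m_free(std::exchange(other.m_free, nullptr)),
          m_context(std::exchange(other.m_context, nullptr)) {}

    ~OwningBufferView()
    {
        if (m_free != nullptr)
            m_free(m_context);  // release the underlying native buffer
    }

    std::uint8_t* data() const noexcept { return m_data; }
    std::size_t size() const noexcept { return m_size; }

private:
    std::uint8_t* m_data;
    std::size_t m_size;
    FreeFn m_free;
    void* m_context;
};

// Demo: returns how many times the free callback ran after the view went
// out of scope (including one move along the way).
int CountFreesAfterScopeExit()
{
    int freeCount = 0;
    std::uint8_t frame[16] = {};
    {
        OwningBufferView view(frame, sizeof(frame),
            [](void* context) { ++*static_cast<int*>(context); }, &freeCount);
        OwningBufferView moved(std::move(view));  // ownership transferred
        (void)moved.data();                       // readable without a copy
    }
    return freeCount;
}
```

Compared with a purely non-owning wrapper, tying the release to the wrapper's destruction removes the need for the caller to keep the raw pointer alive manually.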

brabebhin commented 6 years ago

I am guessing you have not made a PR yet.

lukasf commented 6 years ago

I sure have #216

brabebhin commented 6 years ago

Strange, I don't see any new commits.

lukasf commented 6 years ago

The initial commit already had all this. Look for NativeBuffer files.

brabebhin commented 6 years ago

Anyway, speaking of refactoring, maybe we should make a parallel repo and merge all our changes into it, and when we have it reasonably bug-free we can propose that it replace this official one. What say you?

lukasf commented 6 years ago

Good idea, and actually this is what I have been doing over the last few days. Will post more info very soon!

brabebhin commented 6 years ago

I have moved house recently and have not been very active on the coding side, but I will be ready soon enough :)

MouriNaruto commented 6 years ago

@lukasf @mcosmin222 Thank you.

I am sorry, I used the wrong branch, lukasf/yuv-output. (I assumed the one with the biggest number was the newest, lol.) That is why I saw no performance improvement. (Sorry, everyone.)

The two adjustments I described raise the maximum fps of my test video (a 4K 120fps AVC 8-bit video stream with a FLAC audio stream) from 70 to 100. Sadly, it still can't reach 120fps like PotPlayer. I would like to know how to further reduce the call overhead, or whether there is a better way to use FFmpeg.

Update: @lukasf I tried your latest work. Its performance is the same as mine, but it uses more memory (100MB more than mine).

But I can understand that, because I removed all the hardware decoder implementations, which allowed me to make another optimization. (I create the IBuffer object after initializing the software scaler, so I can always reuse the same IBuffer object and avoid the IBuffer creation overhead.) PS: My changes are based on the master branch. I tried to modify your branch but found that your approach differs from the master branch's. (I have never used FFmpeg directly in my work, so I don't know how to apply my idea to your work, sorry.)

Also, I found that if I set m_isEnabled = true in the MediaSampleProvider::DisableStream method, the video can loop, which is not how it behaves now.
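The buffer-reuse idea in the parenthesis above might look roughly like the following portable sketch. It is not the actual change: `ReusableFrameBuffer` and `AllFramesShareStorage` are hypothetical names, and a plain `std::vector` stands in for the IBuffer created after the scaler is initialized.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: allocates the output storage once (for example, right
// after the scaler's output size is known) and hands out the same memory for
// every frame instead of creating a new buffer per frame.
class ReusableFrameBuffer
{
public:
    explicit ReusableFrameBuffer(std::size_t frameBytes)
        : m_storage(frameBytes) {}

    // Returns the shared storage; the caller overwrites it in place.
    std::uint8_t* BeginFrame() noexcept
    {
        ++m_framesWritten;
        return m_storage.data();
    }

    std::size_t size() const noexcept { return m_storage.size(); }
    std::size_t framesWritten() const noexcept { return m_framesWritten; }

private:
    std::vector<std::uint8_t> m_storage;
    std::size_t m_framesWritten = 0;
};

// Demo: requests `frameCount` frames and reports whether every request was
// served from the same memory block.
bool AllFramesShareStorage(std::size_t frameBytes, int frameCount)
{
    ReusableFrameBuffer buffer(frameBytes);
    const std::uint8_t* first = buffer.BeginFrame();
    for (int i = 1; i < frameCount; ++i)
    {
        if (buffer.BeginFrame() != first)
            return false;
    }
    return true;
}
```

One caveat with this design: a single shared buffer is only safe if the consumer has finished reading the previous frame before the next one is written into the same memory.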

Mouri