microsoft / FFmpegInterop

This is a code sample to make it easier to use FFmpeg in Windows applications.
Apache License 2.0

Changing audio/video properties mid stream #45

Open nickkelsey opened 9 years ago

nickkelsey commented 9 years ago

Sometimes digital TV from an antenna (live or recorded) will change key properties mid stream.

Audio might switch from 5.1 to 2 channel between a show and advertising, or the video might switch from 1080i (1920x1080x30fps) to 720p (1280x720x60fps) between shows.

The audio path uses swr_convert() to convert from floating point to S16 samples. This API segfaults internally when the audio changes from 5.1 to 2 channels. I fixed the interop code to reinitialize the swr_convert context for 2/5.1 channels as needed; this fixes the segfault, but now I need to tell the MediaFoundation layer that the PCM data I am passing it is only 2 channels, without interrupting playback.

Can you suggest a way to update the MediaFoundation properties mid-stream without interrupting playback?

(Currently focused on the audio change. Guessing the video size/framerate changing mid stream will have a similar solution).

Nick

nograx commented 8 years ago

Any news on this?

reego-fr commented 8 years ago

Hi, I don't think the channel count can be changed during playback with the MediaStreamSource API. Can @khouzam confirm? If that's true, I see two workarounds:

reego-fr commented 8 years ago

I fixed it using the FFmpeg resampler, but I think the first option would be the only way to do it when you don't know how many channels to expect when starting playback.

nickkelsey commented 8 years ago

Confirming. A reasonable workaround for FFmpeg-decoded audio is to detect the number of audio channels the default audio renderer has (i.e. how many actual speakers there are), then use the FFmpeg resampler to convert to that number of channels. This trick doesn't work when using a Media Foundation codec like AAC, and resampling video is just nasty... this still needs a real fix.

reego-fr commented 8 years ago

Thanks for the feedback. I didn't know about the related issues with AAC, but I found the AC3 decoder from Media Foundation behaves pretty much like the resampler fix. I forgot you mentioned video as well! Nasty is just right :-)

oliversluke commented 8 years ago

@reego-fr - I like your proposal. Can you share the code you are using to leverage ffmpeg's resampler?

nograx commented 8 years ago

@reego-fr - A code example would be nice. Thanks.

reego-fr commented 8 years ago

Actually, the FFmpeg resampler is already used in the project. The only thing that needs to change is how it is configured and used, in order to avoid updating the output sample format mid stream. I am not happy with the fix I have for now, but it could be enhanced with @nickkelsey's idea of getting the channel count from the default audio renderer. I am not sure how to get it, but it is obviously better than hardcoding the output format.

reego-fr commented 8 years ago

Regarding video, I don't know if it is common to see stream properties change during broadcasting, but if it is, I think the workaround would be to stop playback and force the application to create a new media stream source. On the FFmpegInterop side, that only means detecting that this is happening, stopping playback, and giving the consuming application a way to resume playback at a given timestamp with the proper stream properties.
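
A minimal sketch of what that detection could look like in the video sample provider's decode loop (member names like m_pAvCodecCtx, m_lastWidth and m_lastHeight are assumptions for illustration, not code from the project):

    // Hypothetical check: compare the decoder's current dimensions against the
    // values the stream descriptor was created with.
    if (m_pAvCodecCtx->width != m_lastWidth || m_pAvCodecCtx->height != m_lastHeight)
    {
        // A mid-stream video format change was detected. The MSS video descriptor
        // cannot simply be swapped out, so surface an error (or raise an event) and
        // let the consuming application recreate the FFmpegInteropMSS and resume
        // from the last presented timestamp.
        DebugMessage(L"Video format change detected, playback restart required.\n");
        hr = E_FAIL;
    }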

oliversluke commented 8 years ago

@nickkelsey @reego-fr - Thanks for your idea with the swr_convert() approach. It now works for me as well. This way I can at least manage to play my MPEG-TS files without any segfaults.

I know that hardcoding the output is not an ideal solution, but it's better than nothing.

nograx commented 8 years ago

@OliverS79, could you share the code?

oliversluke commented 8 years ago

@nograx, sure - the main idea is to adjust DecodeAVPacket in UncompressedAudioSampleProvider.

Here is the main code, which should show you the idea. I also made a small change in CreateAudioStreamDescriptor in FFmpegInteropMSS.cpp to always set up the audio descriptor with FIX_OUTPUTCHANNELS.

@reego-fr - have you found a better way to achieve this? Knowing all your other improvements, I am sure you found a more elegant solution than mine.

    if (SUCCEEDED(hr) && frameComplete)
    {
        // Check for a changed input channel count; if so, reinitialize m_pSwrCtx
        if (m_pAvFrame->channels != lastNumberOfChannels)
        {
            int64 inChannelLayout = m_pAvCodecCtx->channel_layout ? m_pAvCodecCtx->channel_layout : av_get_default_channel_layout(m_pAvCodecCtx->channels);
            int64 outChannelLayout = av_get_default_channel_layout(FIX_OUTPUTCHANNELS);

            // Free the previous resampler context before allocating a new one to avoid leaking it
            swr_free(&m_pSwrCtx);
            m_pSwrCtx = swr_alloc_set_opts(
                NULL,
                outChannelLayout,
                AV_SAMPLE_FMT_S16,
                m_pAvCodecCtx->sample_rate,
                inChannelLayout,
                m_pAvCodecCtx->sample_fmt,
                m_pAvCodecCtx->sample_rate,
                0,
                NULL);

            if (!m_pSwrCtx || swr_init(m_pSwrCtx) < 0)
            {
                DebugMessage(L"Error reinitializing SwrCtx!\n");
                hr = E_FAIL;
                break; // Skip broken frame
            }

            // Track the frame's channel count so it matches the check above
            lastNumberOfChannels = m_pAvFrame->channels;
        }
        // End check for changed input channel count

        // Resample uncompressed frame to AV_SAMPLE_FMT_S16 PCM format that is expected by Media Element
        uint8_t *resampledData = nullptr;
        //unsigned int aBufferSize = av_samples_alloc(&resampledData, NULL, m_pAvFrame->channels, m_pAvFrame->nb_samples, AV_SAMPLE_FMT_S16, 0);
        unsigned int aBufferSize = av_samples_alloc(&resampledData, NULL, FIX_OUTPUTCHANNELS, m_pAvFrame->nb_samples, AV_SAMPLE_FMT_S16, 0);
        int resampledDataSize = swr_convert(m_pSwrCtx, &resampledData, aBufferSize, (const uint8_t **)m_pAvFrame->extended_data, m_pAvFrame->nb_samples);
        //auto aBuffer = ref new Platform::Array<uint8_t>(resampledData, min(aBufferSize, (unsigned int)(resampledDataSize * m_pAvFrame->channels * av_get_bytes_per_sample(AV_SAMPLE_FMT_S16))));
        auto aBuffer = ref new Platform::Array<uint8_t>(resampledData, min(aBufferSize, (unsigned int)(resampledDataSize * FIX_OUTPUTCHANNELS * av_get_bytes_per_sample(AV_SAMPLE_FMT_S16))));
        dataWriter->WriteBytes(aBuffer);
        av_freep(&resampledData);
        av_frame_unref(m_pAvFrame);

nograx commented 8 years ago

Sorry, can you show your changes to FFmpegInteropMSS.cpp also? Big thanks!

oliversluke commented 8 years ago

@nograx In FFmpegInteropMSS.cpp I changed the following line in the method HRESULT FFmpegInteropMSS::CreateAudioStreamDescriptor(bool forceAudioDecode):

    // Advertise a fixed output channel count so the audio descriptor never needs to change mid stream
    //audioStreamDescriptor = ref new AudioStreamDescriptor(AudioEncodingProperties::CreatePcm(avAudioCodecCtx->sample_rate, avAudioCodecCtx->channels, bitsPerSample));
    audioStreamDescriptor = ref new AudioStreamDescriptor(AudioEncodingProperties::CreatePcm(avAudioCodecCtx->sample_rate, FIX_OUTPUTCHANNELS, bitsPerSample));

nickkelsey commented 8 years ago

@OliverS79 I suggest detecting the number of audio channels the default audio renderer has (i.e. how many actual speakers there are), then using that number to resample.

oliversluke commented 8 years ago

@nickkelsey Yes, this would be a good approach. Do you know how to detect this in a UWP app? Do you have some sample code for the detection, or a link to a website that explains it?

As a workaround, maybe it would be safe to assume 2 channels for a Windows 10 Mobile app and 6 channels for a UWP desktop app? If the desktop environment has fewer than 6 channels, Windows will map the audio down to the smaller number of channels automatically.

This should work for most setups. I am not even sure there is any live TV content (MPEG-TS stream) with more than 6 channels.

Let me know your thoughts on this.
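
A rough sketch of that device-family heuristic (an illustration only, not code from this thread; it assumes Windows::System::Profile::AnalyticsInfo is available):

    // Pick a fixed output channel count based on the device family: assume stereo on
    // phones and 5.1 elsewhere, relying on Windows to downmix when fewer speakers exist.
    auto deviceFamily = Windows::System::Profile::AnalyticsInfo::VersionInfo->DeviceFamily;
    int fixOutputChannels = (wcscmp(deviceFamily->Data(), L"Windows.Mobile") == 0) ? 2 : 6;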

nograx commented 8 years ago

Seems to work in a short test.

khouzam commented 8 years ago

I'm trying to track down the right expert to answer this question.

reego-fr commented 8 years ago

@OliverS79 I believe your code is more or less the same as mine except that I kept the starting channel count rather than specifying one, but as said above this is not ideal either.

reego-fr commented 8 years ago

Now that I think about it again, I am concerned about converting 2.0 to 5.1 when the user has a 5.1 setup, because it will probably prevent speaker fill from working. If you still want to do it this way, you can maybe get the channel count from the Windows::Devices::Enumeration namespace. I didn't check the available properties, but it must be worth trying.
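
One possible way to query the default render device's channel count in a UWP app (a sketch using AudioGraph rather than Windows::Devices::Enumeration, so treat the exact approach as an assumption; requires <ppltasks.h>):

    using namespace Windows::Media::Audio;
    using namespace Windows::Media::Render;

    // Create a default AudioGraph; its encoding properties reflect the default
    // render device's mix format, including the channel count.
    AudioGraphSettings^ settings = ref new AudioGraphSettings(AudioRenderCategory::Media);
    concurrency::create_task(AudioGraph::CreateAsync(settings))
        .then([](CreateAudioGraphResult^ result)
    {
        unsigned int channelCount = 2; // fall back to stereo if graph creation fails
        if (result->Status == AudioGraphCreationStatus::Success)
        {
            channelCount = result->Graph->EncodingProperties->ChannelCount;
        }
        // Use channelCount to configure the swr output layout instead of a
        // hardcoded FIX_OUTPUTCHANNELS.
    });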

nograx commented 7 years ago

After I fixed my problems with debugging the code, I found out that my problem is solved if I force the use of the first available channel count instead of using m_pAvFrame->channels.

oliversluke commented 7 years ago

@khouzam - just checking whether you found an expert who could help with this. Can we change the audio descriptor of the MSS mid-stream, e.g. by adding a new audio stream and removing the existing one from the MSS?

lukasf commented 7 years ago

I know that this is old, but has there been any progress in updating the audio descriptor?

When reading the docs of AudioStreamDescriptor, I think that this scenario should be supported: "The application can change the encoding properties of the audio stream descriptor at any time. If the media pipeline cannot handle the new encoding properties, the MediaStreamSource will raise the Closed event which provides information regarding the error." So instead of replacing the stream descriptor itself (which is not possible in MSS), we change the encoding properties of the existing descriptor. The AudioStreamDescriptor.EncodingProperties property is read-only, but the individual properties of AudioEncodingProperties are get/set, so they can be changed. Maybe upon SampleRequested, when we detect a format change, it is possible to update the AudioEncodingProperties of the existing stream descriptor (and set the Discontinuous flag on the sample) before returning the sample.

The underlying MF pipeline definitely supports stream descriptor changes and the docs seem to indicate that MediaStreamSource supports this as well (without exactly saying how it works, though).
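
A hedged sketch of that idea (the member and variable names here are assumptions, not existing FFmpegInterop code): when a channel count change is detected while producing a sample, mutate the existing descriptor's encoding properties and mark the sample as discontinuous.

    // Called when decoding shows the source switched channel count mid stream.
    unsigned int newChannelCount = m_pAvFrame->channels;
    if (newChannelCount != m_audioStreamDescriptor->EncodingProperties->ChannelCount)
    {
        // Update the existing descriptor in place instead of replacing it.
        m_audioStreamDescriptor->EncodingProperties->ChannelCount = newChannelCount;
        // Tell the pipeline the format changed at this sample boundary.
        sample->Discontinuous = true;
    }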

reego-fr commented 7 years ago

Thanks @lukasf, I don't know if it was supported when I experimented with that, but it looks promising! Good catch!