microsoft / BotBuilder-RealTimeMediaCalling

BotBuilder-RealTimeMediaCalling extends the BotBuilder to enable bots to engage in Skype audio-video calling. It provides real-time, programmable access to the voice, video, and screen sharing streams of a Skype call. The bot is a direct participant in a Skype 1:1 call.

Sample audio and video files for AudioVideoPlayerBot #28

Closed robmeadows closed 7 years ago

robmeadows commented 7 years ago

Can you please provide a working example of an NV12 YUV video file and a PCM 16 kHz WAV file for the AudioVideoPlayerBot sample? I haven't been able to produce video that isn't distorted or audio of good quality. I'm using the following ffmpeg commands:

ffmpeg -i input.mov -s 640x360 -pix_fmt nv12 -framerate 15 output.yuv
ffmpeg -i input.mov -acodec pcm_s16le -ar 8K -ab 16K output.wav

Is there a recommended tool or approach for converting modern video formats such as mov/mp4 to the required Real Time Media formats? Or is there a way to provide H264 to the AudioVideoFramePlayer?

MalarGit commented 7 years ago

I tried your commands and they seem to work for me. You can open the .yuv using YuvToolKit from http://www.yuvtoolkit.com/ to verify that the video is fine (change the resolution, format, and fps in the popup window). I binged for sample .mov files and found some at http://www.steppublishers.com/sample-video. There is no distortion in the audio or video. Try them and let us know how it goes.

AudioVideoFramePlayer does not support H264 currently.

robmeadows commented 7 years ago

What video formats are actually supported by AudioVideoFramePlayer? My videos (converted with the commands above) start to play, but then freeze after the first few frames. Also, the NV12-formatted files are extremely large.

waboum commented 7 years ago

Hey Rob,

To extract media from an MP4 or DivX file, you can use ffmpeg and sox. For this example I used a 720p MP4 file. A couple of useful commands:

1. (optional) Cut the video if it is too big (here, just 10 sec of content):

   ffmpeg -i input720p.mp4 -c copy -ss 00:00:10 -t 00:00:10 output.mp4

2. Extract the audio track:

   ffmpeg -i output.mp4 outputToConvert.wav

3. Convert the audio to mono. Most video samples have 5.1 audio (6 channels); I would recommend opening the file in Audacity to see how many channels your audio has. In my case there were 6, so merge them into 1:

   sox outputToConvert.wav outputmono.wav remix 1-6

4. Downsample the audio to 16 kHz (you can check the sampling rate in Audacity):

   sox outputmono.wav -r 16k audioOutput.wav

5. Finally, extract the raw video (a quick size sanity check on the output follows below):

   ffmpeg -i output.mp4 -f rawvideo -vcodec rawvideo -pix_fmt nv12 -s 1280x720 -r 30 rawvideo.yuv
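
To sanity-check the output of step 5, here is a small standalone C# sketch (the file name rawvideo.yuv comes from the step above; the resolution and fps constants are assumptions and should match whatever you passed to ffmpeg):

using System;
using System.IO;

class RawVideoCheck
{
    static void Main()
    {
        const int width = 1280, height = 720, fps = 30;
        // NV12 stores 12 bits per pixel, so one frame is width * height * 3 / 2 bytes.
        const long frameSize = width * height * 3 / 2; // 1,382,400 bytes at 720p

        long fileLength = new FileInfo("rawvideo.yuv").Length;
        long frames = fileLength / frameSize;

        // The file length should be an exact multiple of the frame size;
        // leftover bytes suggest a truncated or mis-sized conversion.
        Console.WriteLine($"Frame size: {frameSize:N0} bytes");
        Console.WriteLine($"Frames: {frames}, leftover bytes: {fileLength % frameSize}");
        Console.WriteLine($"Expected duration: {frames / (double)fps:F1} sec");
    }
}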

The distortion might be coming from the original resolution of your input. As Malar mentioned, I would recommend loading the raw video in this tool, http://www.yuvtoolkit.com/, to double-check.

Thanks!

waboum commented 7 years ago

Forgot to reply to this question: "What video formats are actually supported by AudioVideoFramePlayer? My videos (converted with the commands above) start to play, but then freeze after the first few frames. Also, the NV12-formatted files are extremely large."

Supported formats are documented here: https://docs.microsoft.com/en-us/bot-framework/dotnet/bot-builder-dotnet-real-time-media-concepts. Since it is raw media, the files will definitely be big; H264 is not yet supported. If the file you are trying to push is very large (more than 2 GB), you can wait for the LowOnFrames event to enqueue the next portion of the video instead of enqueuing everything at the beginning. For your question regarding the freeze, please make sure that the timestamps for the video are correct, and also try viewing the raw media you are streaming with http://www.yuvtoolkit.com/
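
A rough sketch of that chunked-enqueue pattern, assuming the LowOnFrames event and an EnqueueBuffersAsync method on AudioVideoFramePlayer (exact signatures may differ by SDK version; ReadNextVideoChunk is a hypothetical helper):

// Sketch only; assumes the LowOnFrames event and EnqueueBuffersAsync method
// described above. ReadNextVideoChunk is a hypothetical helper that reads the
// next few seconds of raw NV12 frames from the file into VideoSendBuffers
// with increasing timestamps.
private async void OnLowOnFrames(object sender, LowOnFramesEventArgs e)
{
    // this.framePlayer is the AudioVideoFramePlayer field created at startup.
    List<VideoMediaBuffer> nextChunk = this.ReadNextVideoChunk();
    if (nextChunk.Count > 0)
    {
        // Enqueue only the next slice instead of the whole multi-GB file.
        await this.framePlayer.EnqueueBuffersAsync(new List<AudioMediaBuffer>(), nextChunk);
    }
}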

robmeadows commented 7 years ago

Thanks - that was exactly what I needed for the audio and video formatting. Your approach produced working files for me.

The videos are still getting cut off after a couple of seconds, but I don't think it is related to the video formatting. It looks like, no matter the length of the video, only 12 MediaBuffers get created. Is it possible that the stream read here is returning fewer than frameSize bytes even when not at the end of the file?

while (fs.Read(bytesToRead, 0, bytesToRead.Length) >= frameSize)

The Stream.Read documentation says "An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached." I'm using a CloudFile, so maybe there are some buffering delays?

waboum commented 7 years ago

Thanks, I fixed the link. Regarding your question about not having enough buffers to stream: is it possible for you to pre-download the file and retry? Later, you can try a more optimized approach where you buffer the download from your source to make sure you never run out of buffers to stream. It is also very important that the timestamps in the video buffers are accurate; the AudioVideoFramePlayer uses them to keep audio and video in sync. I would recommend always keeping at least 10 sec worth of audio/video buffers in the queue. For that, you can use the MinEnqueuedMediaLengthInMs setting in the AudioVideoFramePlayerSettings when you create the frame player; that way you will be notified by the LowOnFrames event when you have less than 10 sec worth of data.
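
For illustration, a minimal sketch of that wiring, assuming an AudioVideoFramePlayerSettings constructor that takes the audio settings, the video settings, and the minimum enqueued media length in milliseconds (an assumption based on the description above, not a verified signature):

// Minimal sketch; the AudioVideoFramePlayerSettings constructor shape is an
// assumption, and parameter order may differ by SDK version.
private AudioVideoFramePlayer CreateFramePlayer(
    AudioSocket audioSocket,
    VideoSocket videoSocket,
    AudioSettings audioSettings,
    VideoSettings videoSettings)
{
    // 10000 ms: keep at least 10 sec of media queued; LowOnFrames fires below that.
    var settings = new AudioVideoFramePlayerSettings(audioSettings, videoSettings, 10000);
    var player = new AudioVideoFramePlayer(audioSocket, videoSocket, settings);
    player.LowOnFrames += this.OnLowOnFrames;
    return player;
}

On the timestamp point: at 30 fps each frame covers 1000 / 30 ≈ 33 ms, so each successive VideoSendBuffer's reference time should advance by that amount, which is what referenceTime += packetSizeInMs does in the read loop below.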

robmeadows commented 7 years ago

Just to close the loop on this one: moving the files to a local storage location helped, but there are still scenarios where a Stream.Read call will not return the entire requested amount of data. I modified the read loop to look something like this, which guarantees the whole file gets read into buffers:

using (Stream fs = ...)
{
    byte[] bytesToRead = new byte[frameSize];
    int spot = 0;
    int bytesRead = fs.Read(bytesToRead, spot, bytesToRead.Length - spot); // This can actually return less than frameSize bytes even if not at the end of the stream

    while (bytesRead > 0)
    {
        if (spot + bytesRead < bytesToRead.Length) // Didn't read a whole frame
        {
            spot = spot + bytesRead;
        }
        else // Read a complete frame; hand it off as a VideoSendBuffer
        {
            IntPtr unmanagedBuffer = Marshal.AllocHGlobal(frameSize);
            Marshal.Copy(bytesToRead, 0, unmanagedBuffer, frameSize);
            referenceTime += packetSizeInMs; // Advance the timestamp by one frame's duration
            var videoSendBuffer = new VideoSendBuffer(unmanagedBuffer, (uint)frameSize,
                videoFormat, referenceTime);
            videoMediaBuffers.Add(videoSendBuffer);
            spot = 0;
        }

        bytesRead = fs.Read(bytesToRead, spot, bytesToRead.Length - spot);
    }
}

Hope this helps! -Rob