twilio / video-quickstart-android

Twilio Video Quickstart for Android
MIT License

How to Correctly Rotate VideoFrame #629

Closed · cyodrew closed this 3 years ago

cyodrew commented 3 years ago


Description

I'm attempting to pull frames off the camera before they are adapted for WebRTC. Based on the docs, the best way to do this seems to be creating a custom VideoProcessor and obtaining the frame in the overridden method onFrameCaptured(VideoFrame frame, VideoProcessor.FrameAdaptationParameters parameters). The video is captured in portrait orientation, so the frame has a rotation value of 270 or 90, depending on whether it's the front or back camera. I'm feeding the frame to the Android encoder MediaCodec by copying it into the input buffer with YuvHelper.I420Rotate. After muxing with MediaMuxer, the resulting MP4 appears wrong.

Video captured with MediaCodec width set to 1080 and height set to 1920.

[video attachment: output]

Note: I hardcoded the height, width, and frame rate here for brevity. If I instead use YuvHelper.I420Copy and set up the MediaCodec with the unrotated width and height, the video looks correct but is oriented horizontally (see below).

Video captured with MediaCodec width set to 1920 and height set to 1080.

[video attachment: rotated]
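For context, a 90 or 270 degree rotation swaps the output dimensions, so the encoder has to be configured with the post-rotation size. A tiny sketch of that bookkeeping (the helper is illustrative, not an SDK API):

// Illustrative helper (not an SDK API): the size of an I420 buffer after
// rotation. 90/270 swap width and height; 0/180 leave them unchanged.
fun rotatedSize(width: Int, height: Int, rotation: Int): Pair<Int, Int> =
    if (rotation == 90 || rotation == 270) height to width else width to height

// e.g. rotatedSize(1920, 1080, 270) == 1080 to 1920, which is the size the
// MediaFormat must be configured with when encoding the rotated buffer.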

Steps to Reproduce

  1. Set up a VideoProcessor
  2. Set up a MediaCodec for encoding
  3. Apply YuvHelper.I420Rotate to the VideoFrame.I420Buffer
  4. Feed the input buffer to the MediaCodec
  5. Feed the output buffer to MediaMuxer
  6. Check the resulting MP4

Code

RecordingVideoProcessor.kt

import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.asSharedFlow
import tvi.webrtc.VideoFrame
import tvi.webrtc.VideoProcessor
import tvi.webrtc.VideoSink

class RecordingVideoProcessor : VideoProcessor {
    var videoSink: VideoSink? = null
    private val _videoFrameFlow = MutableSharedFlow<VideoFrame>(extraBufferCapacity = 10)
    val videoFrameFlow = _videoFrameFlow.asSharedFlow()

    override fun onCapturerStarted(success: Boolean) {}

    override fun onCapturerStopped() {}

    override fun onFrameCaptured(videoFrame: VideoFrame?) {
        videoFrame?.let { videoSink?.onFrame(it) }
    }

    override fun onFrameCaptured(frame: VideoFrame?, parameters: VideoProcessor.FrameAdaptationParameters?) {
        frame?.let { videoFrame ->
            if (_videoFrameFlow.subscriptionCount.value > 0) {
                videoFrame.retain()
                if (!_videoFrameFlow.tryEmit(videoFrame)) {
                    videoFrame.release()
                }
            }

            super.onFrameCaptured(videoFrame, parameters)
        }
    }

    override fun setSink(sink: VideoSink?) {
        videoSink = sink
    }
}

MediaHandler.kt

import android.content.Context
import android.media.*
import android.util.Log
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.*
import tvi.webrtc.*
import java.lang.IllegalStateException
import java.util.*
import java.util.concurrent.atomic.AtomicBoolean

class MediaHandler(
        context: Context,
        private val externalScope: CoroutineScope
) {
    private var videoEncoderDone = CompletableDeferred<Unit>()
    private lateinit var encodeVideoJob: Job
    private val videoMediaFormat = MediaFormat().apply {
        setString(MediaFormat.KEY_MIME, MediaFormat.MIMETYPE_VIDEO_AVC)
        setInteger(MediaFormat.KEY_WIDTH, 1080) // hardcoded
        setInteger(MediaFormat.KEY_HEIGHT, 1920) // hardcoded
        setInteger(MediaFormat.KEY_FRAME_RATE, 30)
        setInteger(MediaFormat.KEY_BIT_RATE, 1080 * 1920 * 5)
        setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
    }

    private val videoCodec by lazy {
        val encoder = MediaCodecList(MediaCodecList.REGULAR_CODECS).findEncoderForFormat(videoMediaFormat)
                ?: throw IllegalStateException("No matching codecs available on device")

        MediaCodec.createByCodecName(encoder).apply {
            videoMediaFormat.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420Flexible)
            configure(videoMediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
        }
    }

    private val endOfStream = AtomicBoolean(false)
    private val pendingVideoEncoderInputBufferIndicesChannel = Channel<Int>(capacity = Channel.BUFFERED)
    private val localFileName = "${UUID.randomUUID()}.mp4"
    private val filePath = "${context.filesDir}/$localFileName"
    private var videoTrackIndex: Int? = null
    private lateinit var mediaMuxer: MediaMuxer

    private val videoEncoderProcessor = object : MediaCodec.Callback() {
        override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
            pendingVideoEncoderInputBufferIndicesChannel.offer(index)
        }

        override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, info: MediaCodec.BufferInfo) {
            muxVideo(index, info)
        }

        override fun onError(codec: MediaCodec, e: MediaCodec.CodecException) {}

        override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
            mediaMuxer = MediaMuxer(filePath, MediaMuxer.OutputFormat.MUXER_OUTPUT_MPEG_4)
            videoTrackIndex = mediaMuxer.addTrack(format)
            mediaMuxer.start()
        }
    }

    init {
        videoCodec.apply {
            setCallback(videoEncoderProcessor)
            start()
        }
    }

    private fun waitForCompletion() = externalScope.launch {
        videoEncoderDone.await()

        videoCodec.stop()
        videoCodec.release()
        mediaMuxer.stop()
        mediaMuxer.release()
    }

    private fun muxVideo(index: Int, bufferInfo: MediaCodec.BufferInfo) {
        if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != 0) {
            videoCodec.releaseOutputBuffer(index, false)
            return
        }

        val encodedBuffer = try {
            videoCodec.getOutputBuffer(index)
        } catch (e: IllegalStateException) {
            return
        }

        if (bufferInfo.size != 0 && encodedBuffer != null) {
            videoTrackIndex?.let { mediaMuxer.writeSampleData(it, encodedBuffer, bufferInfo) }
        }

        videoCodec.releaseOutputBuffer(index, false)

        if (bufferInfo.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM != 0) {
            videoEncoderDone.complete(Unit)
        }
    }

    fun encodeVideo(flow: Flow<VideoFrame>) {
        encodeVideoJob = flow.onEach { frame ->
            if (endOfStream.get()) {
                frame.release()
                return@onEach
            }

            // Read rotation and timestamp before releasing the frame
            val i420Buffer = frame.buffer.toI420()
            val rotation = frame.rotation
            val pts = frame.timestampNs / 1000
            frame.release()
            encode(
                    sample = i420Buffer,
                    rotation = rotation,
                    codec = videoCodec,
                    pts = pts,
                    availableIndex = pendingVideoEncoderInputBufferIndicesChannel.receive()
            )
        }.launchIn(externalScope)
    }

    private fun encode(sample: VideoFrame.I420Buffer, rotation: Int, pts: Long, codec: MediaCodec, availableIndex: Int) {
        val size = sample.height * sample.width * 3 / 2
        val input = try {
            codec.getInputBuffer(availableIndex)
        } catch (e: IllegalStateException) {
            return
        }
        input?.apply {
//            YuvHelper.I420Copy(sample.dataY, sample.strideY, sample.dataU, sample.strideU, sample.dataV, sample.strideV, this, sample.width, sample.height)
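            // Note: for 90/270 rotations, I420Rotate emits a buffer with swapped
            // dimensions (height x width), so the MediaFormat configured on the
            // codec must use the rotated size.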
            YuvHelper.I420Rotate(sample.dataY, sample.strideY, sample.dataU, sample.strideU, sample.dataV, sample.strideV, this, sample.width, sample.height, rotation)
            codec.queueInputBuffer(availableIndex, 0, size, pts, 0)
        }
        sample.release()
    }

    fun close() {
        endOfStream.set(true) // stop encoder from receiving data
        if (this::encodeVideoJob.isInitialized) {
            encodeVideoJob.cancel()
        }
        waitForCompletion() // launch a job that awaits encoder completion, then releases the codec and muxer
        queueEos() // launch a job that waits for an available input buffer and signals EOS
    }

    private fun queueEos() = externalScope.launch {
        val index = pendingVideoEncoderInputBufferIndicesChannel.receive()
        videoCodec.queueInputBuffer(index, 0, 0, 0, MediaCodec.BUFFER_FLAG_END_OF_STREAM)
    }
}

VideoActivity.kt (in the quickStartKotlin project)


class VideoActivity : AppCompatActivity() {
    private var mediaScope = CoroutineScope(lifecycleScope.coroutineContext + Dispatchers.Default)
    private lateinit var mediaHandler: MediaHandler
    private val videoProcessor = RecordingVideoProcessor()

    override fun onCreate(savedInstanceState: Bundle?) {
        // ...
        mediaHandler = MediaHandler(
                this,
                mediaScope
        )
    }

    override fun onResume() {
        // ... 
        localVideoTrack = if (localVideoTrack == null && checkPermissionForCameraAndMicrophone()) {
            createLocalVideoTrack(this,
                    true,
                    cameraCapturerCompat,
                    VideoFormat(VideoDimensions.HD_1080P_VIDEO_DIMENSIONS, videoFormat.frameRate)
            )
        } else {
            localVideoTrack
        }
        localVideoTrack?.videoSource?.setVideoProcessor(videoProcessor)
    }

    private fun createAudioAndVideoTracks() {
        // Share your microphone
        localAudioTrack = createLocalAudioTrack(this, true)

        // Share your camera
        localVideoTrack = createLocalVideoTrack(this,
                true,
                cameraCapturerCompat,
                VideoFormat(VideoDimensions.HD_1080P_VIDEO_DIMENSIONS, videoFormat.frameRate)
        )
    }

    private val roomListener = object : Room.Listener {
        override fun onConnected(room: Room) {
            // ...
            mediaHandler.encodeVideo(videoProcessor.videoFrameFlow)
        }

        override fun onDisconnected(room: Room, e: TwilioException?) {
            // ...
            mediaHandler.close()
        }
    }
}

Expected Behavior

The output file should look correct regardless of whether the front or back camera is used.

Actual Behavior

The output MP4 is garbled when the rotated dimensions are used, and sideways when they are not.

Reproduces How Often

100%


Versions


Video Android SDK

6.2.1

Android API

30

Android Device

Pixel 3a

cyodrew commented 3 years ago

Any update on this?

Alton09 commented 3 years ago

Hey @cyodrew. I'll take a closer look at this issue soon and provide feedback. Sorry for the delay. Thanks!

Alton09 commented 3 years ago

Hey @cyodrew. I started taking a closer look at this one recently. Unfortunately, I can't give you a quick answer on how to resolve the issue at the moment. However, I will be focusing on this issue this sprint and will build an example from scratch to take a crack at it. More updates to come soon! Thanks for your patience.

cyodrew commented 3 years ago

@Alton09 Awesome, I appreciate the follow up. If there's a better way to do this, I'm open to that as well. I know using an input surface over input buffers is another way to use MediaCodec, but most examples use the Android Camera APIs directly for that.
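For reference, the Surface route with plain MediaCodec calls looks roughly like this (a sketch; "format" is assumed to be an encoder MediaFormat like videoMediaFormat above):

// Minimal sketch: Surface input for MediaCodec. createInputSurface() must be
// called after configure() and before start().
val codec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_VIDEO_AVC)
codec.configure(format, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
val inputSurface = codec.createInputSurface()
codec.start()
// Frames rendered onto inputSurface (e.g. via EGL) are encoded directly,
// avoiding the ByteBuffer copies and manual I420 rotation entirely.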

Alton09 commented 3 years ago

@cyodrew I'm honestly not sure off the top of my head; I think your approach looks good overall. I'll let you know what I find when I dig into this a bit more soon.

Alton09 commented 3 years ago

@cyodrew Just to provide an update on this one. I've created the branch escalation/video-4836-rotate-video, which adds a new example module called exampleRotateVideoFrames that attempts to recreate this use case. I'm not sure what I'm missing from your code example, but the MP4 that is created is completely off. Do you mind taking a look to see what I'm missing? I'd like to get a public branch that reproduces the issue you are experiencing so I can focus on the frame rotation problem. It will also be a useful example for other developers to learn from.

I expect YuvHelper.I420Rotate to achieve the desired rotation. Since Google recommends using a Surface for better efficiency when reading raw video, an Image object can be retrieved from it. I wonder if there is an API that can rotate these Image instances 🤔

cyodrew commented 3 years ago

@Alton09 Excellent, I appreciate you taking a deeper look into this. No problem. In my original example, MediaHandler hardcodes the width and height at 1080x1920, and the localVideoTrack defaults to 640x480 unless you pass a VideoFormat with the right dimensions.

All you need to do here is create a VideoFormat instance with the desired VideoDimensions and frame rate, then pass it to both the localVideoTrack and the MediaHandler constructor, which can then set up the MediaFormat correctly.

class MediaHandler(
        context: Context,
        private val videoFormat: VideoFormat,
        private val externalScope: CoroutineScope
) {
    private var videoEncoderDone = CompletableDeferred<Unit>()
    private lateinit var encodeVideoJob: Job
    private val videoMediaFormat = MediaFormat().apply {
        setString(MediaFormat.KEY_MIME, MediaFormat.MIMETYPE_VIDEO_AVC)
        setInteger(MediaFormat.KEY_WIDTH, videoFormat.dimensions.width)
        setInteger(MediaFormat.KEY_HEIGHT, videoFormat.dimensions.height)
        setInteger(MediaFormat.KEY_FRAME_RATE, 30)
        setInteger(MediaFormat.KEY_BIT_RATE, 1080 * 1920 * 5)
        setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 1)
    }
// ...
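For clarity, here's roughly what the call sites look like with that change (the 30 fps value is illustrative; variable names follow the quickstart):

val videoFormat = VideoFormat(VideoDimensions.HD_1080P_VIDEO_DIMENSIONS, 30)

// The same format drives both the capturer and the encoder configuration
localVideoTrack = createLocalVideoTrack(this, true, cameraCapturerCompat, videoFormat)
mediaHandler = MediaHandler(this, videoFormat, mediaScope)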

As you'll observe, though, I420Copy produces a correct-looking video except that the orientation is wrong. If you swap the width and height in the MediaFormat and use I420Rotate, the output MP4 is visually corrupted.

I'd prefer to use Image here as well if such an API exists or is accessible, but I haven't found one yet. I think there is a way to get the eglBase from the video track through reflection, which can be used to call eglBase.createSurface(videoCodec.createInputSurface()). That's not particularly pretty or intuitive, though, and not a preferred way to do it. With that method and a few other WebRTC methods I got a correct video, but it occasionally produces a SIGSEGV crash at a lower level, so it's not very stable.

Alton09 commented 3 years ago

Thanks for those tips, @cyodrew! I am definitely seeing some artifacts in the recording. Are you seeing a green tint as well? Here's what it looks like when running on a Pixel One XL:

https://user-images.githubusercontent.com/2661383/117883511-19460580-b271-11eb-9c3d-aaa12ad6284e.mp4

I'd prefer to use Image here as well if such an API exists or is accessible, but I haven't found one yet. I think there is a way to get the eglBase from the video track through reflection, which can be used to call eglBase.createSurface(videoCodec.createInputSurface()). That's not particularly pretty or intuitive, though, and not a preferred way to do it. With that method and a few other WebRTC methods I got a correct video, but it occasionally produces a SIGSEGV crash at a lower level, so it's not very stable.

Yeah that's not ideal. I'll sync up with a coworker on this one to get more ideas on a stable solution. Thanks for your patience!

cyodrew commented 3 years ago

@Alton09

Oddly enough, I hadn't tried using both I420Copy and I420Rotate together, but that seems to work (on my device, a Pixel 3a). The result you got suggests it may not produce the right output on every device, so we may have to dig deeper. I have seen the green artifacts before; I noticed you added COLOR_FormatYUV420Flexible to the format, which is what I was going to suggest trying, since most answers I found recommended it.

https://user-images.githubusercontent.com/45369760/117887412-a6438b80-b27e-11eb-9a91-7266a3bfcbc3.mp4
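For reference, here's roughly what the copy-then-rotate step looks like in my encode path. This is a sketch assuming a tightly packed intermediate buffer, with plane offsets derived from the packed I420 layout (Y stride = w, U/V stride = w / 2), not my exact code:

private fun copyThenRotate(sample: VideoFrame.I420Buffer, rotation: Int, dst: ByteBuffer) {
    val w = sample.width
    val h = sample.height

    // 1. Strip stride padding by copying into a tightly packed I420 buffer
    val packed = ByteBuffer.allocateDirect(w * h * 3 / 2)
    YuvHelper.I420Copy(
            sample.dataY, sample.strideY, sample.dataU, sample.strideU,
            sample.dataV, sample.strideV, packed, w, h)

    // 2. Slice the packed planes back out of the intermediate buffer
    packed.position(0); packed.limit(w * h)
    val y = packed.slice()
    packed.position(w * h); packed.limit(w * h + w * h / 4)
    val u = packed.slice()
    packed.position(w * h + w * h / 4); packed.limit(w * h * 3 / 2)
    val v = packed.slice()

    // 3. Rotate into the codec input buffer; output dims are swapped for 90/270
    YuvHelper.I420Rotate(y, w, u, w / 2, v, w / 2, dst, w, h, rotation)
}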

Yeah that's not ideal. I'll sync up with a coworker on this one to get more ideas on a stable solution. Thanks for your patience!

Thank you, I look forward to it!

Alton09 commented 3 years ago

Just tested on my Pixel 5 and it looks perfect, like what you saw on the Pixel 3a 👍🏻

The result you got indicates it may not produce the right result for every device, so we may have to dig deeper

Yep, agreed. I think the media format configuration (color format and bitrate) needs to be set to specific values based on the AVC encoder used on the device. We've seen issues with specific hardware encoders in the past when they were given incorrect values for color and bitrate.
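Something worth trying along those lines is querying the encoder's capabilities up front instead of hardcoding the color format. A rough sketch using the standard android.media APIs (the selection logic here is just an assumption):

// Sketch: query the device's AVC encoders for their supported color formats,
// so KEY_COLOR_FORMAT can be chosen per device rather than hardcoded.
fun supportedAvcColorFormats(): List<Int> =
    MediaCodecList(MediaCodecList.REGULAR_CODECS).codecInfos
        .filter { it.isEncoder && it.supportedTypes.any { t -> t.equals(MediaFormat.MIMETYPE_VIDEO_AVC, ignoreCase = true) } }
        .flatMap { it.getCapabilitiesForType(MediaFormat.MIMETYPE_VIDEO_AVC).colorFormats.toList() }
        .distinct()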

cyodrew commented 3 years ago

I think there are some utility functions built in to resolve that, but it sounds like it's on the right track 👍

Alton09 commented 3 years ago

@cyodrew I haven't had luck getting the codecs to work on other devices yet by changing the supported color types in the MediaCodec configuration. Another thing to try is comparing against the configuration WebRTC uses when creating codecs in its HardwareVideoEncoder class, to see if we are missing anything.

cyodrew commented 3 years ago

Were you getting results similar to the video you previously posted, with artifacts? That may be worth adding to the initial configuration; there are a few things there that were not in my original code.

Alton09 commented 3 years ago

Hey @cyodrew. I've done all I can for this issue at the moment and need to look at other ongoing issues. It appears the solution is almost there when using raw ByteBuffers from a custom VideoProcessor; just some configuration changes to the MediaCodec are needed to get it working properly on other OEMs' devices. Again, I think taking a look at the WebRTC APIs can provide more insight here. Also, I found a really helpful blog post that goes into detail about proper hardware MediaCodec configuration and may provide more insight.

I think the preferred way to solve this is to provide a Surface to the MediaCodec (also recommended by Google) and avoid using the VideoProcessor. After syncing with @aaalaniz on this, he had a few more ideas for getting this working. First, reflection can be used to get the SurfaceTexture from the camera: SurfaceTextureHelper has a getSurfaceTexture() method, and it is referenced by the CameraCapturer and Camera2Capturer classes, so reflection would be needed to reach the private SurfaceTextureHelper member. Also, there are many examples in the Grafika repo that implement features similar to what you are trying to achieve here, so definitely take a look there as well. The VideoEncoderCore class looks interesting and could be helpful.
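As a very rough sketch of that reflection idea (the "surfaceTextureHelper" field name is a guess and may differ between SDK versions, so this is not a supported approach):

// Attempt to pull the private SurfaceTextureHelper (tvi.webrtc) out of a
// capturer instance. The declared field name is hypothetical.
fun extractSurfaceTexture(capturer: Any): SurfaceTexture? = try {
    val field = capturer.javaClass.getDeclaredField("surfaceTextureHelper")
    field.isAccessible = true
    (field.get(capturer) as? SurfaceTextureHelper)?.surfaceTexture
} catch (e: ReflectiveOperationException) {
    null
}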

Also, you may already be aware of this, but we do have a REST API for recording a video room. It may not meet your use case here, but here it is just in case.

I'll close this issue for now, but please feel free to open tickets for any new issues as needed. Thanks for your patience and collaboration!

cyodrew commented 3 years ago

Hey @Alton09,

I appreciate all the help regarding this issue. I also believe the Surface method with MediaCodec is the best option moving forward. What I'm trying to understand is what to do with the SurfaceTexture (accessed through reflection, as you suggested): many of the examples in Grafika are a bit bloated, and it appears much of the EGL-related machinery is created and referenced internally by WebRTC, so I'm not sure whether I'll also need reflection to get those or whether two instances can coexist.

I'll keep the search up!

Alton09 commented 3 years ago

Happy to help, @cyodrew! Thank you, and please do share any findings to help others who may be facing the same issue.