mrdoob / three.js

JavaScript 3D Library.
https://threejs.org/
MIT License

CanvasTexture problem with VideoFrames #28937

Open stevexbritton opened 1 month ago

stevexbritton commented 1 month ago

Description

When using a CanvasTexture as a Scene background, it works when the CanvasTexture is created from an ImageBitmap, but not when it is created from a VideoFrame. The error "GL_INVALID_VALUE: Offset overflows texture dimensions." is reported when renderer.render() is called.

Reproduction steps

  1. This needs to be run in Chrome.
  2. Run the accompanying codepen with the "...imageCapture.grabFrame..." code fragment NOT commented out: the camera images from the "video" element ARE rendered to the canvas and the "dst-video" element.
  3. Comment out the "...imageCapture.grabFrame..." code fragment and uncomment the "const newFrame = transform(frame, frame.displayWidth, frame.displayHeight)" code fragment: the camera images from the "video" element are NOT rendered to the canvas or the "dst-video" element.
  4. "GL_INVALID_VALUE: Offset overflows texture dimensions." errors occur, but unfortunately these are not displayed in CodePen's console.

Code

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <title>Vyking Video</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0">

    <script type="importmap">
        {
          "imports": {
            "three": "https://cdn.jsdelivr.net/npm/three@0.166.1/build/three.module.js"
          }
        }
    </script>
</head>

<body>
    <video id="video" controls></video>
    <canvas id="canvas"></canvas>
    <video id="dst-video" controls></video>

    <script type="module">
        import * as THREE from 'three'

        document.addEventListener('DOMContentLoaded', async () => {
            const video = document.querySelector('#video')
            const canvas = document.querySelector('#canvas')
            const dstVideo = document.querySelector('#dst-video')

            if (video instanceof HTMLVideoElement && canvas instanceof HTMLCanvasElement) {
                const scene = new THREE.Scene()
                const cameraRig = new THREE.Group()
                const camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 10)
                cameraRig.name = 'cameraRig'
                cameraRig.rotateX(-Math.PI / 2) // Point camera down at your feet
                cameraRig.add(camera)
                scene.add(cameraRig)
                const renderer = new THREE.WebGLRenderer({
                    canvas: canvas,
                    alpha: false,
                    powerPreference: 'default',
                    preserveDrawingBuffer: true
                })
                renderer.setPixelRatio(window.devicePixelRatio)
                renderer.autoClear = false
                renderer.debug = {
                    checkShaderErrors: false,
                    onShaderError: null
                }
                renderer.outputColorSpace = THREE.SRGBColorSpace

                const srcMediaStream = await navigator.mediaDevices.getUserMedia({
                    audio: false,
                    video: {
                        width: { ideal: 960 },
                        height: { ideal: 540 },
                        frameRate: { max: 30 }
                    },
                })
                const srcVideoTrack = srcMediaStream.getVideoTracks()[0]

                // Render the incoming frame as the scene background, then
                // capture the canvas contents as a new VideoFrame.
                const transform = (frame, width, height) => {
                    // console.log(`transform frame: %o srcVideoTrack.enabled: ${srcVideoTrack.enabled}, ${width} ${height}`)

                    // Mirror the sink track's enabled state and pass the
                    // frame through untouched while the sink is disabled.
                    srcVideoTrack.enabled = sinkVideoTrack.enabled
                    if (!sinkVideoTrack.enabled) {
                        return new VideoFrame(frame)
                    }

                    if (!(scene.background instanceof THREE.CanvasTexture)) {
                        // First frame: create the background texture.
                        scene.background = new THREE.CanvasTexture(frame)
                        scene.background.colorSpace = THREE.SRGBColorSpace
                        scene.background.generateMipmaps = false
                        scene.background.minFilter = THREE.LinearFilter
                        scene.background.matrixAutoUpdate = false
                    } else if (scene.background.image.width != width || scene.background.image.height != height) {
                        // Frame size changed: release the old frame and recreate the texture.
                        scene.background.image.close()
                        scene.background.dispose()
                        scene.background = new THREE.CanvasTexture(frame)
                        scene.background.colorSpace = THREE.SRGBColorSpace
                        scene.background.generateMipmaps = false
                        scene.background.minFilter = THREE.LinearFilter
                        scene.background.matrixAutoUpdate = false
                    } else {
                        // Same size: swap in the new frame and flag the texture for re-upload.
                        scene.background.image.close()
                        scene.background.image = frame
                        scene.background.needsUpdate = true
                    }
                    renderer.setSize(width, height)
                    renderer.render(scene, camera)

                    return new VideoFrame(canvas, {
                        timestamp: 0
                    })
                }
                const trackProcessor = new MediaStreamTrackProcessor({ track: srcVideoTrack })
                const trackGenerator = new MediaStreamTrackGenerator({ kind: 'video' })
                const imageCapture = new ImageCapture(srcVideoTrack)
                const transformer = new TransformStream({
                    async transform(frame, controller) {
                        // Path A (works): grab an ImageBitmap of the current
                        // camera image and render that.
                        const newFrame = await imageCapture.grabFrame().then(image => {
                            frame.close()

                            return transform(image, image.width, image.height)
                        })

                        // Path B (fails with GL_INVALID_VALUE): pass the VideoFrame straight through.
                        // const newFrame = transform(frame, frame.displayWidth, frame.displayHeight)

                        controller.enqueue(newFrame)
                    },
                })

                const sinkMediaStream = new MediaStream([trackGenerator])
                const sinkVideoTrack = sinkMediaStream.getVideoTracks()[0]

                trackProcessor
                    .readable
                    .pipeThrough(transformer)
                    .pipeTo(trackGenerator.writable)
                    .catch(cause => {
                        console.error(`VykingMediaDevices.getUserMedia error: %o`, cause)
                    })

                dstVideo.srcObject = sinkMediaStream
                video.srcObject = srcMediaStream

                video.play()
                dstVideo.play()
            }
        })
    </script>
</body>

</html>

Live example

Screenshots

No response

Version

0.166.1

Device

Desktop

Browser

Chrome

OS

MacOS

Mugen87 commented 1 month ago

I've demonstrated with a simplified example that using a video frame with a THREE.CanvasTexture works as expected: https://jsfiddle.net/3k1q0fez/

So there must be an issue in your app-level code. Please use the forum or stackoverflow to search for the root cause. If it turns out to be an issue in the engine, we can reopen the issue.

stevexbritton commented 1 month ago

Hi, thank you for such a quick response. Unfortunately your simplified example is too naive, so I have taken the liberty of making a small modification to your jsfiddle to demonstrate the problem. I have replaced your VideoFrame creation code with the example code provided in the MDN documentation (https://developer.mozilla.org/en-US/docs/Web/API/VideoFrame/VideoFrame) and, CRUCIALLY, set the displayWidth and displayHeight properties to values different from the codedWidth and codedHeight values, a much more likely scenario when working with real VideoFrames. You will now see, when you run your jsfiddle, that the code no longer works.

Also, if I change my example to ask for a camera with video dimensions that do not require cropping the source image (1280x720 on my 16" Mac), my code works correctly. I think this demonstrates there is no app-level issue with my code; the issue is that three.js does not correctly handle VideoFrames whose display size differs from their coded size. I hope this is enough evidence for you to re-open this issue. Thank you.
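The crucial part of the modification looks roughly like this (adapted from the MDN VideoFrame(ArrayBuffer) example; the concrete dimension values are only illustrative):

const init = {
    format: 'RGBA',
    timestamp: 0,
    codedWidth: 320,
    codedHeight: 240,
    // The crucial addition: a display size that differs from the coded size.
    displayWidth: 300,
    displayHeight: 200,
}
const data = new Uint8Array(init.codedWidth * init.codedHeight * 4) // 4 bytes per RGBA pixel
const frame = new VideoFrame(data, init)

// Rendering a scene whose background is a CanvasTexture built from this
// frame triggers the GL_INVALID_VALUE error.
const texture = new THREE.CanvasTexture(frame)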

Mugen87 commented 1 month ago

Do you mind sharing the updated fiddle?

stevexbritton commented 1 month ago

I'm sorry, I haven't used jsfiddle before, is this what you want: https://jsfiddle.net/5jbs3oaq/13/

Mugen87 commented 1 month ago

So the root cause is that the coded width/height differs from the display width/height.

This totally explains the issue, of course, since the dimensions of the texture and its buffer size do not match.

Would it be correct to always use codedWidth and codedHeight? The bit that handles the dimensions for video frames looks like this right now:

https://github.com/mrdoob/three.js/blob/f5eaae88a09fce661a814d2bd3526f1690302bb6/src/renderers/webgl/WebGLTextures.js#L2054-L2058
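Paraphrasing the linked snippet (not verbatim), the texture dimensions for a VideoFrame are currently derived from its display size:

if (typeof VideoFrame !== 'undefined' && image instanceof VideoFrame) {
    dimensions.width = image.displayWidth
    dimensions.height = image.displayHeight
}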

stevexbritton commented 1 month ago

My understanding is that when you ask navigator.mediaDevices.getUserMedia for video of a certain dimension, the camera returns images of a certain size, the VideoFrame's codedWidth & codedHeight, and these may be cropped down to the requested size, which is the VideoFrame's displayWidth & displayHeight. So the texture size needs to be displayWidth & displayHeight, but the data to copy is a window within the VideoFrame data, which I believe is defined by the VideoFrame's visibleRect property.
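To illustrate with hypothetical numbers for the 960x540 request above, a frame cropped from a 1280x720 sensor image might report something like:

console.log(frame.codedWidth, frame.codedHeight)      // 1280 720 - size of the backing pixel buffer
console.log(frame.visibleRect.x, frame.visibleRect.y,
            frame.visibleRect.width, frame.visibleRect.height) // 160 90 960 540 - crop window within the buffer
console.log(frame.displayWidth, frame.displayHeight)  // 960 540 - size the frame should be rendered at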

Mugen87 commented 1 month ago

Would it be possible to extract the effective frame data at the app level based on visibleRect and put the data into a buffer for a data texture? The dimensions of the data texture would be displayWidth and displayHeight.

If this works, maybe we can try to integrate this into the renderer.
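Something along these lines, as an untested sketch (assuming the frame is already in a 4-bytes-per-pixel RGBA-style format; camera frames are often NV12 and would need a conversion step first):

const rect = frame.visibleRect
const buffer = new Uint8Array(frame.allocationSize({ rect }))
await frame.copyTo(buffer, { rect }) // copies only the visible window to the CPU

const texture = new THREE.DataTexture(buffer, frame.displayWidth, frame.displayHeight)
texture.colorSpace = THREE.SRGBColorSpace
texture.needsUpdate = true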

stevexbritton commented 1 month ago

I believe VideoFrames, like ImageBitmaps, just hold references to the data, which can be passed to and stored directly on the GPU, and VideoFrame data coming from video devices is indeed held on the GPU. Therefore, copying to the CPU to create a DataTexture would be slow. Is the data copied from the GPU to the CPU when creating a CanvasTexture from an ImageBitmap?
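For context, my understanding is that WebGL can consume a VideoFrame directly as a TexImageSource (the WebCodecs spec registers it as one), which is what makes a GPU-to-GPU upload possible, e.g.:

// Assumes gl is a WebGL(2) context, texture a bound-able WebGLTexture,
// and frame a VideoFrame; the browser can keep the pixels on the GPU.
gl.bindTexture(gl.TEXTURE_2D, texture)
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, frame)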

Mugen87 commented 1 month ago

Is the data copied from the GPU to the CPU when creating a CanvasTexture from an ImageBitmap?

No, since the image bitmap data is already on the CPU side. This should also be true for video frames, imo.

stevexbritton commented 1 month ago

Sorry, did you mean GPU, not CPU? I'm saying the data is already on the GPU side, and I'm hoping it doesn't have to be copied to the CPU to create the texture only to be copied back up to the GPU.