revast / dvj

Automatically exported from code.google.com/p/dvj
GNU General Public License v3.0

ffmpeg video transfer likely involves a format conversion #103

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
Key observation: Commenting out glTexSubImage2D() in UpdateTexture()
results in a significant framerate improvement (30 => 60 fps when playing
two instances of Verminator). Downloading an image to the GPU should be
fast, especially since we're using Pixel Buffer Objects...
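
For reference, here's a minimal sketch of the kind of PBO-driven
glTexSubImage2D update being discussed (placeholder names and a GLEW
include for the extension entry points; this is not LGL's actual code):

    #include <string.h>
    #include <GL/glew.h>

    // Sketch of a PBO-backed texture update. With a pixel buffer object
    // bound to GL_PIXEL_UNPACK_BUFFER, the last argument to glTexSubImage2D
    // is an offset into that buffer rather than a client pointer, so the
    // driver can schedule the transfer asynchronously.
    static void UpdateTextureViaPBO(GLuint pbo, GLuint tex,
                                    int w, int h, size_t frameBytes,
                                    const void* pixels,
                                    GLenum format, GLenum type)
    {
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
        // Orphan the previous contents so we never stall on a buffer
        // the GPU is still reading from.
        glBufferData(GL_PIXEL_UNPACK_BUFFER, frameBytes, NULL, GL_STREAM_DRAW);
        void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
        if (dst)
        {
            memcpy(dst, pixels, frameBytes);
            glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
            glBindTexture(GL_TEXTURE_2D, tex);
            glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, format, type, 0);
        }
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
    }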

According to this linked document...

http://http.download.nvidia.com/developer/Papers/2005/Fast_Texture_Transfers/Fast_Texture_Transfers.pdf

...the following must be true for texture downloads to the GPU to be optimal:

* Pixel sizes must always be 32 bits
* Pixel format must be BGRA
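
In GL terms, a sketch of what those two requirements look like at the call
site (placeholder names, not LGL's code): a 32-bit texture fed BGRA-ordered,
packed 32-bit pixels.

    #include <GL/glew.h>

    // Allocate a texture on the 32-bit BGRA fast path: GL_RGBA8 storage,
    // with incoming pixels described as GL_BGRA packed into 32-bit words.
    static void AllocateFastPathTexture(GLuint tex, int w, int h)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0,
                     GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);
    }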

Let's now trace how a video is currently streamed to OpenGL in LGL:

* BufferBytes is initialized with PIX_FMT_RGB24 (See MaybeChangeVideo())
* An LGL_Video sets up its FrameRGB via avpicture_fill() with PIX_FMT_RGB24
(pointing at BufferRGBBack). (See MaybeChangeVideo()).
* SwsConvertContext is set via sws_getContext() with PIX_FMT_RGB24, and
SWS_FAST_BILINEAR, even though no scaling is necessary. (See
MaybeChangeVideo())
* An LGL_Video decodes its frame to FrameRGB/BufferBackRGB via sws_scale()
(See DecodeFrameToImageBuffer()).
* LGL_Video::LockImage() is called.
* A new LGL_Image is created, passing in RGB-format data at 3 bytes per
pixel, BufferBackRGB.
* Subsequent video updates call LGL_Image::UpdateTexture(), again with
RGB-format data at 3 bytes per pixel, BufferBackRGB.
* LGL_Image creation sets AlphaChannel to false, and then creates a
4-byte-per-pixel SurfaceSDL, and then calls LoadSurfaceToTexture().
* A GL_RGB image is created, and then immediately populated from the
GL_RGBA SurfaceSDL.
* Subsequent calls to LGL_Image::UpdateTexture(), always with
3-byte-per-pixel data, result in glTexSubImage2D being called to upload
GL_RGB data.
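
As a sketch of that current path, written against the 2009-era
ffmpeg/swscale API (codecCtx, frameDecoded, and the other names are
placeholders, not LGL's actual members; cleanup omitted):

    extern "C"
    {
        #include <libavcodec/avcodec.h>
        #include <libswscale/swscale.h>
    }

    // Current path: allocate a 3-byte-per-pixel RGB24 buffer, point an
    // AVFrame at it, and do a same-size sws_scale() whose only real job
    // is the YUV -> RGB24 format conversion.
    static void DecodeFrameToRGB24(AVCodecContext* codecCtx,
                                   AVFrame*        frameDecoded,
                                   int width, int height)
    {
        int      bufferBytes   = avpicture_get_size(PIX_FMT_RGB24, width, height);
        uint8_t* bufferBackRGB = (uint8_t*)av_malloc(bufferBytes);
        AVFrame* frameRGB      = avcodec_alloc_frame();
        avpicture_fill((AVPicture*)frameRGB, bufferBackRGB,
                       PIX_FMT_RGB24, width, height);

        struct SwsContext* swsConvertContext =
            sws_getContext(width, height, codecCtx->pix_fmt,
                           width, height, PIX_FMT_RGB24,
                           SWS_FAST_BILINEAR, NULL, NULL, NULL);

        sws_scale(swsConvertContext,
                  frameDecoded->data, frameDecoded->linesize, 0, height,
                  frameRGB->data, frameRGB->linesize);

        // The 3-byte-per-pixel result is then handed to LGL_Image and
        // uploaded as GL_RGB via glTexSubImage2D.
    }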

Finally, here's how it SHOULD work, to be optimal:

* BufferBytes is initialized with PIX_FMT_BGR32_1 (See MaybeChangeVideo())
* An LGL_Video sets up its FrameRGB via avpicture_fill() with
PIX_FMT_BGR32_1 (pointing at BufferRGBBack). (See MaybeChangeVideo()).
* SwsConvertContext is set via sws_getContext() with PIX_FMT_BGR32_1, and
SWS_FAST_BILINEAR, even though no scaling is necessary. (See
MaybeChangeVideo())
* An LGL_Video decodes its frame to FrameRGB/BufferBackRGB via sws_scale()
(See DecodeFrameToImageBuffer()).
* LGL_Video::LockImage() is called.
* A new LGL_Image is created, passing in BGRA-format data at 4 bytes per
pixel, BufferBackRGB.
* Subsequent video updates call LGL_Image::UpdateTexture(), again with
BGRA-format data at 4 bytes per pixel, BufferBackRGB.
* LGL_Image creation sets AlphaChannel to false, and then creates a
4-byte-per-pixel SurfaceSDL, and then calls LoadSurfaceToTexture().
* A GL_BGRA image is created, and then immediately populated from the
GL_BGRA SurfaceSDL.
* Subsequent calls to LGL_Image::UpdateTexture(), always with
4-byte-per-pixel data, result in glTexSubImage2D being called to upload
GL_BGRA data.
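
A sketch of that proposed path (again with placeholder names, not LGL's
actual code). One detail worth flagging: PIX_FMT_BGR32_1 is
(msb)B G R A(lsb) in a 32-bit word, which pairs with
GL_UNSIGNED_INT_8_8_8_8; the commonly recommended
GL_UNSIGNED_INT_8_8_8_8_REV type would instead expect BGRA byte order on
little-endian machines (ffmpeg's PIX_FMT_RGB32).

    extern "C"
    {
        #include <libavcodec/avcodec.h>
        #include <libswscale/swscale.h>
    }
    #include <GL/glew.h>

    // Proposed path: have swscale emit packed 32-bit BGRA-ordered pixels,
    // then describe them to GL the same way.
    static struct SwsContext* MakeBGRAConvertContext(AVCodecContext* codecCtx,
                                                     int width, int height)
    {
        return sws_getContext(width, height, codecCtx->pix_fmt,
                              width, height, PIX_FMT_BGR32_1,
                              SWS_FAST_BILINEAR, NULL, NULL, NULL);
    }

    static void UpdateTextureBGRA(GLuint tex, int width, int height,
                                  const uint8_t* bufferBackRGB)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        // GL_UNSIGNED_INT_8_8_8_8 matches BGR32_1's (msb)B G R A(lsb) packing.
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_BGRA, GL_UNSIGNED_INT_8_8_8_8, bufferBackRGB);
    }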

Implementing the above changes didn't speed anything up. Further
research is required... CPU usage is 80%, regardless of whether
glTexSubImage2D() is called, suggesting that perhaps a format conversion is
NOT taking place...? Hmm... Need a paper that's more recent than 2005...
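
One way to narrow this down (a hedged sketch; SDL_GetTicks() is used only
because LGL already links SDL, and the names are placeholders): time the
upload in isolation, fenced with glFinish(), and see whether the cost
actually lives inside the call.

    #include <stdio.h>
    #include <GL/glew.h>
    #include <SDL/SDL.h>

    // Probe: if the driver is doing a CPU-side format conversion, the cost
    // shows up inside this call (or inside the trailing glFinish()).
    static void TimeUpload(GLuint tex, int w, int h,
                           GLenum format, GLenum type, const void* pixels)
    {
        glFinish();                    // drain any pending GL work first
        Uint32 before = SDL_GetTicks();
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, format, type, pixels);
        glFinish();                    // force the transfer to complete
        Uint32 after = SDL_GetTicks();
        printf("glTexSubImage2D took %u ms\n", (unsigned)(after - before));
    }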

Original issue reported on code.google.com by interim....@gmail.com on 22 Jun 2009 at 9:32

GoogleCodeExporter commented 9 years ago
This probably affects all platforms, and definitely needs to be investigated 
for each.

Original comment by interim....@gmail.com on 14 Jul 2009 at 4:02

GoogleCodeExporter commented 9 years ago
This was a triumph.

I'm making a note here: Huge success!

Original comment by interim....@gmail.com on 7 Aug 2009 at 2:07