janke99 / chromiumembedded

Automatically exported from code.google.com/p/chromiumembedded

The new OSR implementation has an (artificial) limit to the frame rate #1368

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use OSR with CefBrowserSettings.windowless_frame_rate = 60, or cefclient with 
'--off-screen-rendering-enabled --off-screen-frame-rate=60'.
2. Load a website with a perpetual CSS animation or WebGL.

What is the expected output? What do you see instead?
It should output 60fps; instead it outputs (at most) 30fps.

What version of the product are you using? On what operating system?
CEF branch 2062 on Linux

Please provide any additional information below.
Chromium does render at 60fps (as shown by running WebGL benchmarks), but 
CropScaleReadbackAndCleanMailbox in render_widget_host_view_osr.cc doesn't 
finish in time, so every other frame gets dropped and OSR uses exactly half 
of the Chromium frames.
When another browser renders at the same time, the frame rate drops to 
15fps each; with 3 browsers it's 10fps each.
Note that this has nothing to do with the 
CefBrowserSettings.windowless_frame_rate limit and is not a performance 
problem: it happens with tiny browser windows all the same.

Here is what I think the problem is:
Chromium runs with vsync by default, which means the GL thread gets blocked 
when a buffer swap is requested. When CEF queues the download of the frame 
buffer contents into system memory, the GL thread is already blocked, so the 
download only happens after the GL thread has finished waiting for vsync, and 
CEF receives the copy-finished callback too late for the next frame to be used.
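
The reported rates (30fps for one browser, 15fps each for two, 10fps each for three) are consistent with a simple toy model, sketched below. It rests on two assumptions that were not measured here: the GL thread serves one readback at a time, and each readback spans two vsync intervals because it is queued behind a vsync-blocked swap. The function name and parameters are hypothetical.

```python
def delivered_fps(vsync_hz=60, browsers=1, seconds=1, ticks_per_copy=2):
    """Toy model of readbacks serialized on one GL thread.

    Assumption: a readback started at vsync tick t keeps the GL thread
    busy until tick t + ticks_per_copy, and browsers take turns.
    """
    delivered = [0] * browsers
    busy_until = 0      # first tick at which the GL thread is free again
    next_browser = 0    # round-robin index of the browser served next
    for tick in range(vsync_hz * seconds):
        if tick >= busy_until:
            busy_until = tick + ticks_per_copy
            delivered[next_browser] += 1
            next_browser = (next_browser + 1) % browsers
    return [count / seconds for count in delivered]


for n in (1, 2, 3):
    print(n, "browser(s):", delivered_fps(browsers=n))
```

In this model the total readback throughput is capped at half the vsync rate regardless of window size, which matches the observation that the limit is not a performance problem.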

Original issue reported on code.google.com by fa3rhan on 4 Sep 2014 at 9:39

GoogleCodeExporter commented 9 years ago
known issue 
https://code.google.com/p/chromiumembedded/issues/detail?id=1006

Original comment by markus.l...@gmail.com on 5 Sep 2014 at 11:03

GoogleCodeExporter commented 9 years ago
Using a GL/D3D texture/surface provided by the client is a completely different 
approach and has nothing to do with the vsync problem described in this issue.

Original comment by fa3rhan on 5 Sep 2014 at 11:09

GoogleCodeExporter commented 9 years ago
It's possible to set the vsync interval on a per-compositor basis using 
|compositor_->vsync_manager()->SetAuthoritativeVSyncInterval(...)| in the 
CefRenderWidgetHostViewOSR constructor. OnSwapCompositorFrame should be called 
at or near the vsync interval. This currently results in an async call to 
DelegatedFrameHost::RequestCopyOfOutput which is rate limited based on 
off-screen-frame-rate.

It would be nice if we could eliminate the async call to RequestCopyOfOutput.
We could then just set the vsync interval based on off-screen-frame-rate.
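
For illustration, deriving a vsync interval from a frame rate is a single division. The sketch below is Python rather than the Chromium C++ code, and `vsync_interval` is a hypothetical helper, not part of CEF or Chromium.

```python
from datetime import timedelta


def vsync_interval(frame_rate: int) -> timedelta:
    """Return the time between frames for the given frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    # timedelta division rounds to the nearest microsecond.
    return timedelta(seconds=1) / frame_rate


print(vsync_interval(60))  # interval for off-screen-frame-rate=60
print(vsync_interval(30))  # interval for off-screen-frame-rate=30
```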

Original comment by magreenb...@gmail.com on 5 Sep 2014 at 4:53

GoogleCodeExporter commented 9 years ago
I've started a conversation about it here: 
https://groups.google.com/a/chromium.org/forum/#!topic/graphics-dev/02jptXZMtTM

Original comment by magreenb...@gmail.com on 11 Sep 2014 at 7:47

GoogleCodeExporter commented 9 years ago

Original comment by magreenb...@gmail.com on 23 Oct 2014 at 5:27

GoogleCodeExporter commented 9 years ago
The attached patch against trunk revision 1959 adds support for the 
`--enable-begin-frame-scheduling` command-line flag which clamps the frame rate 
in all processes to the off-screen-frame-rate value. The below statistics were 
gathered on a 4 core Ubuntu 14 LTS 64-bit VM by running `ps -C cefclient -o 
%cpu,%mem,cmd` after about 2 minutes. %CPU is the CPU time used divided by the 
time the process has been running (cputime/realtime ratio), expressed as a 
percentage.
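
As a minimal sketch of that definition (the helper name and the sample values are illustrative, not taken from the measurements):

```python
def cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """cputime/realtime ratio expressed as a percentage."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return 100.0 * cpu_seconds / elapsed_seconds


# Illustrative values only: 30 s of CPU time over a 120 s lifetime.
print(cpu_percent(30, 120))
```

Because this is an average over the whole lifetime of the process, it includes startup and page load, not only the steady-state animation.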

Performance without enable-begin-frame-scheduling (using vsync=30fps in the 
browser process and vsync=60fps in all subprocesses):

$ cefclient --off-screen-rendering-enabled --url=http://mrdoob.com/lab/javascript/requestanimationframe/

%CPU %MEM CMD
14.9  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --off-screen-rendering-enabled -
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
24.3  0.8 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-process --channel=520
 0.0  0.1 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-broker
24.8  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --enable-deferre

Performance with enable-begin-frame-scheduling and various frame rate values:

$ cefclient --off-screen-rendering-enabled --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --enable-begin-frame-scheduling --off-screen-frame-rate=X

X=60:
%CPU %MEM CMD
20.5  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --off-screen-rendering-enabled -
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
29.1  0.8 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-process --channel=510
 0.0  0.1 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-broker
22.0  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --enable-begin-f

X=30:
%CPU %MEM CMD
14.7  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --off-screen-rendering-enabled -
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
20.8  0.8 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-process --channel=511
 0.0  0.1 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-broker
15.6  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --enable-begin-f

X=15:
%CPU %MEM CMD
 9.7  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --off-screen-rendering-enabled -
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
13.6  0.8 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-process --channel=512
 0.0  0.1 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=gpu-broker
10.0  0.4 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --enable-begin-f

Inspection of trace output shows that 
CefRenderWidgetHostViewOSR::SendBeginFrame, 
CefRenderWidgetHostViewOSR::OnFrameCaptureSuccess and 
CefRenderWidgetHostViewOSR::OnSwapCompositorFrame are called at the specified 
frame rate frequency while CopyOutputRequest and Compositor::Draw are called at 
2x the frequency (expected, since the output is requested an additional time 
per frame via RequestCopyOfOutput).

In this example (using requestAnimationFrame) CPU usage is highly correlated 
with the frame rate. Renderer process CPU usage with more complex content will 
show lower correlation with the frame rate.

Original comment by magreenb...@gmail.com on 16 Dec 2014 at 9:52

Attachments:

GoogleCodeExporter commented 9 years ago
I've tried the patch and can confirm the CPU/frame rate correlation.

However, visually --off-screen-frame-rate=60 and --off-screen-frame-rate=30 
still look exactly the same (as opposed to loading the page in Chrome, where 
the fps difference is like night and day).

Original comment by fa3rhan on 18 Dec 2014 at 8:07

GoogleCodeExporter commented 9 years ago
@#7: In my testing with the current design (using CopyOutputRequest) the 
Compositor::Draw call cannot complete at much above 60fps, so OnPaint is only 
being called at about 30fps. Specifying a higher frame rate with this design 
results in higher CPU usage for no gain (because the additional frames are 
dropped).
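
A rough way to see why 60 and 30 look identical is sketched below. It is a simplification that assumes two Compositor::Draw calls per delivered frame (as seen in the trace output) and a ceiling of about 60 Draw calls per second; the function name and default values are hypothetical.

```python
def effective_paint_rate(requested_fps, max_draw_rate=60.0, draws_per_frame=2):
    """Estimate the OnPaint rate when every frame costs several Draw calls."""
    return min(requested_fps, max_draw_rate / draws_per_frame)


for fps in (15, 30, 60):
    print(fps, "requested ->", effective_paint_rate(fps), "painted")
```

Under these assumptions any requested rate above 30fps is clamped to 30fps, so the extra work at 60fps only adds CPU load.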

Original comment by magreenb...@gmail.com on 19 Dec 2014 at 12:33

GoogleCodeExporter commented 9 years ago
Looking at performance now with GPU disabled. This avoids expensive readback 
from the GPU in exchange for losing some features (3D CSS is supported but not 
WebGL).

Windowed rendering performance with current trunk (no changes) with GPU 
disabled (all processes at 60fps):

$ cefclient --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --disable-gpu --disable-gpu-compositing
%CPU %MEM CMD
10.8  0.8 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
17.4  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

OSR performance with current trunk (no changes) with GPU disabled (browser 
process at 30fps, other processes at 60fps):

$ cefclient --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --off-screen-rendering-enabled --disable-gpu --disable-gpu-compositing
%CPU %MEM CMD
26.2  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
15.5  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

Performance with 1916 branch (old software-only OSR implementation, no 3D CSS 
or WebGL support):

$ cefclient --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --off-screen-rendering-enabled
%CPU %MEM CMD
10.1  0.8 /home/marshall/Downloads/cef_binary_3.1916.1931_linux64_client/Release/cefclient --url=http://mrdoob.co
 0.0  0.1 /home/marshall/Downloads/cef_binary_3.1916.1931_linux64_client/Release/cefclient --url=http://mrdoob.co
 0.0  0.3 /home/marshall/Downloads/cef_binary_3.1916.1931_linux64_client/Release/cefclient --type=zygote --lang=e
 0.0  0.0 /home/marshall/Downloads/cef_binary_3.1916.1931_linux64_client/Release/cefclient --type=zygote --lang=e
 6.6  0.4 /home/marshall/Downloads/cef_binary_3.1916.1931_linux64_client/Release/cefclient --type=renderer --disa

Attached is a new patch (currently tested on Windows and Linux only) that uses 
a custom SoftwareOutputDevice implementation which supports direct rendering to 
bitmap from OnSwapCompositorFrame and consequently avoids the extra 
Compositor::Draw when GPU is disabled (it's no longer necessary to call 
RequestCopyOfOutput).

Performance with new patch with GPU disabled (browser process at 30fps, other 
processes at 60fps):

$ cefclient --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --off-screen-rendering-enabled --disable-gpu --disable-gpu-compositing
%CPU %MEM CMD
13.0  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
17.0  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

Performance with new patch with GPU disabled and begin frame scheduling enabled 
with various frame rates (all processes at Xfps):

$ cefclient --off-screen-rendering-enabled --url=http://mrdoob.com/lab/javascript/requestanimationframe/ --disable-gpu --disable-gpu-compositing --enable-begin-frame-scheduling --off-screen-frame-rate=X

X=60
%CPU %MEM CMD
15.1  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
17.4  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

X=30
%CPU %MEM CMD
 9.4  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
11.2  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

X=15
%CPU %MEM CMD
 5.5  0.9 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --url=http://mrdoob.com/lab/javascr
 0.0  0.3 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 0.0  0.0 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=zygote --lang=en-US
 6.2  0.5 /home/marshall/code/chromium_git/chromium/src/out/Release/cefclient --type=renderer --disable-gpu-compo

Conclusions:
- The current trunk implementation uses ~150% more CPU than the 1916 branch 
implementation at 30fps.
- With this new patch CPU usage is reduced by ~51% compared to current trunk 
and goes from ~150% to ~23% CPU usage increase compared to the 1916 branch.
- With this new patch, lower frame rates use less CPU than the 1916 branch.
- It should be relatively easy to render directly to a client-provided surface 
instead of using an intermediary bitmap with the new SoftwareOutputDevice-based 
implementation.
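
The percentages above appear to be reproducible by summing the browser and renderer %CPU from the GPU-disabled tables. The sketch below assumes that the "new patch" figure refers to the run with begin frame scheduling at X=30; that pairing is inferred from the numbers, not stated in the comment. The labels and helper name are hypothetical.

```python
# (browser %CPU, renderer %CPU) copied from the GPU-disabled ps tables.
measurements = {
    "1916 branch": (10.1, 6.6),
    "trunk OSR": (26.2, 15.5),
    "patch, begin-frame X=30": (9.4, 11.2),
    "patch, begin-frame X=15": (5.5, 6.2),
}
totals = {name: sum(values) for name, values in measurements.items()}


def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


trunk_vs_1916 = percent_change(totals["trunk OSR"], totals["1916 branch"])
patch_vs_trunk = percent_change(totals["patch, begin-frame X=30"], totals["trunk OSR"])
patch_vs_1916 = percent_change(totals["patch, begin-frame X=30"], totals["1916 branch"])

print(f"trunk vs 1916: {trunk_vs_1916:+.0f}%")
print(f"patch vs trunk: {patch_vs_trunk:+.0f}%")
print(f"patch vs 1916: {patch_vs_1916:+.0f}%")
```

Summing only the browser and renderer rows is a choice made for this sketch; the zygote rows report 0.0 and there is no GPU process row in these runs.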

Original comment by magreenb...@gmail.com on 19 Dec 2014 at 2:02

Attachments:

GoogleCodeExporter commented 9 years ago
@#9: Attached is an updated patch that's also tested to work with Retina 
displays on OS X.

Original comment by magreenb...@gmail.com on 20 Dec 2014 at 7:57

Attachments:

GoogleCodeExporter commented 9 years ago
Some OSR unit tests are failing with the custom SoftwareOutputDevice 
implementation (run `cef_unittests --gtest_filter=OSRTest.* --disable-gpu 
--disable-gpu-compositing`). Need to fix those before merging this patch.

Original comment by magreenb...@gmail.com on 20 Dec 2014 at 8:19

GoogleCodeExporter commented 9 years ago
@#11: Need to test all of the following combinations:

$ cef_unittests --gtest_filter=OSRTest.*
$ cef_unittests --gtest_filter=OSRTest.* --enable-begin-frame-scheduling
$ cef_unittests --gtest_filter=OSRTest.* --disable-gpu --disable-gpu-compositing
$ cef_unittests --gtest_filter=OSRTest.* --enable-begin-frame-scheduling --disable-gpu --disable-gpu-compositing
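
The four combinations are the cross product of two on/off choices, so they can be generated instead of typed out. This is a minimal sketch: the flags come from the commands above, while `osr_test_commands` is a hypothetical helper that only builds the argument lists and does not run anything.

```python
from itertools import product

BASE = ["cef_unittests", "--gtest_filter=OSRTest.*"]
BEGIN_FRAME = ["--enable-begin-frame-scheduling"]
NO_GPU = ["--disable-gpu", "--disable-gpu-compositing"]


def osr_test_commands():
    """Build one argument list per (GPU disabled, begin frame) combination."""
    commands = []
    for no_gpu, begin_frame in product((False, True), repeat=2):
        args = list(BASE)
        if begin_frame:
            args += BEGIN_FRAME
        if no_gpu:
            args += NO_GPU
        commands.append(args)
    return commands


for args in osr_test_commands():
    print("$", " ".join(args))
```

Each list could be passed to `subprocess.run(args, check=True)` from a directory containing the test binary.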

Original comment by magreenb...@gmail.com on 29 Dec 2014 at 5:24

GoogleCodeExporter commented 9 years ago
Trunk revision 1960 adds support for begin frame scheduling and direct 
rendering when GPU compositing is disabled.
- Always set the browser process VSync rate (frame rate) to 
CefSettings.windowless_frame_rate.
- When the `enable-begin-frame-scheduling` command-line flag is specified the 
VSync rate for all processes will be synchronized to 
CefSettings.windowless_frame_rate. This flag cannot be used in combination with 
windowed rendering.
- When the `disable-gpu` and `disable-gpu-compositing` command-line flags are 
specified the CefRenderHandler::OnPaint method will be called directly from the 
compositor instead of requiring an additional copy for each frame.
- CefRenderHandler::OnPopupSize now passes view coordinates instead of 
(potentially scaled) pixel coordinates.
- Add OSR unit tests for 2x (HiDPI) pixel scaling.
- Improve CefRenderHandler documentation.

Original comment by magreenb...@gmail.com on 1 Jan 2015 at 4:53