Overview
Simplification and optimization of the WebGL rendering pipeline. The optimizations focused on three parts of WebGLContext2D: the flush and drawImage methods and the ContextStateStack class.
Changes
The drawImage method used to include several state checks on the image and the WebGL context. They have been removed because they appear unnecessary. If they turn out to be required, they should be added somewhere they execute only once rather than on every frame.
The flush method and the associated methods in Shader now switch programs more efficiently.
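One common way to make program switching cheaper is to remember the last bound program and skip redundant gl.useProgram calls. The sketch below illustrates that idea only; it is not taken from the actual Shader code, and ProgramCache, fakeGl and the program objects are hypothetical names introduced for illustration.

```javascript
// Hypothetical sketch: skip gl.useProgram when the program is already bound.
class ProgramCache {
  constructor(gl) {
    this.gl = gl;
    this.current = null; // last program passed to gl.useProgram
  }

  // Binds the program only if it differs from the one currently bound.
  // Returns true when a GL call was actually made.
  use(program) {
    if (program === this.current) {
      return false; // already bound, no GL call needed
    }
    this.gl.useProgram(program);
    this.current = program;
    return true;
  }
}

// Stand-in for a WebGLRenderingContext so the sketch runs outside a browser.
const fakeGl = {
  calls: 0,
  useProgram(program) { this.calls += 1; }
};

const programCache = new ProgramCache(fakeGl);
const spriteProgram = { name: 'sprite' }; // placeholder for a WebGLProgram
const rectProgram = { name: 'rect' };     // placeholder for a WebGLProgram

programCache.use(spriteProgram);
programCache.use(spriteProgram); // skipped, same program
programCache.use(rectProgram);
console.log(fakeGl.calls); // 2
```

With a real context, the same check would sit in front of every place that binds a program during flush.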
Textures are now wrapped in TextureWrapper objects that hold a reference to the original image. The goal is to avoid deoptimizing image objects by attaching extra properties to them.
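A minimal sketch of the wrapper idea, under stated assumptions: only the TextureWrapper name and its reference to the original image come from this description. The fields, the WeakMap lookup and the getWrapper helper are hypothetical and may differ from the real implementation.

```javascript
// Hypothetical sketch of a wrapper that keeps GL data off the image object.
class TextureWrapper {
  constructor(image, texture) {
    this.image = image;     // reference to the original image, left untouched
    this.texture = texture; // GL texture handle created for that image
    this.width = image.width;
    this.height = image.height;
  }
}

// Assumption: wrappers are looked up by image through a WeakMap, so no
// property is ever added to the image itself.
const wrappers = new WeakMap();

function getWrapper(image, createTexture) {
  let wrapper = wrappers.get(image);
  if (wrapper === undefined) {
    wrapper = new TextureWrapper(image, createTexture(image));
    wrappers.set(image, wrapper);
  }
  return wrapper;
}

// Plain objects stand in for an HTMLImageElement and a WebGLTexture.
const sampleImage = { width: 64, height: 32 };
let texturesCreated = 0;
const makeTexture = () => ({ id: ++texturesCreated });

const firstWrapper = getWrapper(sampleImage, makeTexture);
const secondWrapper = getWrapper(sampleImage, makeTexture);
console.log(firstWrapper === secondWrapper); // true, the texture is created once
console.log(Object.keys(sampleImage));       // the image keeps its original shape
```

Because the image never gains new properties, its shape stays stable, which is the property the change relies on.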
ContextStateStack has been removed: the state properties are now attached directly to the context, which shortens the chain of property accesses.
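To illustrate what attaching state directly to the context can look like, here is a hedged sketch. FlatStateContext, the two state properties and the snapshot array are assumptions made for the example, not the actual WebGLContext2D code.

```javascript
// Hypothetical sketch: drawing state lives directly on the context object,
// so hot paths read `ctx.globalAlpha` instead of `ctx.stack.state.globalAlpha`.
class FlatStateContext {
  constructor() {
    this.globalAlpha = 1;
    this.globalCompositeOperation = 'source-over';
    this._saved = []; // snapshots pushed by save()
  }

  // Copies the current state into a snapshot.
  save() {
    this._saved.push({
      globalAlpha: this.globalAlpha,
      globalCompositeOperation: this.globalCompositeOperation
    });
  }

  // Restores the most recent snapshot, if any.
  restore() {
    const snapshot = this._saved.pop();
    if (snapshot === undefined) return; // nothing to restore
    this.globalAlpha = snapshot.globalAlpha;
    this.globalCompositeOperation = snapshot.globalCompositeOperation;
  }
}

const flatCtx = new FlatStateContext();
flatCtx.save();
flatCtx.globalAlpha = 0.5;
const alphaWhileDrawing = flatCtx.globalAlpha; // single property access
flatCtx.restore();
console.log(alphaWhileDrawing, flatCtx.globalAlpha); // 0.5 1
```

The trade-off in this sketch is that save and restore copy properties, while reads on the drawing path become a single property access.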
All checks for the existence of the gl property were removed. The WebGL context is initialized when the WebGLContext2D instance is created, so there is no need for every method that accesses the WebGL API to check that it was properly initialized.
Performance
Overall there seems to be an obvious performance gain in terms of CPU usage.
As usual, the relative performance gain (5~20%) depends on the test, the device and measurement method.
Chrome - MacBook Pro
Everwing - start menu - chrome profile
Before
After
The drawImage and flush methods are now faster, and the set method has disappeared from the profile. The % used by the program remains roughly identical, meaning that the performance boost is reflected in the program usage.
Everwing - gameplay - chrome profile
Before
After
Here as well, the drawImage, flush and set methods are optimized. The % used by the program remains identical.
Everwing - gameplay - chrome profile compilation
The following screenshots show the average non-idle CPU time profiled by Chrome across 6 gameplay runs (> 10s):
Before
After
On this benchmark the gain appears substantial: (13.8 - 11) / 13.8 = 20% boost!
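The percentage is the relative reduction, (before - after) / before. A small sketch of that arithmetic, where relativeGain is a hypothetical helper name and the inputs are the two averages quoted for this benchmark:

```javascript
// Relative gain: the fraction of the "before" cost that was removed.
function relativeGain(before, after) {
  return (before - after) / before;
}

// Average non-idle CPU time before and after, as quoted for this benchmark.
const cpuGain = relativeGain(13.8, 11);
console.log((cpuGain * 100).toFixed(1) + '%'); // 20.3%
```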
Everwing - gameplay - duration of engine tick method on chrome
Before
After
Here as well the performance gain is obvious: (0.66 - 0.56) / 0.66 = 15%
Chrome - iPhone 7
Everwing - start menu - safari profile
Before
After
Just as on Chrome, the drawImage and set methods have been optimized. For some reason the flush method never appeared (self time too insignificant). The overall benefit is not obvious on this profile, as the idle time is usually hidden under the drawElements self time.
The optimizer was possibly slow to kick in. It would seem so, as subsequent profiles played in favor of the changes.
Everwing - start menu - safari profile
Before
After
On this profile the performance boost is quite clear (the optimizer possibly played in favor of the changes) if we consider the idle time as being hidden in the drawElements self time: (32.6 - 24.1) / 32.6 = 26% boost.
Conclusion
Overall I think this work will provide a 10~15% CPU boost (even though some profiles seem to indicate a higher gain). Analytics will tell us, but the slowest devices should now run Everwing at 55fps, meaning that we are probably 2 optimizations away from providing an average of 60fps on the slowest devices on Everwing (up from 30fps 3 months ago).