openfl / lime

A foundational Haxe framework for cross-platform development
https://lime.openfl.org/
MIT License

Neko/C++ OpenGL performance #162

Closed MattTuttle closed 10 years ago

MattTuttle commented 10 years ago

HaxePunk is playing around with the idea of using Lime and OpenGL for its 2D rendering as well as adding some basic 3D support. The performance in HTML5 is amazing (10,000 sprites at around 60fps) but Neko/C++ can't compete...

First off, shouldn't GLUniformLocation be an Int for native targets? This is causing unnecessary casting to Dynamic when calling the uniform functions.

Could the majority of the GL functions be inlined? This seems like a lot of overhead because it pushes to the Haxe stack.

Another major bottleneck is the lime.utils.Matrix3D class because of the use of Array. I rewrote it for HaxePunk and am using GL.uniformMatrix4fv instead of GL.uniformMatrix3D.

Even with the above improvements I can't match HTML5's performance but I'm getting closer.

delahee commented 10 years ago

Hi !

Sorry if I tell you things you already know, but one never knows :)


I would say 10k sprites on a PC is very low as well... I think you should focus on this problem first. Even if C++ or Neko are not as fast as HTML5 (which I doubt a little, but hey :) ), the CPU should not be a major problem, and drawing 10k sprites should be trivial.

0- Use profiling tools: the Android GL tracer or gDEBugger will let you inspect draw calls, spot redundant values, etc.

1- Reduce draw calls and eliminate every redundant call. This applies even to a single uniform value or sampler: if the same value is set twice in a row, the second call can be removed.

2- Sort sprites so that identical ones are all rendered in the same batch.

3- You can use the Genome2D technique to improve on this: they use a buffer of samplers and just send the texture index in the vertex streams (attributes).

4- Send as much as you can in grouped vertex streams. Ideally, you have only one nastily big buffer and send everything to the GPU at once for each native blend mode.

5- Reduce allocation of temporary variables: pool them or make static temporary primitives.
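
Points 2 and 4 can be sketched roughly as follows. This is a minimal illustration in plain C++ with no real GL calls; `Sprite`, `Batch` and `buildBatches` are hypothetical names, not part of Lime, HaxePunk or h2d. It assumes each batch would later be uploaded as one vertex buffer and drawn with a single call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sprite record: a texture id plus a position.
struct Sprite {
    int textureId;
    float x;
    float y;
};

// One batch stands for one draw call: a texture and the packed
// vertex data (x, y pairs) of every sprite that uses it.
struct Batch {
    int textureId;
    std::vector<float> vertices;
};

// Sorts sprites by texture, then groups consecutive sprites that
// share a texture into a single batch.
inline std::vector<Batch> buildBatches(std::vector<Sprite> sprites) {
    // stable_sort keeps the original relative order within one texture.
    std::stable_sort(sprites.begin(), sprites.end(),
                     [](const Sprite& a, const Sprite& b) {
                         return a.textureId < b.textureId;
                     });

    std::vector<Batch> batches;
    for (const Sprite& s : sprites) {
        if (batches.empty() || batches.back().textureId != s.textureId) {
            batches.push_back(Batch{s.textureId, {}});
        }
        batches.back().vertices.push_back(s.x);
        batches.back().vertices.push_back(s.y);
    }

    // There can never be more batches than sprites.
    assert(batches.size() <= sprites.size());
    return batches;
}
```

With four sprites alternating between two textures, this yields two batches instead of four texture switches. Note that sorting by texture changes the draw order, so with blending or overlapping sprites the sort key usually has to include layer or depth as well.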

I haven't measured our perf on h3d, but last time one of our programmers was outputting 15-30k particles at 60fps effortlessly using h2d.SpriteBatch (which is not very optimised since it uses float64...).

Good luck in your quest !

MattTuttle commented 10 years ago

Thanks for the input @delahee! I was aware of the issues with draw calls and know there are some optimizations to be made there. Thanks for the suggestion of using an OpenGL profiler. I found one for OSX that works well enough.

The number was kind of irrelevant because I'm testing on older hardware. I booted this up on my newer iMac and got around 40k sprites. However, the fact that JavaScript outperforms C++ is a tad disconcerting to me.

I found binds are extremely expensive because they utilize Dynamics so I got around that by only calling bind if the value changes. Although apparently I still have to rebind the textures every frame update for some unknown reason.
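
The "only bind if the value changes" idea might look something like the sketch below. It is plain C++ with the actual bind passed in as a callback, so nothing here is Lime's API; `TextureBindCache` and `invalidate` are hypothetical names. The `invalidate` method reflects one possible (unconfirmed) explanation for the per-frame rebind: if other code changes the binding behind the cache's back, the cached value goes stale and has to be reset.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Remembers the last bound texture and skips the bind call
// when the same texture is requested again.
class TextureBindCache {
public:
    // `bind` stands in for the real (expensive) bind call.
    explicit TextureBindCache(std::function<void(int)> bind)
        : bind_(std::move(bind)), bound_(0), hasBound_(false) {
        assert(static_cast<bool>(bind_));
    }

    // Forwards to the real bind only if the texture actually changed.
    void bind(int textureId) {
        if (hasBound_ && bound_ == textureId) {
            return; // redundant bind, skip it
        }
        bind_(textureId);
        bound_ = textureId;
        hasBound_ = true;
    }

    // Forgets the cached state, e.g. at the start of a frame when
    // other code may have changed the binding.
    void invalidate() { hasBound_ = false; }

private:
    std::function<void(int)> bind_;
    int bound_;
    bool hasBound_;
};
```

Calling `invalidate()` once per frame still removes every redundant bind within the frame, which is where most of the savings would be expected.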

hughsando commented 10 years ago

Yes, I think the problems are to do with the overhead of dynamic. I am currently thinking about having a dual interface, where cpp code can step around the cffi overhead by calling a strongly typed version of the API, but still allow neko and older code to work with the dynamically typed cffi. The other option is to have extern definitions for the gl functions and call them directly - but this would need some care with resources shared with the nme state, and also linking against libgl, which can be tricky, mainly on Linux.
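
To illustrate the dual-interface idea in the abstract, here is a small C++ sketch. It is not the actual cffi or Lime code: `BoxedValue`, `FakeUniformState`, `setUniform1fTyped` and `setUniform1fDynamic` are all made-up names, and the "GL state" is just a struct. The point is only that the dynamic entry point unboxes and validates its arguments and then forwards to the same typed function that statically typed callers can reach directly.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical boxed value, standing in for a dynamically typed argument.
struct BoxedValue {
    enum Kind { Int, Float };
    Kind kind;
    int intValue;
    float floatValue;
};

inline BoxedValue boxInt(int v) {
    BoxedValue b = {BoxedValue::Int, v, 0.0f};
    return b;
}

inline BoxedValue boxFloat(float v) {
    BoxedValue b = {BoxedValue::Float, 0, v};
    return b;
}

// Stand-in for the state a real uniform call would modify.
struct FakeUniformState {
    int location;
    float value;
    int calls;
};

// Strongly typed entry point: no boxing, no runtime type checks.
inline void setUniform1fTyped(FakeUniformState& state, int location, float value) {
    assert(location >= 0);
    state.location = location;
    state.value = value;
    ++state.calls;
}

// Dynamically typed entry point: checks and unboxes the arguments,
// then forwards to the typed version.
inline void setUniform1fDynamic(FakeUniformState& state,
                                const std::vector<BoxedValue>& args) {
    if (args.size() != 2 ||
        args[0].kind != BoxedValue::Int ||
        args[1].kind != BoxedValue::Float) {
        throw std::invalid_argument("expected (Int, Float)");
    }
    setUniform1fTyped(state, args[0].intValue, args[1].floatValue);
}
```

Both paths end in the same typed function, so older dynamically typed callers keep working while typed callers skip the boxing and checking on every call.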

delahee commented 10 years ago

As some Borgia pope may have said, "Thou shalt never make dynamic calls" :)

(Sent by email on 10 June 2014, in reply to Hugh Sanderson's comment above.)

MattTuttle commented 10 years ago

Looks like a lot of my issues will be resolved in the upcoming Lime next.