libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License

Basic SDL C ( DX ) program gives worse performance than C# MonoGame for Windows ( DX ) #3115

Closed SDLBugzilla closed 2 years ago

SDLBugzilla commented 3 years ago

This bug report was migrated from our old Bugzilla tracker.

Reported in version: 2.0.9 Reported for operating system, platform: Windows 10, x86

Comments on the original bug report:

On 2019-01-04 14:43:56 +0000, Valentyn wrote:

I have created a basic SDL program that calls SDL_RenderClear(), runs a for loop that calls SDL_RenderCopy() 50k times, and then calls SDL_RenderPresent() when the loop ends. The SDL_Renderer is created with the accelerated, vsync and render target texture flags. This is a release build, of course, and I get a steady 60 FPS. On my machine this produces 20-22% CPU load and 55-60% GPU load.

With MonoGame on Windows I get the same FPS. The program calls SpriteBatch.Begin(), runs a for loop that calls SpriteBatch.Draw() 50k times, and then calls SpriteBatch.End() when the loop is over. This gives me 5-7% CPU load and 45-50% GPU load.

I also tried just counting FPS without vsync and rendering nothing else. The SDL C program gives 5-6k FPS with 20% CPU load and 80% GPU load, while MonoGame on Windows gives 7-8k FPS with 15% CPU load and 50% GPU load.

Do you guys have any idea how this is possible?

On 2019-01-04 14:59:35 +0000, Valentyn wrote:

I went on and tested rendering 300k sprites without vsync. It seems like SDL has problems utilizing my GPU resources, and the CPU is becoming a bottleneck. This test gives me 20-22 FPS with 29% CPU load and 70-75% GPU load.

MonoGame gives me 27-28 FPS, uses 17% CPU, and uses 99-100% GPU.
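
For comparison, the frame rates above can be converted into sprite throughput and per-frame time. The following is only an illustrative sketch: the helper names are hypothetical, and using the midpoints of the reported ranges (21 FPS for SDL, 27.5 FPS for MonoGame) is an assumption, not a measurement from this thread.

```c
#include <assert.h>

/* Hypothetical helper: sprites drawn per second at a given frame rate. */
double sprites_per_second(double sprites_per_frame, double fps)
{
    assert(sprites_per_frame >= 0.0 && fps >= 0.0);
    return sprites_per_frame * fps;
}

/* Hypothetical helper: time budget consumed by one frame, in milliseconds. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

With those midpoints, 300k sprites per frame works out to roughly 6.3 million sprites per second for SDL and 8.25 million for MonoGame.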

On 2019-01-04 18:45:42 +0000, Ryan C. Gordon wrote:

We redesigned the renderer implementation after SDL 2.0.9 shipped. It's considerably faster than 2.0.9's implementation, but I haven't benchmarked it against MonoGame.

If you would like to try the latest SDL in revision control, I would be interested to hear your results!

--ryan.

On 2019-01-05 01:57:37 +0000, Valentyn wrote:

Hello Ryan, I already did, and there are several issues and only a small improvement.

Issues:

1) Nothing changes when I set the SDL_HINT_RENDER_BATCHING hint to "0" or "1". It behaves exactly the same either way; it seems to always use render batching. Memory usage goes up from 30MB to 130MB and CPU load goes down from 20-22% to 14-16%. GPU load stays the same. So I see that there's some caching going on here. I actually tried rendering 1.000.000 sprites and my memory usage went to 2400MB, while CPU load stayed at 19% and GPU load at 50% and I had 4 FPS. This seems broken.

2) Removing SDL_RENDERER_PRESENTVSYNC doesn't work as it should. GPU load still stays at 60%, and I get less than half of MonoGame's FPS when rendering 50k sprites: 80 FPS vs 170 FPS, and MonoGame utilizes my GPU up to 100%.

I do link SDL statically and build from source each time and I also have a few code changes to redefine some stuff for my own memory tracking, which seems to be a bit broken after the upgrade because I can see only allocations of 180MB while there's 2400MB being allocated when rendering 1k sprites. I didn't try to load SDL dynamically in a test project but I don't think that this could be an issue, right?

On 2019-01-05 05:21:35 +0000, Ryan C. Gordon wrote:

(In reply to Valentyn from comment #3)

> I actually tried rendering 1.000.000 sprites and my memory usage went to 2400MB, while CPU load stayed at 19% and GPU load at 50% and I had 4 FPS. This seems broken.

Right now it batches everything until it has a reason to flush. I'll put an upper limit on it, so it flushes as it goes for really pathological cases like this. This should solve this problem (and possibly others).

--ryan.
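
The idea of batching until there is a reason to flush, plus an upper limit for pathological cases, could look roughly like the following. This is a minimal sketch and not SDL's implementation: CopyCmd, CopyQueue, queue_copy, queue_flush, and the MAX_QUEUED_COPIES cap of 4096 are all hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QUEUED_COPIES 4096  /* hypothetical cap, not SDL's actual value */

typedef struct {
    int x, y, w, h;             /* destination rectangle of one copy */
} CopyCmd;

typedef struct {
    CopyCmd cmds[MAX_QUEUED_COPIES];
    size_t count;               /* commands currently queued */
    size_t flushes;             /* number of times the queue was submitted */
    size_t submitted;           /* total commands handed to the backend */
} CopyQueue;

void queue_init(CopyQueue *q)
{
    assert(q != NULL);
    q->count = 0;
    q->flushes = 0;
    q->submitted = 0;
}

/* Stand-in for handing the queued commands to the GPU backend. */
void queue_flush(CopyQueue *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        return;                 /* nothing queued, nothing to submit */
    }
    q->submitted += q->count;
    q->flushes++;
    q->count = 0;
}

/* Queue one copy; flush first if the cap has been reached, so memory
   use stays bounded no matter how many copies one frame issues. */
void queue_copy(CopyQueue *q, CopyCmd cmd)
{
    assert(q != NULL);
    if (q->count == MAX_QUEUED_COPIES) {
        queue_flush(q);
    }
    q->cmds[q->count++] = cmd;
}
```

In this sketch, a caller would invoke queue_flush() once more at present time to submit whatever remains. The queue never holds more than MAX_QUEUED_COPIES commands, which is the property that would keep a million-sprite frame from growing memory without bound.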

On 2019-01-07 03:31:15 +0000, Alex Szpakowski wrote:

Right now in the latest source code SDL does vertex batching but not much draw call batching (e.g. every SDL_RenderCopy call causes a draw call, even if no state changed between consecutive RenderCopy calls.)

If we want to improve SDL_Render's performance further I think batching draw calls would probably give the biggest perf gains, although it might require some restructuring or tradeoffs around the implementations of per-draw transformations.
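
To make the distinction concrete, here is a small sketch of what draw call batching saves: consecutive copies that share the same render state are merged into one draw call, and a new draw call starts only when the state changes. DrawState and both counting functions are hypothetical names for illustration, and this does not model per-draw transformations or how SDL's backends actually issue draws.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-copy render state that must match for two copies
   to share a draw call. */
typedef struct {
    int texture_id;
    int blend_mode;
} DrawState;

static int same_state(const DrawState *a, const DrawState *b)
{
    return a->texture_id == b->texture_id && a->blend_mode == b->blend_mode;
}

/* Without draw call batching: every copy becomes its own draw call. */
size_t draw_calls_unbatched(const DrawState *states, size_t n)
{
    assert(states != NULL || n == 0);
    return n;
}

/* With draw call batching: a new draw call is issued only when the
   state differs from the previous copy's state. */
size_t draw_calls_batched(const DrawState *states, size_t n)
{
    size_t calls = 0;
    size_t i;
    assert(states != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (i == 0 || !same_state(&states[i], &states[i - 1])) {
            calls++;
        }
    }
    return calls;
}
```

For a benchmark like the one in this report, where the same texture is copied 50k times in a row with no state change, the batched count collapses to a single draw call.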

On 2019-01-08 08:21:40 +0000, Valentyn wrote:

(In reply to Alex Szpakowski from comment #5)

> Right now in the latest source code SDL does vertex batching but not much draw call batching (e.g. every SDL_RenderCopy call causes a draw call, even if no state changed between consecutive RenderCopy calls.)
>
> If we want to improve SDL_Render's performance further I think batching draw calls would probably give the biggest perf gains, although it might require some restructuring or tradeoffs around the implementations of per-draw transformations.

I think it would be nice to have the same metrics that MonoGame has, because SDL offers its own rendering API just like MonoGame. Take a look here: http://www.monogame.net/docs/html/index.html This shouldn't be hard to implement and might help with benchmarking against MonoGame and other similar rendering APIs.

Right now, if people are doing something serious with SDL, it comes down to writing your own rendering, which is probably best done with https://github.com/bkaradzic/bgfx, and that is lower level than the SDL 2D rendering API. I'd love to use the SDL 2D rendering API, but it makes no sense if MonoGame performs better, and it is C# too.

On 2019-01-08 08:24:28 +0000, Valentyn wrote:

Can't get a proper link to MG documentation for some reason. You can find their metrics under Class Library Reference -> Microsoft.Xna.Framework.Graphics -> GraphicsMetrics
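
As a rough idea of what such per-frame metrics could look like on the C side, here is a minimal sketch. It is loosely modeled on counters like DrawCount and SpriteCount in MonoGame's GraphicsMetrics; FrameMetrics and the metrics_* functions are hypothetical and are not part of the SDL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame counters, reset at the start of each frame. */
typedef struct {
    size_t clear_count;   /* number of clear operations this frame */
    size_t draw_count;    /* number of draw calls issued this frame */
    size_t sprite_count;  /* number of sprites (copies) submitted this frame */
} FrameMetrics;

void metrics_reset(FrameMetrics *m)
{
    assert(m != NULL);
    m->clear_count = 0;
    m->draw_count = 0;
    m->sprite_count = 0;
}

/* Record one clear of the render target. */
void metrics_on_clear(FrameMetrics *m)
{
    assert(m != NULL);
    m->clear_count++;
}

/* Record one draw call that covered the given number of sprites. */
void metrics_on_draw(FrameMetrics *m, size_t sprites_in_draw)
{
    assert(m != NULL);
    m->draw_count++;
    m->sprite_count += sprites_in_draw;
}
```

Comparing draw_count against sprite_count for the same scene would show directly how much batching a renderer is achieving, which is the kind of like-for-like number the benchmarks in this thread are missing.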

icculus commented 2 years ago

We have since implemented the draw call batching Alex mentioned here, and we're running out of optimizations at this point. Pretty much everything that's left is a limitation of the render API, but it can comfortably render a lot of primitives now, and I'm less interested in head-to-head benchmarks and more interested in "is this limiting what your 2D game actually needs?" And I don't think it is at this point, with tens of thousands of draws per frame now being possible.