hajimehoshi / ebiten

Ebitengine - A dead simple 2D game engine for Go
https://ebitengine.org
Apache License 2.0

internal/graphicsdriver/directx: low performance #2188

Closed: divVerent closed this issue 2 years ago

divVerent commented 2 years ago

The easiest way to reproduce this for now is with my game AAAAXY, which currently defaults to OpenGL rather than DirectX for this reason:

In PowerShell, run:

$Env:EBITEN_GRAPHICS_LIBRARY = "opengl"
Measure-Command { .\aaaaxy-windows-amd64.exe -load_config=false -demo_play="benchmark.dem" -demo_timedemo -vsync=false }
...
TotalMilliseconds : 23103.9543
$Env:EBITEN_GRAPHICS_LIBRARY = "directx"
Measure-Command { .\aaaaxy-windows-amd64.exe -load_config=false -demo_play="benchmark.dem" -demo_timedemo -vsync=false }
...
TotalMilliseconds : 39587.4662

(To view the runtime FPS, you can run .\aaaaxy-windows-amd64.exe -load_config=false -vsync=false -show_fps, which shows me 110fps at the start of the game with OpenGL and 19fps with DirectX; the smaller relative difference in TotalMilliseconds is mainly because loading time "equalizes" things somewhat.)

The issue may be GPU specific, though. I see it on one of these: https://www.amazon.com/2019office%E3%80%91-Ultra-Light-High-Speed-High-Performance-Notebook/dp/B09CQ22335/ref=sr_1_3?keywords=7+inch+laptop&qid=1657310835&sr=8-3 - according to Device Manager, the GPU is an Intel(R) HD Graphics 500.

-vsync=false is most certainly not at fault - with vsync on, I can't reach 60fps either, which is very noticeable.

divVerent commented 2 years ago

FYI on Linux, the same device shows 140fps at the starting point, and this benchmark takes 16.779 seconds wall clock time. glxinfo calls the GPU a "Mesa Intel(R) HD Graphics 500 (APL 2)".

divVerent commented 2 years ago

FYI, my workaround to default to OpenGL until this is resolved: https://github.com/divVerent/aaaaxy/blob/main/nodirectx_windows.go
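
Roughly, the workaround boils down to a Windows-only file in the game's main package that selects OpenGL before Ebitengine picks a backend. A minimal sketch, assuming only the public EBITEN_GRAPHICS_LIBRARY environment variable (the linked file is authoritative):

//go:build windows

package main

import "os"

func init() {
	// Respect an explicit user choice; otherwise default to OpenGL on Windows.
	if os.Getenv("EBITEN_GRAPHICS_LIBRARY") == "" {
		os.Setenv("EBITEN_GRAPHICS_LIBRARY", "opengl")
	}
}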

hajimehoshi commented 2 years ago

What draw calls are executed? You can see them with -tags=ebitendebug.

hajimehoshi commented 2 years ago

I failed to run your aaaaxy:

2022/07/09 11:43:49.383029 [ERROR] cannot open out my version: could not open local:/generated/version.txt: open third_party/yd_pressure/assets/generated/version.txt: no such file or directory
goroutine 1 [running, locked to thread]:
runtime/debug.Stack()
        /usr/local/go/src/runtime/debug/stack.go:24 +0x65
runtime/debug.PrintStack()
        /usr/local/go/src/runtime/debug/stack.go:16 +0x19
github.com/divVerent/aaaaxy/internal/log.Fatalf({0x44f0a3d, 0x1d}, {0xc00013bf48, 0x1, 0x1})
        /Users/hajimehoshi/ebitengine-games/aaaaxy/internal/log/log.go:101 +0x3c
main.main()
        /Users/hajimehoshi/ebitengine-games/aaaaxy/main.go:98 +0x135
2022/07/09 11:43:49.383077 [FATAL] could not initialize game: could not initialize version: could not open local:/generated/version.txt: open third_party/yd_pressure/assets/generated/version.txt: no such file or directory
exit status 125

divVerent commented 2 years ago

This likely means you got the wrong binary - the one from GitHub Actions requires a source checkout that has performed "make generate" with the correct GOOS and GOARCH.

To reproduce, the binary here will work: https://github.com/divVerent/aaaaxy/releases/download/v1.2.141/aaaaxy-windows-amd64-v1.2.141.zip (just tested that on my Windows box).

Nevertheless, I am now building a "release" with ebitendebug enabled so I can run it on Windows (I don't have a dev environment there).

divVerent commented 2 years ago

I uploaded an ebitendebug build to https://drive.google.com/drive/folders/1QfiiH53DsoV48EKIXF3U9V9yxVaR7txb?usp=sharing and will test it on the affected machines when I find time, to see whether anything suspicious shows up in the render call list.

divVerent commented 2 years ago

Typical draw call list on Linux/OpenGL (I force-quit while the game screen was open, so the blur behind the menu doesn't show up):

Update count per frame: 1
Internal image sizes:
  2: (16, 16)
  3: (16, 16)
  4: (1024, 512)
  5: (1024, 512)
  6: (2048, 1024)
  7: (1680, 1050)
  8: (2048, 2048)
  10: (1024, 512)
  11: (1024, 512)
  12: (1024, 512)
  13: (1024, 512)
  14: (128, 16)
Graphics commands:
  draw-triangles: dst: 11 <- src: [8, (nil), (nil), (nil)], dst region: (x:1, y:1, width:640, height:360), num of indices: 6, colorm: {}, mode: copy, filter: nearest, address: unsafe, even-odd: false
  draw-triangles: dst: 11 <- src: [8, (nil), (nil), (nil)], dst region: (x:1, y:1, width:640, height:360), num of indices: 1980, colorm: {}, mode: source-over, filter: nearest, address: unsafe, even-odd: false
  draw-triangles: dst: 12 <- src: [8, (nil), (nil), (nil)], dst region: (x:1, y:1, width:640, height:360), num of indices: 6, colorm: {}, mode: copy, filter: nearest, address: unsafe, even-odd: false
  draw-triangles: dst: 12 <- src: [8, (nil), (nil), (nil)], dst region: (x:1, y:1, width:640, height:360), num of indices: 1929, colorm: {}, mode: source-over, filter: nearest, address: unsafe, even-odd: false
  draw-triangles: dst: 13, shader, num of indices: 6, mode copy
  draw-triangles: dst: 12, shader, num of indices: 6, mode copy
  draw-triangles: dst: 4, shader, num of indices: 6, mode copy
  draw-triangles: dst: 13, shader, num of indices: 6, mode copy
  draw-triangles: dst: 10, shader, num of indices: 6, mode copy
  draw-triangles: dst: 5, shader, num of indices: 6, mode copy
  draw-triangles: dst: 6, shader, num of indices: 6, mode copy
  draw-triangles: dst: 7 (screen) <- src: [8, (nil), (nil), (nil)], dst region: (x:0, y:0, width:1680, height:1050), num of indices: 6, colorm: {}, mode: copy, filter: nearest, address: unsafe, even-odd: false
  draw-triangles: dst: 7 (screen) <- src: [6, (nil), (nil), (nil)], dst region: (x:0, y:0, width:1680, height:1050), num of indices: 6, colorm: {}, mode: copy, filter: nearest, address: unsafe, even-odd: false

This matches my expectations: there is screen clearing, tile rendering, polygon rendering for the visible area, blurring that polygon, mixing the two together with the previous frame, blurring the output for the next frame, and finally copying all of that to the screen with a CRT filter, after which Ebiten blits the result to the screen once more (with a nearest filter, thanks to SetScreenFilterEnabled). I haven't checked yet whether it looks any different when using DirectX.
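
To make that pass structure concrete, here is a much-simplified sketch (assumed names, not the game's actual code) using only the public Ebitengine v2 API; the final CRT pass is reduced to a plain scaled blit to keep it short:

package render

import "github.com/hajimehoshi/ebiten/v2"

const w, h = 640, 360

// All offscreens are 640x360 in this sketch.
type pipeline struct {
	world, prevBlur, blurred *ebiten.Image
	blur                     *ebiten.Shader
}

func (p *pipeline) draw(screen *ebiten.Image) {
	// 1. Clear and draw tiles/entities into the world offscreen
	//    (the big draw-triangles batches in the list above).
	p.world.Clear()
	// ... tile and entity draws ...

	// 2. Mix in the blurred previous frame (the "fog of war" fade).
	p.world.DrawImage(p.prevBlur, nil)

	// 3. Blur the result for the next frame (640x360 -> 640x360 shader pass).
	op := &ebiten.DrawRectShaderOptions{}
	op.Images[0] = p.world
	p.blurred.DrawRectShader(w, h, p.blur, op)
	p.prevBlur, p.blurred = p.blurred, p.prevBlur

	// 4. Copy to the screen. The real game runs a CRT shader at a higher
	//    intermediate resolution here; this sketch just scales up.
	geo := ebiten.GeoM{}
	geo.Scale(float64(screen.Bounds().Dx())/w, float64(screen.Bounds().Dy())/h)
	screen.DrawImage(p.world, &ebiten.DrawImageOptions{GeoM: geo})
}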

divVerent commented 2 years ago

The render call list seems to be the same when using the DirectX backend. I am sure I am actually using the DirectX backend, because whenever I launch with it, a white rectangle appears at early startup where my command prompt was; with OpenGL this doesn't happen.

hajimehoshi commented 2 years ago

From your ebitendebug results, nothing looks odd.

I'd like to modify and try your aaaaxy on my local machine (macOS and/or Windows). Would it be possible to build it myself?

EDIT: I forgot to read the README. Thanks,

hajimehoshi commented 2 years ago

@divVerent Could you try a32a137fa805f8dca08e499a85f6e84fb96361c8? Thanks,

hajimehoshi commented 2 years ago

Current profiling result (a32a137fa805f8dca08e499a85f6e84fb96361c8, -vsync=false, warp (software rendering) on Parallels)

[profiling screenshot]

divVerent commented 2 years ago

I will try your change. I do not think this issue is specific to vsync=off; however, "unnecessary flushes" are certainly a possibility.

Although I'd be surprised if this is due to ReadPixels/ReplacePixels being on a different command chain - I never do those outside precaching at the start of the game or text rendering in my menu (in-game text is precached too to avoid performance loss).

divVerent commented 2 years ago

Note: I cannot patch https://github.com/hajimehoshi/ebiten/commit/a32a137fa805f8dca08e499a85f6e84fb96361c8 on top of Ebiten v2.3.5, but I am going to retest against Ebiten main which contains the change as well as 0035ba0bd1a35c4a27c2933af17276af7b7b7e1d.

hajimehoshi commented 2 years ago

Although I'd be surprised if this is due to ReadPixels/ReplacePixels being on a different command chain - I never do those outside precaching at the start of the game or text rendering in my menu (in-game text is precached too to avoid performance loss).

Before the fix, commands were flushed every time DrawTriangles was called, regardless of copyCommandList usage. So, even though you don't call ReadPixels/ReplacePixels (and indeed you don't), commands were flushed and then unnecessary waiting happened.
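
Roughly, the idea is the following (an illustrative sketch with made-up types, not Ebitengine's actual internals): buffer draw commands and flush only when a CPU read-back genuinely requires the GPU to finish.

package commandqueue

// command is any GPU command to be executed later.
type command interface{ exec() }

type queue struct {
	pending []command
}

// EnqueueDraw buffers a draw command. Before the fix, the DirectX path
// effectively flushed (and waited) here on every DrawTriangles.
func (q *queue) EnqueueDraw(c command) {
	q.pending = append(q.pending, c)
}

// ReadPixels forces a flush, because the CPU needs the finished result.
func (q *queue) ReadPixels() {
	q.Flush()
	// ... copy the pixels back ...
}

// Flush submits all buffered commands at once and waits for the GPU.
func (q *queue) Flush() {
	for _, c := range q.pending {
		c.exec()
	}
	q.pending = q.pending[:0]
}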

hajimehoshi commented 2 years ago

Note: I cannot patch https://github.com/hajimehoshi/ebiten/commit/a32a137fa805f8dca08e499a85f6e84fb96361c8 on top of Ebiten v2.3.5, but I am going to retest against Ebiten main which contains the change as well as https://github.com/hajimehoshi/ebiten/commit/0035ba0bd1a35c4a27c2933af17276af7b7b7e1d.

Note that I don't plan to backport this change to the 2.3 branch, as it is just a performance improvement.

divVerent commented 2 years ago

With your changes I now get 35fps at game start (OpenGL remains at 110fps). That is way better than 19fps, so the flushing fixes certainly helped, but I'd really like to reach 60fps before I can make DirectX the default.

At fastest render settings (in the menu: graphics=SVGA quality=Lowest) I now get 150fps with DirectX, 215fps with OpenGL.

Stepping up the render settings on DirectX again:

So the big steps are from low to medium and from high to max. Looking at the source code, they are (https://github.com/divVerent/aaaaxy/blob/0878d763d4bedad077d9416eaa13b2bd5e3251c3/internal/menu/settings.go#L255):

Peculiarly, though, if I move quality to max but graphics to SVGA, I also get 100fps, which is very much acceptable. So the complex dither shader is expensive, and I can have either the dither shader (https://github.com/divVerent/aaaaxy/blob/main/assets/shaders/dither.kage.tmpl) or the CRT shader active, but not both, if I want to stay above 60fps.

I wonder if the reality is that all complex shaders are more expensive in DirectX mode than in OpenGL mode, and that there is also a hard cap on the framerate (in OpenGL, at the lowest possible settings, I can reach 220fps at most, BTW, but I bet that is then simply CPU bound by my render code).

hajimehoshi commented 2 years ago

Thank you for the trial! I'll take a look further. My current suspicion is how well the shader programs are optimized.

divVerent commented 2 years ago

BTW, there quite certainly are things in those shaders that I could write better; if it helps, here are the template settings the dither shader runs with:

.BayerSize = 0
.RandomDither = 0
.PlasticDither = 1
.TwoColor = 1

The linear2xcrt shader runs with:

.CRT = 1

(this rather complex part can be turned off by passing -screen_filter=linear2x, which makes it a fancy upscaler without the scanline and bending effects)
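
For context, the .kage.tmpl files are presumably Go text/template sources; a hypothetical expansion step with such settings (assumed names, not the game's actual code) could look like:

package shaders

import (
	"bytes"
	"text/template"

	"github.com/hajimehoshi/ebiten/v2"
)

// DitherParams mirrors the template settings listed above (names assumed).
type DitherParams struct {
	BayerSize, RandomDither, PlasticDither, TwoColor int
}

// compileTemplate expands a .kage.tmpl source with the given parameters and
// compiles the result as a Kage shader.
func compileTemplate(src string, p DitherParams) (*ebiten.Shader, error) {
	t, err := template.New("shader").Parse(src)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return nil, err
	}
	return ebiten.NewShader(buf.Bytes())
}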

hajimehoshi commented 2 years ago

I simply added an optimization flag to D3DCompile (bf0f3d304bd5c92f26d9df2b5591d1f848a255f1). The same method might work for Metal.

hajimehoshi commented 2 years ago

I'll take a look further later (maybe tomorrow), but my current guess is that the HLSL code generated by Kage might not be good. Thank you for a lot of helpful information.

I'd be happy if you could take a look at bf0f3d304bd5c92f26d9df2b5591d1f848a255f1. Thanks,

hajimehoshi commented 2 years ago

With my Windows PC (Vaio LAPTOP-31PU6LDL), the FPS was about 70 in the original aaaaxy with Ebitengine v2.3.5, and about 100 with Ebitengine bf0f3d3. The FPS was 220 with OpenGL. So the FPS has increased, but it is still about 2x lower than with OpenGL.

hajimehoshi commented 2 years ago

I'm trying to add more optimization. Remaining tasks I can do are:

EDIT: My current guess is that the output of Kage is not mature, and the HLSL compiler's optimizations don't work well on it. For example, examples/airship uses about 8 draw commands with the Ebitengine default shaders, but can keep over 400 FPS on my machine with DirectX, and 600 FPS with OpenGL.

EDIT2: I'm not 100% sure, but 4c121ae5eb13bffc6bb85e3d74fdc7b98cf5350e significantly improved the situation.

hajimehoshi commented 2 years ago

With my Windows PC (Vaio LAPTOP-31PU6LDL), the FPS was about 70 in the original aaaaxy with Ebitengine v2.3.5, and about 100 with Ebitengine https://github.com/hajimehoshi/ebiten/commit/bf0f3d304bd5c92f26d9df2b5591d1f848a255f1. The FPS was 220 with OpenGL. So, the FPS should be increased but is still 2x lower than OpenGL.

With the current latest commit b8367da7e235036e9c1a9834de50a0a604ec69d8, aaaaxy could keep 150-200 FPS!

divVerent commented 2 years ago

On my machine, with Ebiten at b8367da7e235036e9c1a9834de50a0a604ec69d8 (TODO: verify I actually built against that and there wasn't some caching effect): the game starts out at 21fps, but if I let it sit there, it soon moves to 31fps and stays there.

At SVGA/Max I get 119fps.

At VGA/High I get 122fps.

This is somewhat illogical: VGA/Max should never take longer to render than one SVGA/Max frame plus one VGA/High frame, which would yield 1/(1/119+1/122) ≈ 60fps, but it's substantially slower than that. Is there any way those shaders could negatively interact with each other? They're in different render passes, after all.
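
For reference, that combined estimate works out to 1/(1/119 + 1/122) = (119 · 122)/(119 + 122) = 14518/241 ≈ 60.2fps, so anything well below 60fps at VGA/Max points to an interaction beyond the individual pass costs.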

divVerent commented 2 years ago

Confirmed that I was actually including the current code: the comment from b8367da7e235036e9c1a9834de50a0a604ec69d8 is in the binary I tested.

One thing I will do later (likely not before the end of next week) is experiment with my shader code and comment things out to see which parts are expensive. There is a way to do this without recompiling (mainly a note to myself so I know how to speed this up when I have time for it):

aaaaxy-windows-amd64 -dump_embedded_assets=data
# make edits in data/assets/shaders/*
aaaaxy-windows-amd64 -cheat_replace_embedded_assets=data -batch

(-batch turns off the error dialog shown at the end when cheats were used, which is useful if I want to use Measure-Command with this as above)

divVerent commented 2 years ago

As for a possible interaction between the shaders: both palette reduction (enabled when graphics is set to VGA or lower) and the CRT filter (enabled at max quality) add one render pass; the former adds a 640x360->640x360 pass, and the latter adds a 640x360->intermediate_res pass and changes Ebiten's final pass from 640x360->output_res to intermediate_res->output_res (where intermediate_res is the min of 2560x1440 and output_res).

Do note that this postprocessing uses the same input as the two blur render passes that remember a blurred version of the previous screen contents for the fade-out effect in the "fog of war" area. As there is no data dependency on that output within the same frame, it is conceivable that these two operations might run partially in parallel (I'm not sure how smart DirectX is, but OpenGL is probably not smart enough to do that kind of optimization).

Are there any DirectX-level debugging tools that could tell me if any such interaction might exist? Like a DirectX equivalent of apitrace?

divVerent commented 2 years ago

-draw_outside=false disables the blur pass that remembers the previous screen content but keeps the two postprocessing shaders active; with that I get above 100fps.

divVerent commented 2 years ago

With dither.kage.tmpl neutered (all commented out, and a Fragment function added that just returns imageSrc0UnsafeAt(texCoord)), I still get ~30fps.
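
For reference, such a pass-through Fragment function looks roughly like this in Kage (a sketch, not the actual template output):

package main

// Pass-through fragment shader: just sample the source image.
func Fragment(position vec4, texCoord vec2, color vec4) vec4 {
	return imageSrc0UnsafeAt(texCoord)
}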

With the same treatment also applied to linear2xcrt.kage.tmpl, I get 37fps. Still nowhere near 100fps.

(BTW: when doing this, be sure to check the log messages on the console. My game code tends to skip shaders and work without them if compilation fails and it can detect that; too bad GLSL/HLSL-level compile errors happen as part of the main loop where I can't detect them, so this only really helps if Kage changes incompatibly.)

So now I have ruled out the contents of the shaders (as seen above, optimization did help, but only to some extent); the slowness comes from the render passes themselves.

hajimehoshi commented 2 years ago

So the FPS is still around 30 with the default state, right?

EDIT: What about github.com/hajimehoshi/ebiten/v2/examples/airship on your machine?

hajimehoshi commented 2 years ago

Do note that this postprocessing uses the same input as the round of the two blur render passes that remember a blurred version of previous screen contents for the fade out effect in the "fog of war" area. As there is no data dependency on that output within the same frame, it is conceivable that these two operations might run partially in parallel (not sure how smart DirectX is, but OpenGL probably is not smart enough to do that kind of optimization). Are there any DirectX-level debugging tools that could tell me if any such interaction might exist? Like a DirectX equivalent of apitrace?

Sorry, but I'm not familiar with DirectX tools. It is possible that OpenGL implicitly executes some commands in parallel, while DirectX doesn't unless it is explicitly told to. And Ebitengine doesn't specify any parallel execution.

hajimehoshi commented 2 years ago

I'm quite confused about which shaders you use and how they interact in your application... A figure would be helpful. Thanks,

hajimehoshi commented 2 years ago

So now I have ruled out the contents of the shaders (as seen above, optimization did help, but only to some extent); the slowness comes from the render passes themselves.

Very interesting. Perhaps the destination size matters?

hajimehoshi commented 2 years ago

Issue may be GPU specific though - I have this issue on one of these: https://www.amazon.com/2019office%E3%80%91-Ultra-Light-High-Speed-High-Performance-Notebook/dp/B09CQ22335/ref=sr_1_3?keywords=7+inch+laptop&qid=1657310835&sr=8-3 - according to Device Manager I have an Intel(R) HD Graphics 500.

The Celeron J4125 has UHD Graphics 600, not HD Graphics 500.

https://www.intel.com/content/www/us/en/products/sku/197305/intel-celeron-processor-j4125-4m-cache-up-to-2-70-ghz/specifications.html

Could you confirm that this is the machine you are testing?

divVerent commented 2 years ago

To be clear, I bought a device on AliExpress that looks much the same and has all the connectors in the same places. I assume the Amazon one is the same, but it is possible that the internals change without the exterior changing.

My device has a Celeron J3455 according to /proc/cpuinfo, so yeah, it isn't quite the same.

divVerent commented 2 years ago

The exact set of render calls depends on the settings, so a figure is rather hard to make. The general process at high settings is:

hajimehoshi commented 2 years ago

Thanks. The current performance bottleneck is the existence of the shader

send P to CRT shader, output is C (typically at screen res, capped to 2560x1440)

and whether the shader's content is empty or not doesn't matter. Do I understand correctly?

hajimehoshi commented 2 years ago

I'm looking for a machine with the same chipset (Celeron J3455)

https://www.amazon.co.jp/dp/B0875LXTRC
https://www.amazon.co.jp/dp/B096S7Y23N
https://www.amazon.co.jp/dp/B0B14Z49GD
https://www.amazon.co.jp/dp/B09R4FWC4D
https://www.amazon.co.jp/dp/B07TXYRXW4

EDIT: I bought a used EZBook X3 Pro 64G

divVerent commented 2 years ago

I am not yet sure that this is the bottleneck. Yes, removing that pass fixed the framerate, but removing the one that applies the palette (even if its shader is a NOP) fixes it too.

This makes me think that the issue may be something else. Am I, for example, exceeding some VRAM usage limit? Does the number of passes otherwise matter?

But then why is the OpenGL backend not affected equally?

hajimehoshi commented 2 years ago

It's possible that the OpenGL driver does more sophisticated things than I do with DirectX. I'll take a look further after the machine I ordered arrives.

divVerent commented 2 years ago

I have now tried revamping how textures are allocated, using different strategies rather than always using the same temporary texture for the same purpose, but this does not change DirectX performance at all, so we now know that's not it either.

This is in the managed-offscreens branch of my game. I am unsure whether I really want to merge it, but it eliminates two 640x360 textures by default. In particular, this rules out "VRAM exhaustion".

hajimehoshi commented 2 years ago

I found incorrect descriptor table usage (#2201) and fixed it. Could you try b3267a712681fd46bbf99519eb1233a5dd12d08f? Thanks,

I've received the machine with Intel HD Graphics 500, so I'll try it tomorrow.

This is in my branch managed-offscreens in my game - I am unsure if I really want to merge that, but it eliminates two 640x360 textures by default. In particular this rules out "VRAM exhaustion".

~Did this work on your machine with high FPS?~ OK so this doesn't change the performance...

divVerent commented 2 years ago

I'm not seeing any differences even now. But also, peculiarly, I cannot get dxcap.exe to capture DirectX usage: in capture mode (dxcap -file aaaaxy.vsglog -c aaaaxy-windows-amd64), it just hangs after Ebiten opens its window. Having said that, I've never used dxcap before, so this might be user error.

I also can no longer reproduce getting 100fps, even with the binary I had before; I will retest later, as I suspect this is simply due to some background activity.

divVerent commented 2 years ago

PIX (https://devblogs.microsoft.com/pix/download/) shows a lot of warnings: about 131 "redundant transition to unused state" warnings in a single frame, as well as some redundant ResourceBarriers. Maybe that is related?

I can't do much in PIX, though; this laptop has an 800x480 screen and I can't reach half of its UI.

hajimehoshi commented 2 years ago

OK, so the FPS doesn't change... (though I believe the fix is necessary to use GPUs correctly)

"redundant transition to unused state" might be a very good hint. PIX didn't work well on my Parallels machine. I'll try the new machine (Intel HD Graphics 500) later anyway.

hajimehoshi commented 2 years ago

I think I was able to reproduce your issue, but the situation might be different.

In all cases, vsync was disabled.

With OpenGL, the FPS is around 110 even with the max quality.

EDIT: Oops, I accidentally tested this with Ebitengine v2.3. With the latest commit, it reached 88 FPS at max quality.

hajimehoshi commented 2 years ago

PIX (https://devblogs.microsoft.com/pix/download/) shows a lot of warnings about 131 "redundant transition to unused state" in a single frame, as well as some redundant ResourceBarriers. Maybe that is related?

I couldn't see such warnings. How did you see them?

hajimehoshi commented 2 years ago

I realized that FPS depends on the player's position, and in some places the FPS is actually less than 30. I'll take a look further

[screenshot]

divVerent commented 2 years ago

I launched PIX, selected the game binary, set the environment variable EBITEN_GRAPHICS_LIBRARY to directx there, then launched the game from PIX and, once everything had stabilized, hit Print Screen.

I may then have had to click something in the bottom area to let it actually play back the frame, and the warnings view then showed something - including links to click to get more warnings.

As for the numbers on your system: interesting that you do not get such a sharp cutoff. I assume the framerate is substantially higher for you in OpenGL mode too?

To make the test more similar, maybe try hitting F (toggle full screen) and then resizing the window to about 800x480 (which is all my 7" laptop can do)?

hajimehoshi commented 2 years ago

I'll try pressing print screen later, thanks.

As for the numbers on your system - interesting you do not get such a sharp cutoff. I assume in OpenGL mode the framerate is substantially higher for you too?

Yes, higher and more stable with OpenGL.

To get the test more similar, maybe try hitting F (toggle full screen) then resize the window to about 800x480 (which is all my 7" laptop does)?

I'm already using windowed mode at 1280x720. I'll try 800x480 later, but I don't think the window size matters here.

hajimehoshi commented 2 years ago

I pressed Print Screen and a .wpix file was produced. I checked the DirectX calls and warnings: "Consecutive calls to ResourceBarrier". Yeah, I think this is it. The blur rendering uses an offscreen, and this causes state switches. I'll look for a better way to use offscreens.

EDIT: ResourceBarrier accepts multiple transitions in a single call. I should have batched them.

EDIT2: Hmmm, batching ResourceBarrier calls slowed applications down... 🤔 https://github.com/hajimehoshi/ebiten/tree/issue-2188-batching

EDIT3: Never mind, this didn't cause regressions (but didn't improve the performance very much). I'll merge https://github.com/hajimehoshi/ebiten/pull/2203 later anyway.
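
For reference, the batching idea sketched abstractly (assumed names, not Ebitengine's actual DirectX code): collect the state transitions, drop redundant ones, and hand them to a single ResourceBarrier-style call before the next draw, instead of issuing one call per transition.

package barriers

// transition describes a resource moving from one usage state to another.
type transition struct {
	resource any
	from, to int
}

// recorder collects transitions instead of issuing one barrier call each.
type recorder struct {
	pending []transition
}

// add queues a transition, dropping redundant ones (the "redundant transition
// to unused state" warnings PIX reported).
func (r *recorder) add(t transition) {
	if t.from == t.to {
		return
	}
	r.pending = append(r.pending, t)
}

// flush submits all queued transitions in one batched call right before the
// next draw, rather than one call per transition.
func (r *recorder) flush(submit func([]transition)) {
	if len(r.pending) == 0 {
		return
	}
	submit(r.pending)
	r.pending = r.pending[:0]
}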