veandco / go-sdl2

SDL2 binding for Go
https://godoc.org/github.com/veandco/go-sdl2
BSD 3-Clause "New" or "Revised" License
2.17k stars, 219 forks

Windows build very slow compared to Linux when using gfx #487

Closed: fboerman closed this issue 3 years ago

fboerman commented 3 years ago

Hi,

I have built a small simulation with an SDL2 frontend. When running the exact same build on both Linux and Windows, there is a huge performance gap. I am using the gfx library. Below I have copied the output of my simple timing of the render function for a Linux and a Windows build of the exact same code. The timing includes a buffer flip for the simulation logic, which is also timed separately; it is very small compared to the rest of the render, so it can be ignored. The Windows build is on average roughly six times slower than the Linux build (about 396 ms vs. 64 ms per tick). Both are statically linked and built through GitHub Actions CI. Source code and binaries can be found on the repo here: https://github.com/fboerman/microworlds/releases/tag/V0.1.3 with the build output for this release here: https://github.com/fboerman/microworlds/actions/runs/904108187 Does anybody see what could cause this?

On Linux:

Please input float [0-1] for p% tree density
0.5
Please input integer for number of starting fires
1
Generate forest with 50 % trees
cells: 20736, Width:192, Heigth: 108
Ignite 1 fires
Tick: 0
Took 51.562808ms, of which 65.257µs was buffer flip
Tick: 1
Took 63.186217ms, of which 76.614µs was buffer flip
Tick: 2
Took 63.611091ms, of which 108.861µs was buffer flip
Tick: 3
Took 63.455926ms, of which 76.883µs was buffer flip
Tick: 4
Took 67.419086ms, of which 75.834µs was buffer flip
Tick: 5
Took 63.028987ms, of which 77.248µs was buffer flip
Tick: 6
Took 66.807107ms, of which 83.198µs was buffer flip
Tick: 7
Took 64.323075ms, of which 75.689µs was buffer flip
Tick: 8
Took 66.818739ms, of which 81.446µs was buffer flip
Tick: 9
Took 63.620243ms, of which 80.365µs was buffer flip
Tick: 10
Took 66.650128ms, of which 78.56µs was buffer flip
Tick: 11
Took 66.868549ms, of which 73.444µs was buffer flip
Tick: 12
Took 63.493421ms, of which 76.692µs was buffer flip
Tick: 13
Took 66.554736ms, of which 78.806µs was buffer flip
Tick: 14
Took 66.629486ms, of which 86.044µs was buffer flip
Tick: 15
Took 68.081964ms, of which 74.28µs was buffer flip
Tick: 16
Took 52.733636ms, of which 74.727µs was buffer flip
Tick: 17
Took 62.903484ms, of which 78.322µs was buffer flip
Tick: 18
Took 63.420244ms, of which 73.342µs was buffer flip
Tick: 19
Took 65.883906ms, of which 106.603µs was buffer flip
Tick: 20
Took 62.355047ms, of which 89.198µs was buffer flip
Tick: 21
Took 70.741271ms, of which 75.849µs was buffer flip
Tick: 22
Took 62.099656ms, of which 73.818µs was buffer flip
Tick: 23
Took 65.736727ms, of which 78.938µs was buffer flip
Tick: 24
Took 61.615439ms, of which 72.85µs was buffer flip
Tick: 25
Took 66.155166ms, of which 78.266µs was buffer flip

On Windows:

Please input float [0-1] for p% tree density
0.5
Please input integer for number of starting fires
1
Generate forest with 50 % trees
cells: 20736, Width:192, Heigth: 108
Ignite 1 fires
Tick: 0
Took 368.2303ms, of which 0s was buffer flip
Tick: 1
Took 389.8523ms, of which 0s was buffer flip
Tick: 2
Took 392.037ms, of which 0s was buffer flip
Tick: 3
Took 388.3725ms, of which 0s was buffer flip
Tick: 4
Took 387.8863ms, of which 576.1µs was buffer flip
Tick: 5
Took 393.6849ms, of which 0s was buffer flip
Tick: 6
Took 390.9148ms, of which 564.1µs was buffer flip
Tick: 7
Took 387.1256ms, of which 518.4µs was buffer flip
Tick: 8
Took 366.9348ms, of which 151.7µs was buffer flip
Tick: 9
Took 401.0901ms, of which 0s was buffer flip
Tick: 10
Took 436.1942ms, of which 27.3µs was buffer flip
Tick: 11
Took 422.9971ms, of which 0s was buffer flip
Tick: 12
Took 380.751ms, of which 0s was buffer flip
Tick: 13
Took 407.6306ms, of which 55.5µs was buffer flip
Tick: 14
Took 418.4987ms, of which 0s was buffer flip
Tick: 15
Took 423.3677ms, of which 0s was buffer flip
Tick: 16
Took 378.311ms, of which 0s was buffer flip
Tick: 17
Took 400.1359ms, of which 0s was buffer flip
Tick: 18
Took 388.8115ms, of which 74.3µs was buffer flip
Tick: 19
Took 443.9475ms, of which 0s was buffer flip
Tick: 20
Took 369.7274ms, of which 0s was buffer flip
Tick: 21
Took 391.7111ms, of which 0s was buffer flip
Tick: 22
Took 393.0362ms, of which 0s was buffer flip
Tick: 23
Took 404.5377ms, of which 0s was buffer flip
Tick: 24
Took 377.3122ms, of which 123µs was buffer flip
Tick: 25
Took 402.471ms, of which 0s was buffer flip
veeableful commented 3 years ago

Hi @fboerman, could you try profiling the program? You can see the article here: https://blog.golang.org/pprof

fboerman commented 3 years ago

Hi @veeableful, yes sure, thanks for the link. The zip files with the results are attached below. I am relatively new to this, but it seems that the polygon-drawing function from gfx is very slow?

profilers.zip

veeableful commented 3 years ago

Hi @fboerman, it seems like the Windows version spends a lot of time in C code. Could you try building the program on Windows without the -tags static -ldflags "-s -w" flags and testing that version? Alternatively, if you know where MSYS2 installed its libraries, set the CGO_LDFLAGS environment variable to include an -L flag that points to the directory containing them.

fboerman commented 3 years ago

@veeableful hmm, I tried to do this, but apparently when I closed our discussion in #486 I had only checked the static compilation. When I try the non-static build, I get the error below, even if I add the flags you mentioned:

In file included from ../../pkg/mod/github.com/veandco/go-sdl2@v0.4.7/gfx/sdl_gfx.go:5:
./sdl_gfx_wrapper.h:2:18: fatal error: SDL2/SDL2_framerate.h: No such file or directory
    2 |         #include <SDL2/SDL2_framerate.h>
      |                  ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

A bit annoying that I can't get this fixed, so I'm going to set up a testing branch with GitHub Actions again, because that one worked. Then I can test it with a non-static version.

fboerman commented 3 years ago

Okay, fixed it. See below for the new profiler file from a non-static Windows build (built using GitHub Actions).

profiler.zip

veeableful commented 3 years ago

Thanks. I was wondering if our included static library could be the issue, but it seems like the system-provided library performs the same.

Are you testing them on different physical machines rather than in virtual machines? If so, have you installed the graphics driver on Windows?

fboerman commented 3 years ago

Hi @veeableful, yes, I tested it on a native Windows machine with no virtualization whatsoever. And GitHub Actions builds it on Windows machines, so it's not even cross-compiled.

veeableful commented 3 years ago

Hi @fboerman, is the machine the same one that runs Linux where your Linux version of the program runs? What GPU are you using and have you installed the latest graphics driver for it on the Windows environment?

fboerman commented 3 years ago

Hi @veeableful, that was a good suggestion. I had actually tried it on my Surface Pro 7, which has Intel UHD Graphics. On my desktop with an RTX 2060 it's much faster! I dual-boot my desktop, so the Linux version was running on that GPU as well.

Still, for the first few ticks (ticks 0 to 7, when the screen is fullest), it is about a factor of three slower than the Linux version. I have attached new profiles (profile.zip), and the per-tick text output is below.

And for the Surface Pro, too, the difference is so large that I would say something is going wrong. Even there it should be a bit more performant, I would say? Perhaps the gfx library (which is very old, from what I see) is not that efficient. I was thinking about writing a function myself that simply writes the cells into a texture's pixel array, which is then rendered. What do you think?

Please input float [0-1] for p% tree density
0.5
Please input integer for number of starting fires
1
Generate forest with 50 % trees
cells: 20736, Width:192, Heigth: 108
Ignite 1 fires
Tick: 0
Took 151.8195ms, of which 0s was buffer flip
Tick: 1
Took 152.083ms, of which 0s was buffer flip
Tick: 2
Took 153.1597ms, of which 0s was buffer flip
Tick: 3
Took 152.3505ms, of which 0s was buffer flip
Tick: 4
Took 152.5624ms, of which 0s was buffer flip
Tick: 5
Took 152.6177ms, of which 0s was buffer flip
Tick: 6
Took 152.1405ms, of which 0s was buffer flip
Tick: 7
Took 152.1327ms, of which 0s was buffer flip
Tick: 8
Took 53.6405ms, of which 0s was buffer flip
Tick: 9
Took 52.7954ms, of which 0s was buffer flip
Tick: 10
Took 52.7511ms, of which 0s was buffer flip
Tick: 11
Took 52.2879ms, of which 0s was buffer flip
Tick: 12
Took 51.7736ms, of which 0s was buffer flip
Tick: 13
Took 52.6419ms, of which 0s was buffer flip
Tick: 14
Took 53.2357ms, of which 0s was buffer flip
Tick: 15
Took 52.8318ms, of which 0s was buffer flip
Tick: 16
Took 52.2795ms, of which 0s was buffer flip
Tick: 17
Took 51.9753ms, of which 0s was buffer flip
Tick: 18
Took 53.7783ms, of which 0s was buffer flip
Tick: 19
Took 52.584ms, of which 0s was buffer flip
Tick: 20
Took 51.3215ms, of which 0s was buffer flip
Tick: 21
Took 51.2377ms, of which 0s was buffer flip
Tick: 22
Took 51.7753ms, of which 0s was buffer flip
Tick: 23
Took 51.2768ms, of which 0s was buffer flip
Tick: 24
Took 51.7645ms, of which 0s was buffer flip
Tick: 25
Took 52.3704ms, of which 0s was buffer flip
veeableful commented 3 years ago

Yeah, that approach might be useful if you can reuse the textures. You could also try drawing with pixels and then scaling the texture up to match the window size, which I think would be much cheaper than drawing polygons.
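A minimal sketch of the "draw pixels, then scale up" idea: write one RGBA pixel per cell into a byte buffer sized for a 192x108 texture (the grid size from the logs), and compute the largest integer scale at which that texture fits the window. The `Cell` states, colors, `fillPixels` and `fitScale` are all hypothetical; the real simulation's types may differ.

```go
package main

import "fmt"

// Cell states are hypothetical stand-ins for the simulation's own.
type Cell uint8

const (
	Empty Cell = iota
	Tree
	Fire
)

// palette maps each cell state to R, G, B, A bytes.
var palette = [...][4]byte{
	Empty: {0, 0, 0, 255},
	Tree:  {34, 139, 34, 255},
	Fire:  {255, 69, 0, 255},
}

// fillPixels writes one RGBA pixel per cell into pixels (len w*h*4),
// row-major, so the buffer matches a w x h texture.
func fillPixels(pixels []byte, cells []Cell, w, h int) {
	for i := 0; i < w*h; i++ {
		copy(pixels[i*4:i*4+4], palette[cells[i]][:])
	}
}

// fitScale returns the largest integer scale at which a w x h texture
// fits inside a winW x winH window, plus the offsets that center it.
func fitScale(w, h, winW, winH int) (scale, offX, offY int) {
	scale = winW / w
	if s := winH / h; s < scale {
		scale = s
	}
	if scale < 1 {
		scale = 1
	}
	offX = (winW - w*scale) / 2
	offY = (winH - h*scale) / 2
	return
}

func main() {
	const w, h = 192, 108
	cells := make([]Cell, w*h)
	cells[0] = Fire
	pixels := make([]byte, w*h*4)
	fillPixels(pixels, cells, w, h)
	scale, x, y := fitScale(w, h, 1920, 1080)
	fmt.Println(pixels[:4], scale, x, y)
}
```

On the SDL side, the idea would be to create one streaming texture of 192x108 once (so it is reused across ticks), upload the buffer each tick with go-sdl2's texture update method, and copy it to a destination rectangle of w*scale by h*scale at the computed offsets. The in-memory byte order R, G, B, A matches SDL's SDL_PIXELFORMAT_RGBA32; check which pixel-format constant and Update signature your go-sdl2 version exposes.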

fboerman commented 3 years ago

Yes, but I want to transition to hexagons, and then I have to use polygons.

But I think the gfx library is not that efficient, at least from Go. So I'm probably going to rewrite it to write pixels into a texture buffer and then render the texture to the screen.
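Hexagons do not strictly require a polygon-drawing call when writing pixels directly. A minimal sketch, assuming a pointy-top regular hexagon: compute its six corners, then test each pixel center inside the hexagon's bounding box against all six edges (a point-in-convex-polygon test using cross products). `hexVertices`, `insideConvex` and `fillHex` are hypothetical helpers, not the code from the microworlds rewrite.

```go
package main

import (
	"fmt"
	"math"
)

// hexVertices returns the six corners of a pointy-top regular hexagon
// centered at (cx, cy) with circumradius r, in increasing-angle order.
func hexVertices(cx, cy, r float64) [6][2]float64 {
	var v [6][2]float64
	for i := 0; i < 6; i++ {
		a := math.Pi / 180 * (60*float64(i) - 30)
		v[i] = [2]float64{cx + r*math.Cos(a), cy + r*math.Sin(a)}
	}
	return v
}

// insideConvex reports whether (px, py) is inside or on the convex
// polygon v, whose vertices are listed in increasing-angle order.
func insideConvex(v [6][2]float64, px, py float64) bool {
	for i := 0; i < len(v); i++ {
		a, b := v[i], v[(i+1)%len(v)]
		cross := (b[0]-a[0])*(py-a[1]) - (b[1]-a[1])*(px-a[0])
		if cross < 0 {
			return false // point is on the outer side of this edge
		}
	}
	return true
}

func clamp(x, lo, hi int) int {
	if x < lo {
		return lo
	}
	if x > hi {
		return hi
	}
	return x
}

// fillHex sets every pixel whose center lies inside the hexagon to rgba
// in a row-major RGBA buffer of size w x h.
func fillHex(pixels []byte, w, h int, cx, cy, r float64, rgba [4]byte) {
	v := hexVertices(cx, cy, r)
	// Only scan the hexagon's bounding box, clamped to the buffer.
	minX, maxX := clamp(int(math.Floor(cx-r)), 0, w-1), clamp(int(math.Ceil(cx+r)), 0, w-1)
	minY, maxY := clamp(int(math.Floor(cy-r)), 0, h-1), clamp(int(math.Ceil(cy+r)), 0, h-1)
	for y := minY; y <= maxY; y++ {
		for x := minX; x <= maxX; x++ {
			if insideConvex(v, float64(x)+0.5, float64(y)+0.5) {
				copy(pixels[(y*w+x)*4:], rgba[:])
			}
		}
	}
}

func main() {
	const w, h = 64, 64
	pixels := make([]byte, w*h*4)
	fillHex(pixels, w, h, 32, 32, 20, [4]byte{34, 139, 34, 255})
	filled := 0
	for i := 3; i < len(pixels); i += 4 {
		if pixels[i] != 0 {
			filled++
		}
	}
	fmt.Println("filled pixels:", filled)
}
```

Since every hexagon in a grid has the same shape, one possible optimization (not measured here) is to compute the inside/outside mask once and reuse it at each hexagon's offset.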

fboerman commented 3 years ago

Hi @veeableful, so yes, that approach is MUCH faster. So I think we can conclude that the gfx library is rather slow, at least in this configuration. You can see the rewrite here: https://github.com/fboerman/microworlds/commit/b7011c4e7c514b7d9e07a3be4137e36c570244a9

Dropping the gfx library also solves my cross-compiling issues.

I will close this issue now!