iXit / Mesa-3D

Please use official https://gitlab.freedesktop.org/mesa/mesa/ !
https://github.com/iXit/Mesa-3D/wiki

Low frame rates on Starcraft 2 while CPU and GPU utilization is low #333

Open Venemo opened 5 years ago

Venemo commented 5 years ago

Long story short, I observed a scenario where I get low frame rates even though CPU and GPU utilization are both low. This is a continuation of this issue, where we started discussing a performance problem with Starcraft 2 running on Gallium Nine standalone.

Hardware details: the system is a Dell XPS 13 9370 with a 4-core / 8-thread Intel 8550U, a Sapphire AMD Radeon RX 570 ITX card is connected through Thunderbolt 3 (which is roughly equivalent to a x4 PCI-E connection), and a 4K Dell U2718Q screen is connected to the RX 570's DisplayPort output.

I didn't measure the frame rates in a scientific way, but the difference is very noticeable.

Some observations from the other thread that are worth mentioning here:

@axeldavy

On the first campaign, after the videos, I get about 140 fps in full HD, everything maxed out, on my Radeon RX 480. The GPU load is about 90%. My 4 CPU threads are all around 60-70%. I used GALLIUM_HUD=GPU-load,fps,cpu0+cpu1+cpu2+cpu3,shader-clock+memory-clock+temperature tearfree_discard=true WINEDEBUG=-all

That does sound awesome, and makes me think that the problem may be with my setup, or that maybe my setup triggers some sort of corner case within Nine. After thinking about it more, I've got three ideas:

Idea 1. Maybe there's something wrong with my kernel command line? Basically, I disable all the Spectre/Meltdown mitigations, enable the PowerPlay features in amdgpu, and blacklist the i915 module.

resume=UUID=0dc28d3d-cf9a-4a1d-b980-e7f78ad7aaee rd.luks.uuid=luks-98f2c2d3-77e1-444a-b3f2-d3396b53e16e rhgb quiet mem_sleep_default=deep pti=off spectre_v2=off l1tf=off nospec_store_bypass_disable no_stf_barrier amdgpu.ppfeaturemask=0xffffffff i915.enable_guc=0 module_blacklist=i915 3
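A quick way to rule this idea in or out is to check whether those options actually took effect on the running system. This is a small sketch using only standard procfs/sysfs paths; nothing here is specific to my setup:

```shell
#!/bin/sh
# Show the command line the running kernel actually booted with.
echo "booted with: $(cat /proc/cmdline)"

# One file per known CPU vulnerability; each reports whether its
# mitigation is active, or "Vulnerable" when it has been disabled.
grep -H . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null

# Check that the i915 blacklist held.
if grep -q '^i915 ' /proc/modules 2>/dev/null; then
  echo "i915 loaded"
else
  echo "i915 not loaded"
fi
```

If the vulnerabilities files still report active mitigations, the command line was not applied as expected.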

Idea 2. Maybe Nine somehow performs more IO over PCI-E than wined3d, and the Thunderbolt 3 port is the bottleneck. Is it possible that some operations within Nine are not noticeable on a PCI-E x16 connection but become problematic over PCI-E x4? I don't know how to verify this theory, but maybe there is a way to check the PCI-E port usage.
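One low-effort check for this idea: the PCI core exposes the negotiated link speed and width in sysfs, so it is at least possible to confirm the eGPU really trained at x4 rather than something worse. A minimal sketch, assuming the eGPU shows up as card1 (adjust the DRM index for other setups):

```shell
#!/bin/sh
# Print negotiated vs. maximum PCIe link parameters for a DRM device.
link_info() {
  dev=/sys/class/drm/$1/device
  for f in current_link_speed current_link_width max_link_speed max_link_width; do
    if [ -r "$dev/$f" ]; then
      printf '%s: %s\n' "$f" "$(cat "$dev/$f")"
    else
      printf '%s: not available\n' "$f"
    fi
  done
}

# card1 is an assumption about where the eGPU lands; adjust as needed.
link_info card1
```

If current_link_width reads 4 and current_link_speed reads 8.0 GT/s, the link itself is healthy and any bottleneck would be in how much traffic goes over it.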

Idea 3. It occurs to me that I installed DirectX 9 from winetricks in this Wine prefix before I started using Nine. Is it possible that this interferes with Nine somehow?

@iiv3

do you compile your own kernel? If so, you might have some additional tracing enabled.

No, I've got 4.19 from Fedora: 4.19.13-300.fc29.x86_64

If not... could you try disabling the IOMMU ("iommu=off"), in case you are hitting some "soft" emulation that involves page faults.

Good idea, I will try that.

If this doesn't help either, try looking for "amdgpu_mm_rreg" in perf top and see which instructions are getting the most heat.
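For anyone following along who hasn't used perf before, a minimal session for this kind of check might look like the sketch below. The symbol name comes from the suggestion above; everything else is generic perf usage, and the guards make it degrade gracefully when perf or root privileges are missing:

```shell
#!/bin/sh
# Minimal system-wide perf session; run it while the game is rendering.
command -v perf >/dev/null 2>&1 || { echo "perf not installed"; exit 0; }

# Sample all CPUs with call graphs for 10 seconds.
perf record -a -g -o /tmp/sc2.perf.data -- sleep 10 2>/dev/null \
  || { echo "perf record failed (root usually required)"; exit 0; }

# Flat profile: look for amdgpu_mm_rreg among the top kernel symbols.
perf report -i /tmp/sc2.perf.data --stdio 2>/dev/null | head -n 30

# Per-instruction breakdown of the hot symbol.
perf annotate -i /tmp/sc2.perf.data --stdio amdgpu_mm_rreg 2>/dev/null || true
```

A symbol like amdgpu_mm_rreg showing up hot would point at MMIO register reads, which are exactly the kind of thing that gets slower over a tunneled PCIe link.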

Sure, I'll take a look, though I'm afraid I'm not an expert in perf.

Sorry for the long post!

iiv3 commented 5 years ago

@marekolsak, would it be possible to implement the same changes for the r600 driver?

It would be possible, but the performance improvement wouldn't be as high, because the DMA engine is not asynchronous on radeon. For r600, I recommend changing const_uploader from DEFAULT to STREAM.

Is the limitation in hardware or in the radeon kernel driver?

marekolsak commented 5 years ago

Is the limitation in hardware or in the radeon kernel driver?

It's a limitation of the radeon kernel driver.

montvid commented 5 years ago

Can someone update this thread when the patch lands upstream? I have an r3g v1.1 PCIe 3.0 x16 to M.2 x4 adapter + RX 580, so I could test, but I don't know how to build Mesa or apply these out-of-tree patches.

Venemo commented 5 years ago

@montvid Sure thing! You'll hear about it when it lands upstream. I'm curious though, how does it perform on your setup?

montvid commented 5 years ago

Native Direct3D 9 v0.3.0.184-devel is active.

It seems SC2 is free to play, so I got it from battle.net. Interestingly, during the first campaign the CPU (i3-7100U) and GPU (RX 580 8GB) usage is always ~40%, fps is around ~60 (a bit more with esync enabled), GTT is ~100 MB, VRAM is ~1200 MB (1900 MB with all settings on ultra/extreme), and buffer wait time is at most 20 µs (Assassin's Creed 1 can get up to 20 ms on ultra settings, so it runs worse). I am using tkg 4.1 wine (lutris) with a PCIe 3.0 x4 adapter, so this is not a Thunderbolt issue.

@Venemo I was hoping for better game performance from my eGPU setup, but Thunderbolt is even worse according to egpu.io. At least I have 8 GB of VRAM to store all the ultra textures. :P

Venemo commented 5 years ago

@montvid Is that with or without the SDMA patch? If without, then it is roughly in line with what I experience here.

montvid commented 5 years ago

Without the patch.

dhewg commented 5 years ago

Looks like the SDMA patches landed in Mesa master. I guess this can be closed then?

Venemo commented 5 years ago

Even though GPU utilization is near 100% now, the game is still slower on my setup than on yours. I'd prefer to keep this issue open until I manage to do a proper comparison between Nine and Windows on this hardware. If there is no significant difference, then let's close it; otherwise we should try to find the next bottleneck.

Venemo commented 5 years ago

Here is an interesting tidbit. Starting with kernel 5.1, amdgpu exposes PCI Express bandwidth usage under /sys/class/drm/card1/device/pcie_bw (substitute card1 with your card). This gives you the number of PCIe packets received and sent by the GPU, as well as the maximum size of a PCIe packet.
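Since each read of pcie_bw yields three numbers (packets received, packets sent, maximum packet size in bytes), turning a sample into a bandwidth figure is just a multiplication. A small sketch; the field order matches the description above, and whether the counts cover exactly one second is a driver detail, so treat the result as an upper-bound estimate:

```shell
#!/bin/sh
# Turn one pcie_bw sample (packets received, packets sent, max packet
# size in bytes) into an upper-bound bandwidth estimate in Mbit/s.
pcie_bw_mbit() {
  awk '{ rx = $1 * $3 * 8 / 1e6; tx = $2 * $3 * 8 / 1e6;
         printf "rx <= %.1f Mbit/s, tx <= %.1f Mbit/s\n", rx, tx }'
}

# Live usage (adjust the card index):
#   cat /sys/class/drm/card1/device/pcie_bw | pcie_bw_mbit
# Offline check with the Mesa 19.0 numbers from the table:
echo "2500000 50000 128" | pcie_bw_mbit
```

The offline check works out to about 2560 Mbit/s received, which lines up with the ~2.5 Gbit/s figure in the table.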

So I ran watch -n 1 "cat /sys/class/drm/card1/device/pcie_bw" and looked at the output. When I run SC2 on mesa 19.1 I get the following numbers (roughly):

| Version | Received by GPU | Sent by GPU | Packet size (bytes) | Estimated max bandwidth use |
| --- | --- | --- | --- | --- |
| Mesa 19.0 | around 2,500,000 | around 50,000 | 128 | 2.5 Gbit/s |
| Mesa master | 10,000 - 25,000 | 10,000 - 50,000 | 128 | 75 Mbit/s |

EDIT: fixed mistake in calculating the Gbit/sec.

There are two interesting conclusions here:

@marekolsak What do you think about my numbers, do they make sense? Am I interpreting them correctly?

marekolsak commented 5 years ago

Yes, the numbers look correct.

marekolsak commented 5 years ago

It's not just the bandwidth that matters. It's also the latency that PCIe incurs.

Venemo commented 5 years ago

@marekolsak True. But 2.5 Gbit/s is still less than 10% of what the link should be capable of. There is definitely a problem in there somewhere.
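As a sanity check on that "less than 10%" figure: a PCIe link's payload ceiling is lanes × transfer rate × encoding efficiency (128b/130b for Gen3), and per the hardware description above Thunderbolt 3 tunnels roughly a x4 Gen3 link. A quick calculation, ignoring protocol overhead:

```shell
#!/bin/sh
# Payload bandwidth ceiling of a PCIe link:
#   lanes * GT/s * 128/130 encoding efficiency (Gen3).
pcie_gbit() {
  echo "$1 $2" | awk '{ printf "%.1f", $1 * $2 * 128 / 130 }'
}

x4=$(pcie_gbit 4 8)
echo "x4 Gen3:  $x4 Gbit/s"
echo "x16 Gen3: $(pcie_gbit 16 8) Gbit/s"

# How much of the x4 link the observed 2.5 Gbit/s actually uses.
echo "$x4" | awk '{ printf "2.5 Gbit/s is %.0f%% of the x4 link\n", 2.5 / $1 * 100 }'
```

So the observed traffic sits around 8% of a ~31.5 Gbit/s link, consistent with the claim above.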