Venemo opened 5 years ago
@marekolsak, would it be possible to implement the same changes for the r600 driver?
It would be possible, but the performance improvement wouldn't be as high, because the DMA engine is not asynchronous on radeon. For r600, I recommend changing const_uploader from DEFAULT to STREAM.
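For context (this snippet is mine, not from the thread): `const_uploader` is the `u_upload_mgr` that a Gallium driver sets on its `pipe_context` for uploading constant-buffer data, and DEFAULT/STREAM refer to the `pipe_resource_usage` value passed to `u_upload_create()`. Below is a hedged sketch of the kind of one-line change being suggested; the call site, buffer size, and bind flags are assumptions for illustration, not copied from the r600 source.

```c
/* Hedged sketch, not an actual Mesa patch: somewhere in the r600
 * context-creation path the constant uploader is created, and the
 * suggestion amounts to swapping the usage argument. The buffer
 * size and bind flags here are placeholders. */
ctx->const_uploader =
   u_upload_create(ctx,                        /* struct pipe_context * */
                   128 * 1024,                 /* default buffer size (assumed) */
                   PIPE_BIND_CONSTANT_BUFFER,  /* bind flags (assumed) */
                   PIPE_USAGE_STREAM,          /* was PIPE_USAGE_DEFAULT */
                   0);                         /* flags */
```

My understanding is that STREAM requests CPU-visible memory optimized for streaming writes, while DEFAULT requests GPU-optimal placement, so with STREAM the constant uploads no longer depend on the radeon kernel driver's synchronous DMA path mentioned above.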
Is the limitation in hardware or in the radeon kernel driver?
> Is the limitation in hardware or in the radeon kernel driver?
It's a limitation of the radeon kernel driver.
Can someone update this thread when the patch lands upstream? I have an r3g v1.1 PCIe 3.0 x16 to M.2 x4 adapter + RX 580, so I could test, but I don't know how to build mesa or these out-of-tree patches.
@montvid Sure thing! You'll hear about it when it gets upstream. I'm curious though, how does it perform on your setup?
Native Direct3D 9 v0.3.0.184-devel is active.
It seems SC2 is free to play, so I got it from battle.net. Interestingly, during the first campaign the CPU (i3-7100U) and GPU (RX 580 8GB) usage is always ~40%, FPS is around ~60 (a bit more with esync enabled), GTT is ~100 MB, VRAM ~1200 MB (1900 MB with all settings on ultra/extreme), and the buffer wait time is at most 20 µs (Assassin's Creed 1 can get up to 20 ms on ultra settings, so it runs worse). I am using tkg 4.1 wine (lutris) and a PCIe 3.0 x4 adapter, so it is not a Thunderbolt issue.
@Venemo I was hoping for better game performance from my eGPU setup, but Thunderbolt is even worse according to egpu.io. At least I have 8 GB of VRAM to store all the ultra textures. :P
@montvid is that with or without the SDMA patch? If without, it is roughly in line with what I experience here.
Without the patch.
Looks like the SDMA patches landed in mesa master. I guess this can be closed then?
Even though GPU utilization is near 100% now, the game is still slower on my setup than on yours. I'd prefer to keep this issue open until I manage to do a proper comparison between Nine and Windows on this hardware. If there is no significant difference, let's close it; otherwise we should try to find the next bottleneck.
Here is an interesting tidbit. In kernel 5.1, amdgpu adds a way to examine the PCI Express bandwidth usage: `/sys/class/drm/card1/device/pcie_bw` (you can substitute `card1` with your card). This gives you the number of PCIe packets received and sent by the GPU, as well as the maximum size of a PCIe packet.

So I ran `watch -n 1 "cat /sys/class/drm/card1/device/pcie_bw"` and looked at the output. When I run SC2 I get the following numbers (roughly):
| Version | Received by GPU | Sent by GPU | Packet size (bytes) | Estimated max bandwidth use |
|---|---|---|---|---|
| Mesa 19.0 | around 2,500,000 | around 50,000 | 128 | 2.5 Gbit/sec |
| Mesa master | between 10,000 and 25,000 | between 10,000 and 50,000 | 128 | 75 Mbit/sec |
EDIT: fixed mistake in calculating the Gbit/sec.
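For what it's worth, here is a minimal C sketch (mine, not from the thread) of how the "Estimated max bandwidth use" column can be computed from one `pcie_bw` reading. It assumes the file holds the received packet count, the sent packet count, and the maximum payload size in bytes, in that order, as described above.

```c
/* Sketch: read one sample from pcie_bw and estimate the bandwidth the
 * same way the table above does, i.e. assume every counted packet
 * carries a full maximum-size payload. Field order (received, sent,
 * max payload in bytes) is an assumption based on the description above. */
#include <stdio.h>

int main(void)
{
    unsigned long long received, sent;
    unsigned int max_payload;
    FILE *f = fopen("/sys/class/drm/card1/device/pcie_bw", "r");

    if (!f) {
        perror("pcie_bw");
        return 1;
    }
    if (fscanf(f, "%llu %llu %u", &received, &sent, &max_payload) != 3) {
        fprintf(stderr, "unexpected pcie_bw format\n");
        fclose(f);
        return 1;
    }
    fclose(f);

    /* e.g. ~2,500,000 packets * 128 bytes * 8 bits ~= 2.56 Gbit for the sample */
    double mbit = (received + sent) * (double)max_payload * 8.0 / 1e6;
    printf("~%.1f Mbit in this sample (received %llu, sent %llu packets)\n",
           mbit, received, sent);
    return 0;
}
```

Plugging in the Mesa 19.0 row gives roughly 2,500,000 × 128 bytes × 8 ≈ 2.6 Gbit, which matches the table.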
There are two interesting conclusions here: the SDMA changes in mesa master cut the PCIe traffic dramatically, and even the Mesa 19.0 numbers are far below what the link can carry, so raw bandwidth alone doesn't explain the slowdown.
@marekolsak What do you think about my numbers, do they make sense? Am I interpreting them correctly?
Yes, the numbers look correct.
It's not just the bandwidth that matters. It's also the latency that PCIe incurs.
@marekolsak True. But 2.5 Gbit / sec is still less than 10% of what it should be capable of. There is definitely a problem in there somewhere.
Long story short, I observed a scenario where I get low frame rates even though CPU and GPU utilization are low. This is a continuation of this issue, where we started discussing a performance problem with Starcraft 2 when running on Gallium Nine standalone.
Hardware details: the system is a Dell XPS 13 9370 with a 4-core / 8-thread Intel 8550U, a Sapphire AMD Radeon RX 570 ITX card is connected through Thunderbolt 3 (which is roughly equivalent to a x4 PCI-E connection), and a 4K Dell U2718Q screen is connected to the RX 570's DisplayPort output.
I didn't measure the frame rates in a scientific way, but the difference is very noticeable:
Some observations from the other thread that are worth mentioning here:

- According to `GALLIUM_HUD`, CPU utilization is around 30% and GPU utilization is around 50%.
- According to `perf top`, the majority of time is spent in functions like `si_set_constant_buffer`, `amdgpu_mm_rreg`, and `NineDevice9_SetIndices`, from which I got the impression that some buffers are copied to/from the GPU and that may be a problem.

@axeldavy
That does sound awesome, and makes me think that the problem may be with my setup, or that maybe my setup triggers some sort of corner case within Nine. After thinking about it more, I've got three ideas:
Idea 1. Maybe there's something wrong with my kernel command line? Basically what I do is disable all the spectre/meltdown mitigations, enable the PowerPlay features in amdgpu, and blacklist the `i915` module.

Idea 2. Maybe Nine somehow performs more I/O through PCIe than wined3d and the Thunderbolt 3 port is the bottleneck. Is it possible that there are some operations within Nine that are not noticeable when using a PCIe x16 connection but become problematic when running over PCIe x4? I don't know how to verify this theory, but maybe there is a way to check the PCIe port usage.

Idea 3. It occurs to me that I installed DirectX 9 from `winetricks` in this Wine prefix before I started using Nine. Is it possible that this interferes with Nine somehow?

@iiv3
No, I've got 4.19 from Fedora: `4.19.13-300.fc29.x86_64`
Good idea, I will try that.
Sure, I'll take a look, though I'm afraid I'm not an expert in `perf`.

Sorry for the long post!