iXit / Mesa-3D

Please use official https://gitlab.freedesktop.org/mesa/mesa/ !
https://github.com/iXit/Mesa-3D/wiki

[FarCry 3]Bad performance with Gallium-Nine #313

Closed Kzimir closed 6 years ago

Kzimir commented 6 years ago

Hi,

As Axel Davy explained here, performance with Gallium Nine should be close to what I get on Windows, but in practice it's far, far away (like Star Wars :P ).

In fact, I get the same performance with Nine as with WineD3D. Gallium Nine is correctly enabled, because the console output contains this line: Native Direct3D 9 is active. For more information visit https://wiki.ixit.cz/d3d9

Performance on Windows 7 : https://reho.st/self/a958f0e2e6f0765b1b7b35603b42837a1fc03b93.jpg

Performance with WineD3D : https://reho.st/self/d8960a1052964986b5ba05552eab4ea19c314e3b.jpg

Performance with Gallium Nine : https://reho.st/self/480b7bda86c14dc76562f750ba1e52d39cd38f1d.jpg

axeldavy commented 6 years ago

Could you produce a log with NINE_DEBUG=all (this needs Mesa built with --enable-debug to produce any output) and give us the last 10000 lines or so? The idea is to see all the commands a frame uses and spot multithreading bottlenecks. If that is not possible, give us a trace so we can replay it.

Kzimir commented 6 years ago

I did both :D You can download the api trace here : http://www.mediafire.com/file/qc12j2eerdxhfmb/FC3_NINEISSUE.tar.xz

And the NINE_DEBUG=all here : http://www.mediafire.com/file/p89pd42uneswgq0/farcry3_nine_debug.txt.tar.xz

axeldavy commented 6 years ago

Thank you

I don't see what could possibly cause bad performance. The multithreading seems to do ok. Perhaps some measure of where time is spent would be useful, but I don't remember exactly how to make that work (perhaps with operf joining the pid for a few seconds when the game is launched).

I notice that a lot of people on the internet complain about bad performance on some CPUs, for example AMD CPUs. Perhaps the game has a fast internal path that only triggers if certain criteria are met, and Wine doesn't satisfy all of them.

Kzimir commented 6 years ago

So unfortunately there is no solution for this issue? :-(

axeldavy commented 6 years ago

A performance analysis with a tool like operf will tell where time is spent. If less than 5% is spent in gallium nine, no gallium nine optimisation will help. If 10% or more, there's something to be done. If something unrelated to both the game exe and gallium nine takes a large share, it's a performance problem in some wine dll. If the game itself takes most of the time, like 95%, it's some game-internal optimisation issue.
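
The rule of thumb above can be written down as a small helper. This is only a sketch: the function name, the result strings and `other_threshold` (a stand-in for "a large share", which the comment does not quantify) are assumptions made for illustration; only the 5%, 10% and 95% cut-offs come from the comment.

```python
def triage_profile(nine_pct, game_pct, other_pct, other_threshold=30.0):
    """Map CPU-time shares (in percent) from a profiler report to next steps.

    The 5%, 10% and 95% cut-offs come from the comment above.
    other_threshold is an assumed value for "a large share".
    """
    findings = []
    if nine_pct >= 10.0:
        findings.append("nine: worth optimising")
    elif nine_pct < 5.0:
        findings.append("nine: optimisation will not help")
    else:
        # The 5-10% band is not covered by the rule of thumb.
        findings.append("nine: inconclusive")
    if other_pct >= other_threshold:
        findings.append("wine dll: likely bottleneck")
    if game_pct >= 95.0:
        findings.append("game: internal bottleneck")
    return findings
```

The three shares would be read off an opreport summary by grouping symbols into the Nine library, the game executable and everything else.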

axeldavy commented 6 years ago

It may also be that the game is sensitive to thread scheduling; in that case, tuning the Linux scheduler may affect performance.

axeldavy commented 6 years ago

Your Windows screenshot shows the performance on your 12 cores. Could you show the same on Nine with the gallium HUD (not just the average, but the details for all cores)? It's something like GALLIUM_HUD=cpu0+cpu1+etc
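
Typing out twelve graph names by hand is error-prone, so here is a minimal sketch that builds the value. The function name is made up for illustration; the `cpuN` graph names and the `+` separator are taken from the example in the comment above.

```python
def gallium_hud_per_core(num_cores):
    """Build a GALLIUM_HUD value with one CPU-load graph per core."""
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    # cpu0, cpu1, ... joined with '+', as in the example above.
    return "+".join(f"cpu{i}" for i in range(num_cores))


if __name__ == "__main__":
    # 12 matches the core count mentioned for the Windows screenshot.
    print(gallium_hud_per_core(12))
```

The printed string can then be exported as GALLIUM_HUD before starting the game.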

Kzimir commented 6 years ago

On my old PC with an Nvidia GPU and an Intel CPU, I never had this performance issue. I have a Ryzen, so maybe there is something wrong with my CPU.

I launched operf -p FC3_PID, created a report with opreport, and got this result : http://www.mediafire.com/file/wxbueyn6qumxx9q/FC3_operf.tar.xz

But I don't know if it's a good report ;-)

iiv3 commented 6 years ago

For starters, it doesn't seem like the CPU is the bottleneck here; at least, Nine uses half the CPU that wined3d does to get (the same) 10fps.

Do you have vsync enabled? I see the "fast" part being ~60fps. Try disabling it. export vblank_mode=1 might help. Check with glxgears.

Try enabling all possible GALLIUM_HUD graphs, in the hope that one of them gives a hint. Flushes might indicate unwanted synchronization.


Finding performance issues is very tricky. It's even worse because we don't have good performance tools under linux.

For example, apitrace can do CPU and GPU timing for OpenGL, but not for D3D9. Some of the Nine developers worked on that, but the pull request has been pending for years.

PIXWin is a program that comes with the D3D SDK and it has nice capabilities, but it is a Windows program and running it under Wine is troublesome.

A somewhat primitive tool I've used a lot is "Helix Mod". It's a d3d9.dll wrapper that allows dumping and replacing shaders. The dumping is interactive: you use the numpad keys to select a shader, and the selected shader is nullified (it stops working), which may make an object using that shader go black or vanish entirely. If we assume that the slowdown is caused by a single shader, this tool might help you locate it. However, this is rarely the issue.

Also, playing with the graphics settings can sometimes help. Set everything to the lowest setting, then enable features one by one until one of them has a dramatic effect. Then take an apitrace before and after and try to find the difference...

This reminds me... check that PhysX is disabled, if the game has it.

Kzimir commented 6 years ago

I tried to enable all the GALLIUM_HUD options; I hope it will be clear for you. https://reho.st/self/f86b8efc8ad9dce13524580ac9a316cfefaea340.jpg

Disabling vsync helps a little; I gain a few fps. Same thing if I set all options to "LOW": I get 20 FPS at most. There is no PhysX in this game :-)

axeldavy commented 6 years ago

According to the HUDs, one CPU is used at close to 100% on Windows, but none is on Linux. It may be some Linux threading issue (Windows programs have more control over switching between threads). Perhaps try forcing the game to run on 4 CPUs, or use the realtime scheduler?

iiv3 commented 6 years ago

Ryzen CPUs work like NUMA: you have 2 physical cores sharing caches, thus forming a single node, and you pay a heavy penalty if processes move to another node. So the kernel should try to avoid moving processes across nodes. There were options introduced for that... However, moving the process would act like dropping caches. Cache misses would result in higher CPU usage, since the CPU does not go idle while fetching data from memory. You might try export csmt_force=0 to disable Nine CSMT. You might also try setting process affinity... though the tools might be a bit clunky.

BTW, try using top or htop to check whether some process reaches 100% core utilization. E.g. wineserver is known to get high CPU utilization when there are a lot of window messages.
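
As one way to experiment with affinity without extra tools, here is a minimal Linux-only sketch using Python's standard `os.sched_getaffinity` and `os.sched_setaffinity`. The helper names and the default of 4 CPUs (taken from the earlier suggestion to force the game onto 4 CPUs) are illustrative choices. Picking the lowest-numbered CPUs is an assumption: whether those CPUs actually share a cache depends on the machine's topology, which this sketch does not inspect.

```python
import os


def pick_cpus(available, count):
    """Return the `count` lowest-numbered CPU ids from `available`."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return set(sorted(available)[:count])


def pin_to_cpus(pid=0, count=4):
    """Restrict `pid` (0 means the caller) to `count` CPUs. Linux only."""
    allowed = os.sched_getaffinity(pid)
    chosen = pick_cpus(allowed, count)
    os.sched_setaffinity(pid, chosen)
    return chosen
```

Calling `pin_to_cpus()` in a launcher script before starting Wine means the child processes inherit the mask. Applied to an already running multi-threaded game, it would only affect the single thread id passed in, since affinity is a per-thread attribute on Linux.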

perf top also might be useful for finding specific CPU bottleneck. Give it a try.


About the screenshot.

What we see here is a kind of locking issue, and it is extra complicated as it might be CPU/GPU locking. At least the flushing is not excessive; I've seen wined3d cause a flush after every draw (or double that).

CPU locking can sometimes be spotted with the latencytop utility. It also requires a kernel compiled with a specific option to work properly.

Notice in the graphs that GPU utilization is about 60% and the shader clock is only 461MHz. So basically the core is so underused that it is still running in idle mode.

We might need a core mesa/radeonsi developer to look into this issue.

axeldavy commented 6 years ago

The HUD shows the system CPU usage, not the app CPU usage, so what you see in the screenshot is already the same as in htop.

Possibly the game uses as many processes as cpus to prepare draw commands. If the delay to switch to a thread when it is ready is a little slow (compared to Windows, where it would be fast thanks to some fine control not available in Linux schedulers), it could cause what we see, hence my comment about investigating there (forcing fewer CPUs, changing the Linux scheduler).

iiv3 commented 6 years ago

I ask for top/htop exactly because I want to see the per app usage, not the whole system CPU usage. They can also show per-thread usage... if you know the keys.

From the first image, he gets 22.5% total CPU utilization; with 12 cores, this makes about 260% core usage. In the last picture, summing the CPUs gives around 340% core utilization. It is still possible that wineserver is at 100% and causing a bottleneck.
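
The conversion between a system-wide percentage and "percent of one core" units is a plain multiplication, sketched below with made-up function names. Multiplying 22.5% by 12 gives 270%, in the same ballpark as the "about 260%" quoted above. It also shows why a single saturated process is hard to see in a system-wide average.

```python
def per_core_pct(system_pct, logical_cpus):
    """Convert system-wide utilization into 'percent of one core' units."""
    return system_pct * logical_cpus


def system_pct_of_one_busy_core(logical_cpus):
    """System-wide share contributed by one fully busy core."""
    return 100.0 / logical_cpus


if __name__ == "__main__":
    print(per_core_pct(22.5, 12))           # first screenshot, 12 cores
    print(system_pct_of_one_busy_core(12))  # one pegged core out of 12
```

With 12 cores, one fully busy core adds only about 8.3 points to the system-wide figure, which is why per-process numbers from top or htop are more telling here.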

The scheduler should not matter at all, since there are enough cores to run all threads without having to do context switches. (I'm joking but...). Usually games run as many threads as there are physical cores, so this leaves half of the virtual cores free for the Linux scheduler to use as it sees fit.

It is far more likely that e.g. the game tries to be multi-threaded, so it calls D3D concurrently from 6 different threads. However, Nine just has a big lock around the threaded code, so it processes each thread one by one in a serialized manner. I think Nine CSMT should handle this much more gracefully (since the lock would be held only for filling the queue, not for the actual processing). Anyway, latencytop should be able to notice locking issues.

axeldavy commented 6 years ago

I don't see what you mean by having a better lock mechanism for nine csmt. When you look at the commands in the trace, the draw calls use only a few different calls (basically setting sampler states, textures, shader constants and then drawing). I don't see how we could make these calls faster.

iiv3 commented 6 years ago

CSMT locking is not the issue here. I'm talking about the case where e.g. two separate threads issue different drawing commands at the same time. Nine will serialize that... one would block until the other is finished. I said that csmt might help in that case, because it would just put the draw command in the queue instead of executing it in gallium. And since nine csmt uses atomics, the second thread might not even block.

axeldavy commented 6 years ago

If two threads issue commands, they must still make sure the draw commands have the same parameters (shader constants, sampler states, etc). In practice the app has to use an internal locking system to ensure a thread is the only one sending all its commands for its draw call. However, the d3d9 thread safety enables other threads to lock and fill buffers at the same time to prepare their draws. In other words, making the draw commands non-blocking shouldn't affect performance in the slightest, since only one thread is calling them at a time anyway.

iiv3 commented 6 years ago

Whatever. I just gave it as an example of why it should not be an issue. No idea why this is being made into such a big question.

Kzimir commented 6 years ago

About the issue, what is the best test I can do to help you?

iiv3 commented 6 years ago
  1. Run the game in a window and have htop in terminal on top of it. See if wineserver is getting 100% CPU.

  2. If it isn't, install latencytop and try it in a similar manner. It should be running while the game is running too.

  3. Try export csmt_force=0 before starting the game. This disables nine csmt, just in case.
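
For repeated test runs, the environment from step 3 can be set up by a small launcher instead of exporting variables by hand. This is a sketch under assumptions: `build_launch` and its parameters are made-up names, and `game.exe` in the usage note is a placeholder for the real executable. Only the variables `csmt_force` and `GALLIUM_HUD` come from this thread.

```python
import os


def build_launch(game_exe, base_env=None, disable_csmt=True, hud=None):
    """Return (argv, env) for running a game under wine with test settings."""
    env = dict(os.environ if base_env is None else base_env)
    if disable_csmt:
        # Same effect as `export csmt_force=0` before starting the game.
        env["csmt_force"] = "0"
    if hud:
        env["GALLIUM_HUD"] = hud
    return ["wine", game_exe], env
```

A run would then look like `argv, env = build_launch("game.exe")` followed by `subprocess.run(argv, env=env)`. Keeping the construction separate from the launch makes it easy to compare runs with and without each setting.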

Kzimir commented 6 years ago

1- With htop, I see wineserver at 80%.

2- With latencytop, I get this result : https://reho.st/self/9b91127151a961710196e159aa24ef04280f4d36.jpg

3- And with export csmt_force=0, there is no difference; I still get the same ~9-10 FPS.

iiv3 commented 6 years ago
  1. While it's not 100%, 80% is really high usage. I couldn't find the original work, but there seems to be a staging patch that uses shared memory between processes to lower the messaging overhead. You might want to try it.

  2. My mistake, I wanted latencytop run on the game, but from the snapshot I see that the max is <=5ms, so this is probably not the bottleneck...

  3. That was expected, but it had to be tested.

I'll see what could be obtained through PIXWin. It's a powerful D3D debugger that comes from the 2010 D3D SDK.

If anybody else has any ideas... feel free to give them a try.

Kzimir commented 6 years ago

I found this about FC3 performance : https://bugs.winehq.org/show_bug.cgi?id=43277

I don't know if it only affects Ryzen CPUs, because when I tried with an old Intel/Nvidia laptop at release, I never had this bad performance.

I will retry with my laptop after work.

Kzimir commented 6 years ago

Hey guys,

I can confirm that the issue comes from the AMD CPU. I don't know whether it affects all AMD CPUs or only Ryzen. On the DXVK Discord, 1 user has a Ryzen and the same bad performance as me. 3 users have an Intel CPU and good performance.

I updated the Wine bug report, but I don't know if there will be a solution. The Wine devs don't care about these problems.

jomihaka commented 6 years ago

I wonder if it's the same issue as with Path of Exile, where turning on the Engine Multithreading game setting lowers fps greatly. I'm on a Ryzen 1700 + RX 580 and get the following fps and htop CPU usage in that game:

dxvk:
    engine multithreading:
        wineserver ~72%
        poe.exe ~160%
        fps ~38

    no engine multithreading:
        wineserver ~14%
        poe.exe ~115%
        fps ~145

nine:
    engine multithreading:
        wineserver ~64%
        poe.exe ~160%
        fps ~35

    no engine multithreading:
        wineserver ~12%
        poe.exe ~125%
        fps ~120

Kzimir commented 6 years ago

iiv3 and Axel, there is an issue between Ryzen, Wine and the game's engine. See this comment on the Wine Bugzilla and the screenshot that follows it to see the difference : https://bugs.winehq.org/show_bug.cgi?id=43277#c12

siro20 commented 6 years ago

Closing as not a nine bug.