iXit / wine-nine-standalone

Build Gallium Nine support on top of an existing WINE installation
GNU Lesser General Public License v2.1
272 stars, 23 forks

Halo CE fps drops when particles are enabled, only occurs on wine-nine #85

Closed: ghost closed this issue 3 years ago

ghost commented 3 years ago

In Halo CE (2001), the frame rate drops severely under wine-nine when I enable "Particles" in the video settings menu. Particles are generated by in-game explosions, so whenever an explosion occurs there is a sudden drop in frame rate. With Particles disabled there is no drop during explosions; with Particles set to Low there is a slight drop, and with High a large one.

This issue only occurs with wine-nine; regular wine shows no severe frame rate drop when particles are present. It occurs regardless of other visual settings or screen resolution. No throttling takes place when this happens (CPU and GPU power management are disabled).

The game runs much faster in wine-nine than under regular wine, especially on large custom maps. I normally play with fps capped at 60 using Chimera, and it can drop to 20 fps or even lower when many explosions are present. Otherwise, the game runs at a smooth 60 fps even on large custom maps (something regular wine can't do).

Unfortunately, I am unable to record an apitrace for Halo CE: the trace plays back with a black screen and buffer overflow errors. This exact issue was already reported to the apitrace developers in 2016 and has not been fixed.

Log:

GALLIUM_HUD=fps,cpu0+cpu1+cpu2+cpu3,GPU-load wine haloce.exe 
008c:err:ntoskrnl:ZwLoadDriver failed to create driver L"\\Registry\\Machine\\System\\CurrentControlSet\\Services\\wineusb": c0000142
0024:err:winediag:wined3d_dll_init Setting multithreaded command stream to 0x1.
Native Direct3D 9 v0.7.0.368-release is active.
For more information visit https://github.com/iXit/wine-nine-standalone
0024:err:winediag:MIDIMAP_drvOpen No software synthesizer midi port found, Midi sound output probably won't work.
Native Direct3D 9 v0.7.0.368-release is active.
For more information visit https://github.com/iXit/wine-nine-standalone
Using profile path .
fixme:d3d9nine:DRIPresentGroup_GetMultiheadCount (0xd2f508), stub!
fixme:d3d9nine:DRIPresentGroup_GetMultiheadCount (0xd2f508), stub!
Loading font fonts\Hack-Bold.ttf...done
Loading font fonts\Interstate-Bold.ttf...done

My system runs Ubuntu 18.04 with wine 5.10 staging and the latest wine-nine 0.7. The GPU is an AMD HD6670 (Mesa 20.0.8, radeon driver). This hardware would have no problem running Halo CE maxed out on Windows.

I'm happy to provide additional info if needed. Thanks!

dungeon007 commented 3 years ago

If you think it is about WC (write-combining), maybe also try it with AMD_DEBUG=nowc 🤣

axeldavy commented 3 years ago

Well, if I make MANAGED uploads WC-aligned (that is, I upload more than I need to), performance is much better. Still, I think the old code with the hack from my last message gets the best performance; I'll have to do more testing to confirm.
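To make "WC-aligned" concrete: one way to read it is that the dirty byte range of a MANAGED buffer is widened outward to 64-byte boundaries before the upload, so the writes cover whole write-combining lines at the cost of copying some bytes that did not change. This is a minimal sketch under that reading; the 64-byte granularity and the helper name `wc_align_range` are assumptions for illustration, not Nine's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range [start, end) of a buffer that needs uploading. */
typedef struct {
    uint32_t start; /* first byte to upload (inclusive) */
    uint32_t end;   /* one past the last byte to upload (exclusive) */
} byte_range;

/* Hypothetical helper: widen r so both ends fall on multiples of `align`
 * (assumed to be the 64-byte write-combining granularity), clamped so we
 * never upload past the end of the buffer. */
static byte_range wc_align_range(byte_range r, uint32_t align, uint32_t buffer_size)
{
    /* the mask arithmetic below requires a power-of-two alignment */
    assert(align != 0 && (align & (align - 1)) == 0);
    byte_range out;
    out.start = r.start & ~(align - 1);           /* round start down */
    out.end = (r.end + align - 1) & ~(align - 1); /* round end up */
    if (out.end > buffer_size)
        out.end = buffer_size;
    return out;
}
```

For example, a dirty range of bytes 70 to 130 with a 64-byte alignment becomes 64 to 192: 128 bytes are written instead of 60, which is the "upload more than I need to" trade-off.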

EDIT: I tested many combinations more carefully, and I haven't found any hint that aligning the MANAGED uploads helps performance. However, I did find that the MANAGED path now performs much better than it did before. The trick from my previous post performs even better, though.

Unfortunately, the reason for that is purely overhead. MANAGED performs worse because when you lock a MANAGED buffer whose previous uploads are not done yet, you have to wait for the csmt thread. Disabling csmt decreases performance overall, but increases it when the smoke effect happens. My trick, on the other hand, performed better in the smoke case because its overhead is lower (though the performance hit when STAGING is not used is real, so something really is going on with WC).
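A toy, single-threaded model of the stall described above, assuming a simplified design in which each MANAGED buffer counts uploads handed to the csmt worker but not yet executed, and a lock on a buffer with a non-zero count forces the application thread to wait for the worker. All names are hypothetical; the real Nine code is more involved.

```c
#include <assert.h>

/* Hypothetical MANAGED buffer: counts uploads queued for the csmt worker
 * that the worker has not executed yet. */
typedef struct {
    int pending_uploads;
} managed_buffer;

/* The worker catching up: every queued upload of this buffer is executed. */
static void worker_execute_all(managed_buffer *b)
{
    b->pending_uploads = 0;
}

/* Lock: if this buffer still has uploads in flight, the application thread
 * must wait for the worker before touching the data.
 * Returns 1 if the lock had to wait, 0 otherwise. */
static int lock_managed(managed_buffer *b)
{
    if (b->pending_uploads > 0) {
        worker_execute_all(b); /* models the synchronous wait */
        return 1;
    }
    return 0;
}

/* Unlock: with csmt the upload is deferred to the worker; without csmt
 * it is performed immediately on the application thread. */
static void unlock_managed(managed_buffer *b, int csmt)
{
    if (csmt)
        b->pending_uploads++;
}

/* An app relocking the same buffer n_locks times in a row (e.g. many small
 * particle updates) before the worker gets a chance to run on its own. */
static int count_stalls(int n_locks, int csmt)
{
    assert(n_locks >= 0);
    managed_buffer b = {0};
    int stalls = 0;
    for (int i = 0; i < n_locks; i++) {
        stalls += lock_managed(&b);
        unlock_managed(&b, csmt);
    }
    return stalls;
}
```

With csmt, every relock after the first one waits; without csmt, none do. The model deliberately ignores the cost of doing uploads synchronously, which is why disabling csmt can still lower performance overall even though it removes these stalls.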

I'll have a go at optimizing the contention case with MANAGED, but maybe I'm missing something there: you talk of 20 fps and I get 200, so maybe you don't have an issue with CPU overhead; maybe it hits GPU performance instead, and I can't measure it because my GPU is too fast.

EDIT2: I tried setting my GPU to low. csmt affects fps much less, so I'm GPU limited. The performance ratio between the methods is still the same. With the MANAGED path, the fps low during smoke is 3 times higher than it used to be; with the hackish path it is 4 times higher.

dungeon007 commented 3 years ago

It does not matter if the GPU is fast; just look at the video in the first post: he gets about 140 fps while doing nothing, and it goes down to as low as 26 fps. Translated to your non-potato hardware, perf likely just scales up proportionally, so that is the same as 1400 fps going down to 260... just remove a zero 🤣
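The arithmetic behind the "just remove a zero" argument, converting frame rate to frame time with t = 1000/f milliseconds. It holds only under the assumption that the slowdown scales proportionally with hardware speed:

```latex
\frac{140\ \text{fps}}{26\ \text{fps}} \approx 5.4, \qquad
\frac{1400\ \text{fps}}{260\ \text{fps}} \approx 5.4

t_{140} = \frac{1000}{140} \approx 7.1\ \text{ms}, \qquad
t_{26} = \frac{1000}{26} \approx 38.5\ \text{ms}

t_{1400} = \frac{1000}{1400} \approx 0.71\ \text{ms}, \qquad
t_{260} = \frac{1000}{260} \approx 3.8\ \text{ms}
```

The relative drop is the same in both cases, so a slower-looking absolute number on fast hardware does not mean the problem is smaller.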

axeldavy commented 3 years ago

Well, with the patch to use the MANAGED path for SYSTEMMEM, it should be mostly resolved. However, you said you had a 3 times perf decrease in some games. In addition, you said you had artifacts, while I have none. Could it be that you have some hacks in your tree that you didn't disable when you tried the patch?

dungeon007 commented 3 years ago

Well, I haven't tried this patch yet... do you want me to try this SYSTEMMEM hack you mentioned?

axeldavy commented 3 years ago

It's this patch https://github.com/iXit/Mesa-3D/commit/bb550074c887725fbca07a38d89c649d5be2f8df

In addition, this one might or might not help at all: https://github.com/iXit/Mesa-3D/commit/8e3572d0bbfccca7a33af61b8668837cdf9a63fe

dungeon007 commented 3 years ago

For the MANAGED one I tried, I just ran your tree, without any hacks. It behaved like something was really badly broken. I could try it again; who knows, maybe another day brings better luck 🤣

axeldavy commented 3 years ago

Well, then somehow your game renders differently than mine, because things display fine here. Bare game, no config changes.

dungeon007 commented 3 years ago

Tried it again and it's the same BAD story... Halo flickers in both modes, and csmt 0 does not help. Other apps don't flicker, but nearly all of them are 20-30% slower; there are no drops anymore, though, which is "good"... Apps that liked csmt now don't, so I have to disable it to gain some performance back, i.e. to be 20% slower than before instead of having only 40% of the previous perf in some places, etc... It looks like with this you just invented WINED3D-over-OpenGL performance for NINE, as I don't see any Nine perf advantage over wined3d anymore 🤣

dungeon007 commented 3 years ago

I will now try your SYSTEMMEM hack; I'm quite sure it won't be this BAD 🤣

axeldavy commented 3 years ago

Well, the MANAGED path can be made faster for the small SYSTEMMEM uploads seen in these games. But first we have to be conformant (no flickering). Could you help me reproduce these flickers? Which mod or configuration hack do you use? I don't have any in the base game.

axeldavy commented 3 years ago

@Joshua-Ashton Another thing for the vertex/index buffer lock tests: there might be a behaviour related to the end of the frame (EndScene? Present?). Indeed, from https://docs.microsoft.com/en-us/windows/win32/api/ddraw/nf-ddraw-idirectdrawsurface7-lock:

DDLOCK_NOOVERWRITE
New for DirectX 7.0. Used only with Direct3D vertex-buffer locks. Indicates that no vertices that were referred to in a draw operation since the start of the frame (or the last lock without this flag) are modified during the lock. This can be useful when you want only to append data to the vertex buffer.

Maybe this 'start of the frame' behaviour (which basically means assuming that all draw calls from the previous frame have finished using the buffer) is expected for SYSTEMMEM by some old games. This might explain why in Halo some SYSTEMMEM buffers are only ever locked with NOOVERWRITE. There might also be differences between vertex and index buffers.
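For context, the usage pattern NOOVERWRITE was designed for is an append-only ring: the app appends with NOOVERWRITE as long as the new data fits after the previous write, and only locks with DISCARD (asking for fresh storage) when it wraps around. This is a minimal sketch of the offset and flag selection such an app might do; the `LOCK_*` constants are placeholders standing in for the real D3DLOCK_* flags, and no D3D calls are made.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholders for D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD (values arbitrary here). */
enum { LOCK_NOOVERWRITE = 1, LOCK_DISCARD = 2 };

typedef struct {
    uint32_t capacity; /* total buffer size in bytes */
    uint32_t cursor;   /* next free byte */
} append_ring;

/* Reserve `size` bytes: returns the lock flag to use and stores the
 * write offset in *offset. */
static int ring_reserve(append_ring *r, uint32_t size, uint32_t *offset)
{
    assert(size <= r->capacity);
    int flag = LOCK_NOOVERWRITE; /* promise not to touch data already in use */
    if (r->cursor + size > r->capacity) {
        /* Out of space: wrap and request fresh storage, since earlier
         * contents may still be read by pending draws. */
        r->cursor = 0;
        flag = LOCK_DISCARD;
    }
    *offset = r->cursor;
    r->cursor += size;
    return flag;
}
```

Under this pattern DISCARD only appears on wrap-around, so an app that never wraps within a frame never sends the driver that hint.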

dungeon007 commented 3 years ago

The game was released in 2001; even the R200 (Radeon 8500) only launched in August 2001, and the R300 in September 2002. I mean, at release there was no hardware to play it with what we now see as the default settings. The last hardware listed in config.txt is the Radeon 9600XT/9800XT, and even that needs some fixes. That is September 2003, when they said goodbye to it and went to work full time on Halo 2 instead 🤣 That explains its brokenness: "The Glass only renders correctly on ~2003 ATI drivers" https://github.com/iXit/wine-nine-standalone/issues/98, and I knew that if I switched to -use14 it would work... As I see in config.txt, even lower-end R300 hardware was forced to ForceShader=14, like everything before it, instead of -use20, so the default is only for high-end R300 (plus fixes) or high-end nVidia FX, which the majority didn't have anyway 🤣 Earlier nV hardware and ATi R100 ran on -use11, lower-end FX and R200 on -use14, and that is it... only recent, high-end hardware ran -use20 plus fixes, and that didn't last long 🤣 It was likely developed as a D3D8 game, but they switched to D3D9 as early explorers, and that is it. Anyway, DirectX 9 was released in December 2002 🤣

dungeon007 commented 3 years ago

And of course it suffers from slowdowns like the D3D8 game BloodRayne on a wrapper 🤣 I know other D3D8 apps that are again slow only with Nine and DXVK, but in a different way... for example these at http://codeminion.com/games: Phantasmat, Brunhilda, Saqqarah... they are all D3D8, but slow in different ways. This time the slowness is on PIPE_MAP_WRITE, and again only on Nine and DXVK 🤣

dungeon007 commented 3 years ago

Sure, they might be playable on high-end hardware, but no, they shouldn't be this slow 🤣

dungeon007 commented 3 years ago

"Could you help me reproduce these flickers? Which mod/configuration hack do you use? Because I don't have any in the base game." Not sure how I could help you, as I got them with your tree as soon as I change weapons or throw some bombs... Maybe I should reinstall Windows? 🤣 Joking a bit, but there is always the possibility that something that came into Debian Sid/Testing is wrong... the kernel is 5.10, maybe that somehow broke WC on my hardware or whatever... Maybe I should really reinstall the OS; I haven't done it in years 🤣

axeldavy commented 3 years ago

Well I don't think reinstalling will help in any way.

I added two patches: https://github.com/iXit/Mesa-3D/commit/9dfa862f1458da2ff91851c087cc7c33c61ee031 This one is a hack that helps csmt perform better

https://github.com/iXit/Mesa-3D/commit/819b04b06d98eb86255d03859debe72f48048872 This one is a real optimization that gives me higher max fps in Halo (CPU limited): +20% with csmt, but also without csmt, and I'm not sure why. EDIT: the perf gain has nothing to do with better parallelization. I get the boost if I flush differently in the frame presentation path, but I don't understand why. I'm investigating.

axeldavy commented 3 years ago

@dungeon007 Alright, you should be happy: I have patches on the iXit Mesa-3D tree which help Halo a lot here. Could you check whether performance is better in the other games as well?

Could you also check whether it makes any difference to replace, in NineDevice9_DrawPrimitiveUP in device9.c,

u_upload_data(This->vertex_uploader,
                  0,
                  (prim_count_to_vertex_count(PrimitiveType, PrimitiveCount)) * VertexStreamZeroStride,
                  1,
                  pVertexStreamZeroData,
                  &buffer_offset,
                  &resource);

with

u_upload_data(This->vertex_uploader,
                  0,
                  (prim_count_to_vertex_count(PrimitiveType, PrimitiveCount)) * VertexStreamZeroStride,
                  64,
                  pVertexStreamZeroData,
                  &buffer_offset,
                  &resource);

Basically, with the former you avoid some state changes, while with the latter you are more WC friendly. I'd like to know which is best.
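A small model of what the two u_upload_data variants change, assuming a simple bump sub-allocator rather than the real u_upload_mgr internals. The upload size is the vertex count implied by the primitive count times VertexStreamZeroStride, and the alignment argument decides where each upload starts. `vertex_count_for` mirrors what prim_count_to_vertex_count is expected to compute for the D3D9 primitive types; the enum names and `bump_upload` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder names for the D3D9 primitive types. */
typedef enum {
    PT_POINTLIST, PT_LINELIST, PT_LINESTRIP,
    PT_TRIANGLELIST, PT_TRIANGLESTRIP, PT_TRIANGLEFAN
} prim_type;

/* Number of vertices consumed by `count` primitives of type t. */
static uint32_t vertex_count_for(prim_type t, uint32_t count)
{
    switch (t) {
    case PT_POINTLIST:     return count;
    case PT_LINELIST:      return count * 2;
    case PT_LINESTRIP:     return count + 1;
    case PT_TRIANGLELIST:  return count * 3;
    case PT_TRIANGLESTRIP: /* fall through: strips and fans share the formula */
    case PT_TRIANGLEFAN:   return count + 2;
    }
    return 0;
}

/* Hypothetical bump sub-allocator: each upload starts at the next
 * multiple of `alignment` (a power of two) after the previous one. */
typedef struct {
    uint32_t offset; /* first free byte in the current upload buffer */
} bump_uploader;

static uint32_t bump_upload(bump_uploader *u, uint32_t size, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uint32_t start = (u->offset + alignment - 1) & ~(alignment - 1);
    u->offset = start + size;
    return start; /* plays the role of buffer_offset in the snippet above */
}
```

For 10 triangles with a 28-byte stride (840 bytes per draw), two consecutive uploads land at offsets 0 and 840 with alignment 1, but at 0 and 896 with alignment 64: 56 bytes of padding buy a 64-byte-aligned start for each draw's data.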

dungeon007 commented 3 years ago

I bricked my OS install yesterday, together with all the games and everything else on it. I downloaded the Debian installer and plugged in a USB stick to write it to; the device names (sda, sdb, sdc...) somehow got changed, and as root I just copied it to the "right" place. I thought this would never happen to me, but yeah, you can kill a years-old install in a matter of seconds. So I won't be able to do proper testing for now, sorry.

axeldavy commented 3 years ago

That is sad to hear. Best luck with your reinstallation.

axeldavy commented 3 years ago

Meanwhile, I made DYNAMIC SYSTEMMEM a first-class citizen with optimized uploads, using an optimized MANAGED buffer path.

Performance-wise, we could possibly get better performance by not using the MANAGED path, that is, by using a buffer in GTT (not WC) and fixing the app's locking flags. However, that wouldn't work reliably in all games. The biggest drawback of the MANAGED path is when the app calls DrawIndexedPrimitive while the vertex buffer is in the SYSTEMMEM pool: we have to make sure the whole buffer is uploaded, since we do not know which region of the vertex buffer is used (I guess we could go through the index buffer to get that info, but... that's probably too heavy).
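The "go through the index buffer" idea can be sketched as a min/max scan over the indices a draw uses: only vertices base + min through base + max would then need uploading, instead of the whole SYSTEMMEM vertex buffer. This sketch assumes 16-bit indices, and `vertex_byte_range` is a hypothetical helper. The cost is one extra pass over every index of every draw, which is the overhead concern raised above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t start; /* first byte of vertex data used (inclusive) */
    size_t end;   /* one past the last byte used (exclusive) */
} byte_range;

/* Scan `count` 16-bit indices (count > 0) and return the byte range of the
 * vertex buffer they reference, given a base vertex and a vertex stride. */
static byte_range vertex_byte_range(const uint16_t *indices, size_t count,
                                    size_t base_vertex, size_t stride)
{
    assert(count > 0);
    uint16_t lo = indices[0], hi = indices[0];
    for (size_t i = 1; i < count; i++) {
        if (indices[i] < lo) lo = indices[i];
        if (indices[i] > hi) hi = indices[i];
    }
    byte_range r;
    r.start = (base_vertex + lo) * stride;
    r.end = (base_vertex + hi + 1) * stride;
    return r;
}
```

For indices {5, 2, 9, 2} with a 32-byte stride and no base vertex, only bytes 64 to 320 would be uploaded.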

@dungeon007 I hope that when you can test again, this version doesn't have the issues you found, and that performance will be great!

dungeon007 commented 3 years ago

I will test in the next couple of hours on a potato Kabini (its percentage ups/downs in perf are similar to a Bonaire card anyway). What I observed before: it was slower everywhere, with some glitches at least in Halo, but not in other apps... I will test other apps a bit more now too. The only thing I spotted is that it is way smoother like that... I mean, you seem to be trading perf for smoothness at unlocked frame rates... or whatever the idea behind that is 🤣 I will test again in a couple of hours on this new OS install anyway... all clean, no hacks, iXit master compared to Mesa master 🤣

dungeon007 commented 3 years ago

But no, I have to hack GTT on this machine 🤣... I remember that in the Prince of Persia: The Sands of Time menu (and a couple of other games), perf went down to 5 fps as GTT got too overflowed, and the game eventually just kept crashing. I mean, it is mission impossible to use Nine here and there on such APU machines, as GTT usage just goes crazy. I was talking about that two years back too, but the issue wasn't solved... so I must hack it 🤣

axeldavy commented 3 years ago

What do you mean by hacking GTT?

dungeon007 commented 3 years ago

I mean, diminishing the buffer uploads: it is just too much and keeps overflowing GTT, so games eventually go OOM or crash. 🤣 In device.c I was zeroing these 4s to make it behave normally: This->buffer_upload = nine_upload_create(This->pipe_secondary, 4 * 1024 * 1024, 4);

Joshua-Ashton commented 3 years ago

You should raise the GTT limit in the kernel -- the limit is set at a max of 3GiB and given you're on an APU, that's well... stupid.

dungeon007 commented 3 years ago

It isn't about these limits. This is a simple game: in the Prince of Persia: The Sands of Time menu I get about 145MB of VRAM usage, about 30MB of GTT usage and 60 fps (vsynced). Without the hack I get similar VRAM usage, but with current mesa-master GTT usage goes up to 700MB and I get 6 fps 🤣

dungeon007 commented 3 years ago

And it eventually just crashes the game; sometimes it doesn't, but 6 fps isn't normal anyway 🤣

axeldavy commented 3 years ago

You can set This->buffer_upload = NULL; to disable that optimization. But well... It should be about 16MB of GTT usage.
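Where the 16MB figure most likely comes from, assuming the last two arguments of the nine_upload_create call quoted above are the size of each upload buffer (4 * 1024 * 1024 bytes) and the number of such buffers (4):

```latex
4 \times (4 \times 1024 \times 1024)\ \text{bytes} = 16\,777\,216\ \text{bytes} = 16\ \text{MiB}
```

A requested-GTT difference far beyond that between the two configurations, like the roughly 700MB reported above, would then point at something other than these upload buffers themselves.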

dungeon007 commented 3 years ago

If you have PoP: SoT, just go to Options > Controls; that is where this happens... 🤣 Oblivion was crashing in the menu too, AFAIR, if you played too much with the settings or something, all because of this.

axeldavy commented 3 years ago

You can use GALLIUM_HUD=requested-GTT,requested-VRAM,mapped-GTT,mapped-VRAM to see what is going on with and without This->buffer_upload. If the difference is more than 16MB, then something is going on.

dungeon007 commented 3 years ago

I mean, in Oblivion this does not show up in the HUD, but I can't reproduce the crashes anymore after patching. There was some small HOG (hidden object) game greatly affected by this; I can't remember which one.

dungeon007 commented 3 years ago

@Joshua-Ashton BTW, DXVK and WINED3D aren't affected by this GTT problem, just Nine.

axeldavy commented 3 years ago

When I launch some of the games and traces I have here, I consume almost no GTT. If GTT usage explodes for you, there is definitely something for me to look at. Maybe some configs have issues with something Nine does.

dungeon007 commented 3 years ago

I was talking about that on IRC two years ago too, and as far as I can see it isn't fixed yet. 🤣

dungeon007 commented 3 years ago

Yeah, that was happening in Tales of Lagoona 2 and 3, simple HOG games, and I was getting OOM in them with such crazy GTT usage 🤣

axeldavy commented 3 years ago

When buffer_upload is not NULL, does the GTT usage grow little by little, or all at once?

dungeon007 commented 3 years ago

All at once, and by a lot. The game starts with about 300-400MB of GTT usage, and with current mesa master, if I go to Options > Controls, I just get this 🤣 [screenshot]

dungeon007 commented 3 years ago

I mean, it crashes with a beautiful message 🤣 https://i.postimg.cc/pX26WMwJ/pop.png

axeldavy commented 3 years ago

Well it looks like when writing the buffer_upload path I thought it was ok to... keep allocating the buffer resource even when using the buffer_upload path. So you get twice the GTT usage from vertex/index buffers.

Still, maybe the crash is not because you run out of GTT memory. How much RAM do you have? For the same scene, working both with and without buffer_upload, could you tell me the GTT usage? I'd like to check that it is less than 2x.

EDIT: Also, 'GTT usage' includes the other apps. You need to look at 'requested GTT': GALLIUM_HUD=requested-GTT,requested-VRAM,mapped-GTT,mapped-VRAM

dungeon007 commented 3 years ago

You are probably thinking of a different app: Air Strike 3D 2, which ran via d3d8to9. It was constantly leaking index buffers in GTT, little by little, just by sitting in the menu; that seems unrelated to this, but maybe somehow it is. Anyway, I patched d3d8to9 to fix that... which reminds me, I have to build all those d3d8to9 variants again on Windows, as I lost all of that a couple of days ago 🤣

axeldavy commented 3 years ago

I need the most accurate info to solve that issue. For example, you could show screenshots of the game launched in both configs with the HUD (GALLIUM_HUD=requested-GTT,requested-VRAM,mapped-GTT,mapped-VRAM).

dungeon007 commented 3 years ago

I will do the images; I just couldn't even get it to run there right now, as it always crashes right there in Options/Controls. 🤣

dungeon007 commented 3 years ago

I was lucky once and took a picture before the crash: non-patched https://i.postimg.cc/gkfFY2WL/non-patched.png and patched https://i.postimg.cc/J0vg4P9C/patched.png So there you go, 26MB (patched) vs 740MB (non-patched) of GTT 🤣... mesa master, BTW.

axeldavy commented 3 years ago

Whoa, WTF...

Something is going wrong with these vertex buffers. The only particularity of the buffer_upload path I can think of is that it allocates persistent coherent buffers. But that is something AMD is supposed to handle very well.

axeldavy commented 3 years ago

Maybe there is a minimum resource size for persistent coherent buffers, which would be much bigger on APUs, or something like that. I'll ask on IRC.

axeldavy commented 3 years ago

It looks like I can reproduce your GTT usage spike with a trace of Unigine Valley when buffer_upload is used. Curious to see what is going to come out of it.

axeldavy commented 3 years ago

It looks like the app is doing small DISCARD + NOOVERWRITE locks in a round-robin fashion (instead of just NOOVERWRITE). Thus it allocates and deallocates a LOT of buffers. Somehow, when these buffers are persistent coherent, they do not really get released (I checked that the refcount really goes to zero, I tried waiting for the GPU commands to finish, etc.). Possibly a radeonsi bug. I will investigate more.
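A toy model of the allocation churn described above, under the assumption that every DISCARD lock makes the driver replace the buffer's storage with a fresh allocation and drop its reference to the old one. If the replaced storage really is freed once unreferenced, memory stays flat; if it lingers (the suspected persistent-coherent behaviour), usage grows with every DISCARD. All names and sizes are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t live_bytes;  /* storage still held (in use or never released) */
    uint32_t allocations; /* total allocations performed */
} gtt_model;

/* Simulate n_locks locks on a buffer of `size` bytes, where every
 * discard_every-th lock uses DISCARD and the others use NOOVERWRITE.
 * `released` says whether replaced storage is actually given back. */
static gtt_model simulate(uint32_t n_locks, uint32_t discard_every,
                          uint64_t size, bool released)
{
    assert(discard_every > 0);
    gtt_model m = { size, 1 }; /* the buffer's initial storage */
    for (uint32_t i = 1; i <= n_locks; i++) {
        if (i % discard_every != 0)
            continue;          /* NOOVERWRITE: reuse existing storage */
        m.allocations++;       /* DISCARD: fresh storage for the buffer */
        m.live_bytes += size;
        if (released)
            m.live_bytes -= size; /* old storage freed once unreferenced */
    }
    return m;
}
```

With 100 locks, a DISCARD every 4th lock and 64KiB of storage, both cases perform 26 allocations, but only the leaking case ends up holding 26 times the buffer size.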

dungeon007 commented 3 years ago

I think this Lagoona Bundle is the best example of the bugs mentioned in this thread: https://store.steampowered.com/bundle/18216/Lagoona_Bundle/ System requirements (minimum): OS: Windows 7; Processor: 1GHz; Memory: 750 MB RAM. And yet not a single one of them really runs fine without hacks!
The second and third are d3d9 and can go OOM on GTT. The first one is d3d8, and it hits some slowness on PIPE_MAP_WRITE, where DXVK is affected too. Yeah, simple casual games, and someone would think they should run fine everywhere, but no, not here 🤣

dungeon007 commented 3 years ago

BTW, a quick test of your branch: Halo runs smoothly without visual issues now, but all other apps are slower: Trine EE 5% slower, Torchlight 10% slower, FlatOut 10% slower, WarCraft3:TFT 18% slower, Dungeon Siege 2 30% slower... It always seems to be slower, and not only that: with d3d8to9, visual issues appear in Need for Speed Hot Pursuit 2, Mafia 1.0, etc... Is Halo smooth? Yes. Does that come without slowing down or breaking something else? No. Anyway, that is about it for today, bye...