Closed ghost closed 3 years ago
Are you comparing performance relative to mesa master or to mesa master + some hacks ?
As for Unigine Valley, as the app is doing a lot of discards during its rendering loop, it makes sense for it to use a lot of memory. Maybe this is a different issue to yours.
For Unigine Valley, I get low GTT usage if I disable csmt (csmt_force=0) and buffer_upload. If I use csmt or buffer_upload I get high GTT usage. I think it comes down to pipeline flushes. I'm not sure the GTT usage really is lower; it probably just marks the buffer as released. Did you have csmt on when you managed to get low GTT without buffer_upload?
@dungeon007 It seems that when csmt is ON or when buffer_upload is not NULL, we can overconsume GTT if the app is doing a lot of DISCARDs (Unigine Valley does). I understand why (it's due to the gallium resource cache), and I have a few ideas for how to fix it. I'd like to know whether the problem you hit is this one or a different one. Could you tell me if you had csmt OFF or ON in your working examples?
@Joshua-Ashton In our tests we saw that DISCARD does nothing on DEFAULT buffers if they are not busy. Valley locks its index buffers in a round-robin fashion, but instead of using DISCARD then NOOVERWRITE, it uses DISCARD then DISCARD | NOOVERWRITE. We both know that DISCARD is preferred when NOOVERWRITE is set. But here, as the locked region was not busy since the last DISCARD, I am wondering if that could be a case where the DISCARD is ignored. Do you know the answer?
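For reference, the usual round-robin locking pattern for a dynamic buffer can be sketched as below. This is only a minimal illustration, not code from Valley, nine, or DXVK: `ring_cursor`, `ring_lock_flags`, and the `LOCK_*` constant names are hypothetical (the values mirror `D3DLOCK_NOOVERWRITE` and `D3DLOCK_DISCARD` from the D3D9 headers).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; values mirror D3DLOCK_NOOVERWRITE / D3DLOCK_DISCARD. */
#define LOCK_NOOVERWRITE 0x00001000u
#define LOCK_DISCARD     0x00002000u

/* Write cursor of an application filling a dynamic buffer round-robin. */
struct ring_cursor {
    uint32_t size;   /* total buffer size in bytes */
    uint32_t offset; /* next free byte */
};

/*
 * Reserve `bytes` in the ring and return the lock flags for that range.
 * The start offset of the reserved range is written to *lock_offset.
 *  - the range fits after the cursor: NOOVERWRITE (older data stays valid)
 *  - the range does not fit: wrap to offset 0 and DISCARD
 */
static uint32_t
ring_lock_flags(struct ring_cursor *ring, uint32_t bytes, uint32_t *lock_offset)
{
    uint32_t flags;

    assert(bytes <= ring->size);
    assert(ring->offset <= ring->size);

    if (bytes <= ring->size - ring->offset) {
        flags = LOCK_NOOVERWRITE;
    } else {
        ring->offset = 0;
        flags = LOCK_DISCARD;
    }

    *lock_offset = ring->offset;
    ring->offset += bytes;
    return flags;
}
```

Starting the cursor at `offset == size` makes the very first lock a DISCARD. The Valley pattern described above would correspond to returning `LOCK_DISCARD | LOCK_NOOVERWRITE` in the first branch instead of plain `LOCK_NOOVERWRITE`.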
Potentially it could be, I am not sure though. I might have a bash at implementing dropping of discards in the non-contested case in DXVK.
@axeldavy Nope, the PoP case is not affected by csmt on or off. I remember that when I was testing this two years back, I didn't need to zero it. I could bump it and GTT usage would gradually jump more and more: I could set it up to 1.5 there instead of 4, but at 2 or greater the bug starts. Well, probably some app gave me better perf with 0, so I went with that 🤣 Just to mention, this APU defaults to 512MB of VRAM, but I could set anything from 64MB up to 2GB... and going up from the default didn't matter for this bug at all.
I think it was like I could set it to 0 or 0.5 or 1 or 1.5... but at 2, there we go. No idea why; maybe some 32-bit limit came into play there or whatever 🤣
@Joshua-Ashton The reason I never cared about not discarding in the non-contested case is that I have never seen any app discarding in a non-contested case. So it's not worth the added overhead in my opinion.
@dungeon007 So, just to confirm, PoP is fine with csmt on and buffer_upload set to NULL ?
Sure, everything seems fine if i set it to null 🤣
@dungeon007 Well, in that case I will need you to do more tests, because I won't figure it out alone. Would you mind?
@dungeon007 I think I know how to fix the artifacts you have seen. Another user reported them on the mesa merge request. However I have a question for you about the performance drop: Are we talking about a performance drop below 60 fps or not ? Because if it is a performance drop, but performance is excellent, I might not bother with the remaining optimization to fix this drop.
@dungeon007 If you want to help debug buffer_upload, could you come on irc's #d3d9 channel ? It will be easier to talk.
The first thing to try is this:
    This->buffer_upload = nine_upload_create(This->context.pipe, 4 * 1024 * 1024, 4);
in device9.c (NOTE: this requires csmt to be set to OFF, otherwise it might lead to corruption), and then make nine_upload_destroy_buffer_group look like this:
    static void
    nine_upload_destroy_buffer_group(struct nine_buffer_upload *upload,
                                     struct nine_buffer_group *group)
    {
        DBG("%p %p\n", upload, group);
        DBG("Release: %p %p\n", group->map, group->map+upload->buffers_size);
        assert(group->refcount == 0);

        if (group->transfer)
            pipe_transfer_unmap(upload->pipe, group->transfer);
        if (group->resource)
            pipe_resource_reference(&group->resource, NULL);
        upload->pipe->flush(upload->pipe, NULL, 0);
        group->transfer = NULL;
        group->map = NULL;
    }
in nine_buffer_upload.c
Then compile and test with csmt OFF.
To be clear, the line is
    This->buffer_upload = nine_upload_create(This->context.pipe, 4 * 1024 * 1024, 4);
and not whatever I put in the first version of my message (in case you use the mails to patch your code).
Well, fire away with what is on your mind. I don't have time right now, but I will look a couple of hours later. 🤣
Alright, I'm hoping this patch: https://github.com/iXit/Mesa-3D/commit/e8c93d57445342d612b54257aed71f208d5235be fixes the performance issues with your games using SYSTEMMEM.
First thing, csmt off: https://i.postimg.cc/yNFhFtz7/first-thing.png GTT goes down, fine... well, still not enough to bring perf up to 60. With csmt on, it did not lead to corruption, it just locked up the machine here. 🤣 Anyway, second thing?
"Alright, I'm hoping this patch: iXit/Mesa-3D@e8c93d5 Fixes the performance issues with your games using SYSTEMMEM." So, now we are talking about several things at the same time 🤣... should I apply that systemmem fix to mesa master, or use it as is with the ixit tree?
Ah, you probably mean to fix these flickerings/corruptions, OK that 🤣
Nope, I really meant about the performance of the systemmem work of a few days ago
"First thing, csmt off: https://i.postimg.cc/yNFhFtz7/first-thing.png gtt goes down, fine... well, still not enough to bring up perf to 60."
Well, this part makes sense at least. The part that doesn't make sense is why you don't have the exact same phenomenon when csmt is ON and the path is disabled.
Nope, I really meant about the performance of the systemmem work of a few days ago
And the patch had errors. Fixed now.
Forget about testing these, It needs serious reworks.
"Alright, I'm hoping this patch: iXit/Mesa-3D@e8c93d5 Fixes the performance issues with your games using SYSTEMMEM." "Nope, I really meant about the performance of the systemmem work of a few days ago. And the patch had errors. Fixed now." On that point, perf is back in DS2, WC3... so consider that fixed.
"Forget about testing these, It needs serious reworks." He, he, it looked fine to me 🤣
Well the range of the data requested to be uploaded was incorrect. It only worked because we uploaded more than needed (the region dirtied by recent locks). If I only upload what I computed as needed for the draw call, it becomes a vertex mess.
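To make the failure mode concrete, here is a rough sketch of how the byte range needed by an indexed draw could be computed. It assumes D3D9 DrawIndexedPrimitive semantics, where the draw references vertices from BaseVertexIndex + MinVertexIndex onwards, NumVertices of them. The names `stream_binding`, `byte_range`, and `draw_vertex_range` are hypothetical and this is not the tracking code from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one bound vertex stream. */
struct stream_binding {
    uint32_t offset_in_bytes; /* stream offset given at bind time */
    uint32_t stride;          /* bytes between consecutive vertices */
};

/* Half-open byte range [start, end) inside the vertex buffer. */
struct byte_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Conservative range read by an indexed draw referencing vertices
 * [base_vertex + min_index, base_vertex + min_index + num_vertices).
 * Assumes base_vertex + min_index is not negative and that the result
 * fits in 32 bits.
 */
static struct byte_range
draw_vertex_range(const struct stream_binding *vb, int32_t base_vertex,
                  uint32_t min_index, uint32_t num_vertices)
{
    struct byte_range r;
    int64_t first = (int64_t)base_vertex + (int64_t)min_index;

    assert(first >= 0);

    r.start = vb->offset_in_bytes + (uint32_t)first * vb->stride;
    r.end = r.start + num_vertices * vb->stride;
    return r;
}
```

If a range like this is computed too small (for example by forgetting the base vertex or the stream offset), uploading only that range leaves the draw reading stale bytes, which would look like the vertex mess described above.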
Dirty work. Anyway, just to tell you that I don't seem to have the GTT issue with PoP on your tree, if csmt is off. That made no difference with mesa master. Well, it might be the same issue as with Unigine. 🤣
Alright, what about this branch ? https://github.com/iXit/Mesa-3D/tree/nine_dynamic_systemmem
I no longer track the uploaded regions. I just upload the data needed for the current draw call, even if it was already there. And I use a discard/nooverwrite pattern for my uploads if nothing bad happens. Thus rendering shouldn't have artifacts. However, performance might be a bit lower than before.
Still not good, there are artifacts with that one too. Need for Speed Hot Pursuit 2 (d3d8to9), Indiana Jones and The Emperor's Tomb (d3d8to9), Dungeon Siege 1 (ddraw)... in native d3d9 apps I didn't spot it. It seems to happen only via these wrappers. In Indiana I had to play to somewhere near the end of the second level before it started to appear... and that vertex mess on the entire screen appeared only if I wanted to take the gun, that was funny... 🤣
Were those artifacts already there before the patchset? They might be unrelated to SYSTEMMEM.
I'm curious if these wrappers have issues with native too
Nope, it is not about these wrappers... Singularity gave me the idea to try the UE3 engine D3D9 game JUJU, and it is full of artifacts too. And before that the game goes OOM: it crashed with the defaults, and once the machine locked up. Let's pass csmt off, let's pass AMD_DEBUG=mono... I mean, instead of all that bullshit, my GTT zero hack avoids all of these OOMs. 🤣
Were the artifacts there before the patchset, or are they new?
Well, in ixit master or that, mesa master is (still) fine... if patched 🤣
Well, that is very weird because, as I said, we should be sending the data requested by the draw call every time...
So it should have been equivalent in terms of data seen by the GPU for its draw call.
I see two possibilities:
- The tracking of regions needed by the draw call is still broken.
- The previous code caused a wait on the GPU when the SYSTEMMEM buffers were locked. Maybe this behaviour fixes sync issues with other pool types.
Alright, best performance as of yet for Halo with this new branch: https://github.com/iXit/Mesa-3D/tree/nine_dynamic_systemmem_alternative
I gave up on the initial approach of having systemmem locks not dirty the whole buffer (Intel behaviour). On the other hand, for SYSTEMMEM DYNAMIC, I try to aggressively produce DISCARD/NOOVERWRITE patterns to fill my GPU version of the resource, while avoiding too many DISCARDs. For example, the vertex buffer used for the smoke effect is read out of order, but still generates one DISCARD + many NOOVERWRITE.
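One possible way to get "one discard + many NOOVERWRITE" even when ranges arrive out of order is to remember which ranges were written since the last discard. The sketch below is only an illustration of that idea under my own assumptions; `upload_tracker`, `upload_mode_for_range`, and `MAX_RANGES` are hypothetical names and this is not the code in the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_RANGES 64

enum upload_mode { UPLOAD_NOOVERWRITE, UPLOAD_DISCARD };

/* Hypothetical tracker of the ranges written since the last discard. */
struct upload_tracker {
    uint32_t start[MAX_RANGES];
    uint32_t end[MAX_RANGES]; /* exclusive */
    unsigned count;
};

static bool
ranges_overlap(uint32_t a_start, uint32_t a_end,
               uint32_t b_start, uint32_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/*
 * Decide how to upload [start, end) to the GPU copy of the buffer.
 * A range untouched since the last discard can use NOOVERWRITE, in
 * whatever order the ranges arrive. A range that was already written
 * (or a full tracker) forces a DISCARD, which resets the tracking.
 */
static enum upload_mode
upload_mode_for_range(struct upload_tracker *t, uint32_t start, uint32_t end)
{
    enum upload_mode mode = UPLOAD_NOOVERWRITE;
    unsigned i;

    assert(start < end);

    if (t->count == MAX_RANGES)
        mode = UPLOAD_DISCARD;
    for (i = 0; i < t->count && mode == UPLOAD_NOOVERWRITE; i++) {
        if (ranges_overlap(start, end, t->start[i], t->end[i]))
            mode = UPLOAD_DISCARD;
    }

    if (mode == UPLOAD_DISCARD)
        t->count = 0;

    t->start[t->count] = start;
    t->end[t->count] = end;
    t->count++;
    return mode;
}
```

Note that the sketch only picks the mode. After a DISCARD the previous contents of the GPU copy are gone, so a real uploader would also have to re-upload whatever else the following draws need.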
Well, still artifacts... I think I will upload a trace of JUJU, the UE3 engine game that shows the artifacts, so you can check. 🤣
Dropbox keeps losing connection when uploading slightly larger files, plus my upload speed isn't great at all, so this took a while... anyway, here is your artifact check, from gdrive 🤣 https://drive.google.com/file/d/1GViu6ozLqZ5MhIhEGxQCqRDBArjJZOBV/view?usp=sharing
It looks like the problem is not the SYSTEMMEM management but rather the patch "st/nine: Optimize DrawPrimitiveUp", which has issues.
Does the branch work now ?
Looks fine now, no artifacts in halo, juju, neither in cases with ddraw, d3d8...
What about performance ? Is it as good as the best you achieved with your hacks ?
Not completely. It is good for Blood Rayne, which now performs fine... but that was read_write; the only thing that is still slow (in comparison to Windows) is pipe_map_write. Do you want a trace of some kind of worst-case scenario of that?
I don't understand what you are talking about regarding read_write and pipe_map_write. And how does the fps compare to Windows? Yes, if you have a worst-case scenario, a trace can help to check that the pattern is OK in the affected scene.
A trace of that becomes like 6.1 GB big just by entering a menu there, so it will be mission impossible... I would probably try to do it on Windows to see what happens. Everything is and always was slow there on Linux.
On Windows:
- native d3d8: 36 fps
- d3d8to9: 26 fps (we should be somewhere around there; my hack does 20 fps)
On Linux:
- wined3d: 4 fps
- nine: 1-2 fps
- dxvk: 0.5 fps (half of one)
Ultimate slowness there 🤣
Then give me the log with "csmt_force=0 NINE_DEBUG=vbuf,device" set. Please kill the game (alt-f4) in the affected scene.
No worries, there will be a trace in the next couple hours 🤣 compressed seems will be much smaller, still compressing... wondering if 7z or xz are better to compress this trace.
In the end when I run the trace I'll run exactly the trace with these options to get the log. So if the log is much smaller to send, it's a win-win
There you go: https://drive.google.com/file/d/1V_JT0FAPUejBMTXPXzakSBJfoMETEAgd/view?usp=sharing Warning! 6.1 GB trace in it 🤣 Description: it goes from normal performance (logos) to molasses. Ignore the logos; it starts after them. Performance is already not fine after the logos, gets worse once the menu pops up, and hits the worst case at the end, deeper in the menu.
Whoa, so the game is locking with flags 0, locking the full buffer, on a buffer in D3DPOOL_DEFAULT. It's doing lock/unlock, draw, lock/unlock, draw. Each draw uses the same piece of the buffer.
Upon seeing that, I began thinking "I don't do dark magic, this is never going to be fast no matter what, unless there is a per-game workaround". Then I had an idea: I checked the beginning of the log, saw "nine:device9:ctor: Application asked full Software Vertex Processing", and that explained it all.
I think I'm going to put the DEFAULT pool in SYSTEMMEM when full software vertex processing is requested. With my patchset the performance in the use pattern won't be optimal, but it should still be very good.
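Roughly, the idea is a pool remap decided at buffer creation time. The sketch below uses hypothetical names (`buffer_pool`, `effective_buffer_pool`) and is not the actual change in the branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the D3DPOOL values relevant here. */
enum buffer_pool { POOL_DEFAULT, POOL_MANAGED, POOL_SYSTEMMEM };

/*
 * Pick the pool actually used to back a vertex/index buffer.
 * When the application asked for full software vertex processing,
 * DEFAULT buffers are handled like SYSTEMMEM ones. The assumption is
 * that the lock-with-flags-0 / draw pattern then goes through the
 * SYSTEMMEM upload path instead of locking the GPU buffer directly.
 */
static enum buffer_pool
effective_buffer_pool(enum buffer_pool requested,
                      bool software_vertex_processing)
{
    if (software_vertex_processing && requested == POOL_DEFAULT)
        return POOL_SYSTEMMEM;
    return requested;
}
```

Buffers in other pools, and DEFAULT buffers on devices created with hardware vertex processing, are left untouched in this sketch.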
Alright, there you go. Branch https://github.com/iXit/Mesa-3D/tree/nine_dynamic_systemmem_alternative updated.
And now that is my speed, just by default, and how it should be. 🤣 BTW, this one needs csmt off for full speed.
I'm not surprised csmt off might give a small boost on this one. How is the fps ?
The frame rate drops severely in Halo CE (2001) when I enable "Particles" in the video settings menu when using wine-nine. Particles are generated when there is an in-game explosion, and when an explosion occurs there is a sudden drop in frame rate. There is no drop in frame rate when an explosion occurs if I disable the Particles setting, a slight drop if I set Particles to Low, and a large drop if I set it to High.
This issue only occurs in wine-nine, regular wine does not have a severe frame rate drop when there are particles. It occurs regardless of other visual settings or screen resolution. There is no throttling taking place when this occurs (CPU and GPU power management disabled).
The game runs much faster in wine-nine than under regular wine, especially on large custom maps. I normally play the game with fps capped at 60 using Chimera, and it can drop to 20 fps or even lower when many explosions are present. Otherwise, the game runs at a smooth 60 even on large custom maps (something regular wine can't do).
Unfortunately I am unable to do an apitrace for Halo CE, the trace plays back with a black screen and buffer overflow errors. The exact issue has already been reported to apitrace developers in 2016, and has not been fixed.
Log:
My system runs Ubuntu 18.04, with wine 5.10 staging and the latest 0.7 wine-nine. GPU is an AMD HD6670 (Mesa 20.0.8, Radeon driver). This hardware would have no problem playing Halo CE maxed out on Windows.
I'm happy to provide additional info if needed. Thanks!