hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

(Vulkan) Lack of Vsync and severe stuttering during Texture Scaling #10105

Closed DonelBueno closed 4 years ago

DonelBueno commented 6 years ago

These are the remaining generalized issues on Vulkan for me.

Lack of Vsync is self-explanatory: there is screen tearing even with vsync turned on.

The stuttering with texture scaling is MUCH worse in Vulkan than in any other backend; I don't know why. It happens even with Hybrid at 2x.

hrydgard commented 6 years ago

Lack of vsync is backend-dependent. Which ones have problems? I certainly have vsync in Vulkan for example.

Don't know what's up with the texture scaling in Vulkan, needs to be looked into. But ideally it should be moved to the GPU anyway.

unknownbrackets commented 6 years ago

Pretty sure you can turn on or off vsync in your driver, and this will likely affect Vulkan.

Texture scaling on Vulkan - currently - scales even empty or flat textures. They're still cached, but in GLES these are skipped.

It'd require reordering some logic (because of how it allocates) to make it able to skip scaling empty/flat textures. I bet we could make IsEmptyOrFlat faster, call it earlier, and potentially even use it to send a 1x1 texture.

Those are the main differences I'm aware of.

-[Unknown]

DonelBueno commented 6 years ago

I only get tearing with Vulkan, all the other backends are fine (OpenGL, D3D11, D3D9).

My Vsync driver configuration is "application controlled". I know I can force "Vsync on" via driver, but it's much better to control it by the application. And I wouldn't be able to get past 60 fps that way, even with unthrottle.

hrydgard commented 6 years ago

We can choose between multiple present modes in Vulkan. On Windows we currently use MAILBOX, which lets us submit frames as fast as we can (to support non-frame-skipped unthrottle). It should still vsync, but maybe doesn't always. There's also FIFO, which we use on Android; it probably guarantees better results but requires us to skip frames during unthrottle. We could use the vsync checkbox to select between those.
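A minimal sketch of letting the vsync checkbox choose between those modes might look like this. It is an assumption-laden illustration, not PPSSPP's code: `PresentMode` is a local stand-in for `VkPresentModeKHR` so the example compiles without Vulkan headers, and `ChoosePresentMode` is a hypothetical name. The only Vulkan fact relied on is that FIFO is always supported, which makes it a safe fallback.

```cpp
#include <algorithm>
#include <vector>

// Local stand-in for VkPresentModeKHR (IMMEDIATE, MAILBOX, FIFO).
enum class PresentMode { Immediate, Mailbox, Fifo };

// Hypothetical policy: with vsync on, use FIFO. With vsync off, prefer
// MAILBOX, then IMMEDIATE, and fall back to FIFO if neither is available.
PresentMode ChoosePresentMode(const std::vector<PresentMode> &available, bool vsync) {
    auto has = [&](PresentMode m) {
        return std::find(available.begin(), available.end(), m) != available.end();
    };
    if (!vsync) {
        if (has(PresentMode::Mailbox))
            return PresentMode::Mailbox;
        if (has(PresentMode::Immediate))
            return PresentMode::Immediate;
    }
    return PresentMode::Fifo;
}
```

In a real swapchain setup the `available` list would come from `vkGetPhysicalDeviceSurfacePresentModesKHR`, and unthrottle with FIFO would still need frame skipping as described above.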

hrydgard commented 6 years ago

Try this again now.

DonelBueno commented 6 years ago

Well, I think it has improved, but it's still quite behind D3D11 and OpenGL.

As for Vsync, it hasn't changed, as expected.

hrydgard commented 6 years ago

I don't see this at all, plenty fast for me. Strange.

hrydgard commented 6 years ago

DonelBueno, what GPU do you have? Also, that little change could maybe help if it's not nvidia...

DonelBueno commented 6 years ago

GTX 770, so no, that change didn't help.

It still stutters heavily in some cases, strange.

hrydgard commented 6 years ago

What cases?

ghost commented 6 years ago

I'm using Windows 7, GTX 970, i5, all PPSSPP settings default. I get constant stuttering in Project Diva 2nd, Project Diva Extend, and Castlevania: The Dracula X Chronicles. Even with the resolution at native, it still stutters slightly.

I should also mention the stuttering is present in all backends. I don't think it's a Vulkan only problem.

hrydgard commented 6 years ago

Doesn't matter how much I crank up texture scaling, I don't get any terrible stutters in any game I try.

GTX 770, i7-3770K 3.5GHz, Windows 10, Vulkan.

ghost commented 6 years ago

Just tried the very latest build. The stuttering is greatly reduced, but still present.

unknownbrackets commented 6 years ago

Hmm. None of the scaling methods will write memory sequentially, which is not great for discrete GPUs.

It may be worth testing the performance on integrated memory GPUs, since I doubt it cares much about sequential access, but it may. Then we could add a flag (maybe assume mobile/desktop except on Vulkan where we can detect?) to force a temp buffer when scaling, rather than scaling directly?

With the exception of swizzled textures (common) and DXT (uncommon), we decode a lot of textures sequentially. Might be worth benching to see if it's better to use a temp buffer in the non-sequential cases.

But for texture scaling perf issues that happen on OpenGL (where we always use a temp buffer currently), this won't help. I think there the issue is simple: texture scaling takes time. Until someone invents magic, somebody else's problem field generators, or a spectacular texture scaling algorithm that looks amazing and is super fast... I guess that problem ain't going away.

Best workaround in those cases is to try creating a HD texture pack, I suppose.

-[Unknown]

hrydgard commented 6 years ago

Write sequentially? That doesn't really matter too much, we write textures (at least in Vulkan) to a cached region of memory that's also mapped into the GPU's memory space (pushbuffer), then have the GPU copy it out of there into local vram, so performance of the upload itself should be reasonably good regardless.

What we really should do is to copy the original size texture into this buffer instead, then have a compute shader perform the actual scaling. This will be way faster than running the scaling algorithm on the CPU, and it can write directly into the texture's storage.

unknownbrackets commented 6 years ago

I could be wrong, but I thought coherent memory was uncached, and therefore suffered from random access. I thought #10108 improved Vulkan more than say OpenGL in part because of uncached memory.

We write textures directly to mapped coherent push buffer memory, right?

-[Unknown]

hrydgard commented 6 years ago

Oh, right, confused myself a little. Example of available memory types: http://vulkan.gpuinfo.org/displayreport.php?id=2223#memory

Yeah, we use coherent and not cached memory. We could also use cached, but then we'd have to manually call vkFlushMappedMemoryRanges, I think.

Either way, a compute shader would spank whatever we are doing. We could also unswizzle and depalettize in a compute shader first, for even less memory copying from the CPU side. Of course could also be done in a pixel shader but compute shaders have much less launch overhead than renderpasses so more suitable for texture uploads, of which there might be many in a frame.

unknownbrackets commented 6 years ago

Hmm, maybe it's worth preferring cached and coherent: http://vulkan.gpuinfo.org/displayreport.php?id=2202#memory

I don't have a good device to test coherent vs non-coherent cached. But my desktop GPU does have coherent and cached coherent options. Interestingly, using cached memory generally slowed things slightly (even though it was still coherent): https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vulkan-mem?expand=1

Maybe we actually want to avoid cached memory (since we currently go by order, we might accidentally select cached memory unintentionally - for example on the Adreno 530 linked above.)

Agreed that it'd be better to perform upscaling on the GPU.

-[Unknown]
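The concern about "going by order" and accidentally picking a cached type can be illustrated with a small selector that takes an explicit "avoid" mask. This is a sketch under assumptions: `PickMemoryType` is a hypothetical function, the memory types are passed as a plain list of property flags rather than a `VkPhysicalDeviceMemoryProperties`, and the `memoryTypeBits` filter from `VkMemoryRequirements` is left out for brevity. The constants use the same bit values as `VkMemoryPropertyFlagBits`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same bit values as VkMemoryPropertyFlagBits, redefined locally so the
// sketch builds without Vulkan headers.
constexpr uint32_t kDeviceLocal  = 0x1;
constexpr uint32_t kHostVisible  = 0x2;
constexpr uint32_t kHostCoherent = 0x4;
constexpr uint32_t kHostCached   = 0x8;

// Hypothetical selector: returns the index of the first type that has all
// `required` bits and none of the `avoid` bits. If every matching type has
// an avoided bit, the first match is returned anyway. Returns -1 when no
// type has all required bits.
int PickMemoryType(const std::vector<uint32_t> &typeFlags, uint32_t required, uint32_t avoid) {
    int fallback = -1;
    for (size_t i = 0; i < typeFlags.size(); ++i) {
        const uint32_t flags = typeFlags[i];
        if ((flags & required) != required)
            continue;
        if ((flags & avoid) == 0)
            return static_cast<int>(i);
        if (fallback < 0)
            fallback = static_cast<int>(i);
    }
    return fallback;
}
```

With `avoid` set to the cached bit, a device that lists a cached coherent type before an uncached one no longer gets the cached type by accident; passing `avoid = 0` reproduces plain first-match-by-order behavior.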

hrydgard commented 5 years ago

Digging up this old thread: regarding coherent vs cached, I've learned that for writes from the CPU it doesn't really matter much whether the memory is cached or not, since write combining is generally performed and, in practice, mostly whole cachelines are written out anyway when you write large contiguous chunks of data like images. For uploads to the GPU we should thus indeed prefer COHERENT to CACHED.

Reads are a whole other matter though, for those CACHED can be very beneficial.

hrydgard commented 5 years ago

Anyway regarding the actual issue here, we have two conflated ones, VSYNC and supposed extra stuttering when upscaling. Still no idea about the latter, does it still appear to be an issue?

Regarding VSYNC we already use a mode that's supposed to be synced so I don't know what more we can do.

Narugakuruga commented 5 years ago

Stutter still happens in 1.7.5.415. i5-8400 GTX 1050 Ti Windows 10

Monster Hunter Portable 3rd HD ver., texture scaling 5x xBRZ.

Vulkan: When I enter the village and move the character immediately, a stutter occurs. I'm sure it's a stutter because both audio and graphics pause for a short time.

D3D11: When I enter the village and move the character immediately, there is no stutter at all.

OpenGL: When I enter the village and move the character immediately, there may be a stutter, but it is nowhere near as obvious as on Vulkan because the audio stays smooth.

hrydgard commented 5 years ago

Since the CPU cost of texture scaling should be the same, something else has to be different. Probably D3D11 has preallocated space while Vulkan runs out of buffers and has to go to the OS to get more with vkAllocateMemory. We might be able to fix this by increasing the size of our allocations when 5x texture scaling is used... but I'm putting that off for after the 1.8.0 release.
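To put rough numbers on why 5x scaling could exhaust preallocated buffers, here is a small arithmetic sketch. It assumes 32-bit (4 bytes per pixel) scaled output and uses a 512x512 source texture purely as an example; `ScaledTextureBytes` is a hypothetical helper, and whether allocation pressure is the actual cause of the stutter is only the guess stated above.

```cpp
#include <cstddef>

// Bytes needed for a w x h texture scaled by `factor` on each axis,
// assuming 4 bytes per output pixel.
constexpr size_t ScaledTextureBytes(size_t w, size_t h, size_t factor) {
    return (w * factor) * (h * factor) * 4;
}

// 512 * 512 * 4 = 1,048,576 bytes (1 MiB) unscaled.
static_assert(ScaledTextureBytes(512, 512, 1) == 1048576, "1 MiB at 1x");
// 2560 * 2560 * 4 = 26,214,400 bytes (25 MiB) at 5x.
static_assert(ScaledTextureBytes(512, 512, 5) == 26214400, "25 MiB at 5x");
```

Since the cost grows with the square of the factor, a 5x setting needs 25 times the staging space of the unscaled texture, so a buffer sized comfortably for 1x or 2x uploads fills up far sooner.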

Narugakuruga commented 5 years ago

About VSync, here are some clues. There are 4 kinds of VSync in the NVIDIA driver: "On", "Adaptive", "Adaptive (Half refresh rate)", and "Fast". Tested on 1.7.5.415 with the Cube test program:

On: Nothing special, works properly on PPSSPP Vulkan.

Adaptive: Works at first. But after I went into the graphics settings and came back (not on purpose), the screen tears.

Adaptive (Half refresh rate): Nothing special, works properly on PPSSPP Vulkan.

Fast: Doesn't work at all on PPSSPP Vulkan.

Narugakuruga commented 5 years ago

By the way, the Cemu emulator has just fixed its broken VSync option (though it uses OpenGL), and Dolphin's Vulkan VSync also works properly on my NVIDIA card. I think there must be something wrong. Hope you can fix this small issue so I can continue to feel good about buying Gold.

DonelBueno commented 5 years ago

This is still relevant (both problems). Now I'm using a GTX 1070 and drivers 430.53.

hrydgard commented 5 years ago

Lack of vsync will be fixed in the next NV drivers if it isn't already. As for scaling stutter, yeah it's gonna be like that until we implement GPU texture upscaling. It will happen eventually...

unknownbrackets commented 4 years ago

Has this improved?

-[Unknown]

Narugakuruga commented 4 years ago

Confirmed, NVIDIA has fixed this.

unknownbrackets commented 4 years ago

Does that mean we can close this, or is texture scaling still performing poorly?

-[Unknown]

Narugakuruga commented 4 years ago

I believe not. I ran some simple tests using the frametime graph. The results show OpenGL has the best performance when handling texture scaling, while Vulkan and D3D11 perform worse (especially Vulkan). I used 5x texture scaling with xBRZ to test Monster Hunter Portable 3rd HD, entering the same area with a large number of textures each time. I recorded the frametime graph (the recording itself had no influence on frametime).

[Screenshot: Annotation 2020-03-07 113241] (OpenGL) The frametime stabilizes almost instantly after loading into the area.

[Screenshot: Annotation 2020-03-07 113530] (Vulkan) The frametime follows an unstable pattern after loading into the area.

[Screenshot: Annotation 2020-03-07 113602] But it stabilizes after a few seconds.

[Screenshot: Annotation 2020-03-07 113403] (D3D11) The frametime is unstable too, but somewhat better than Vulkan.

Narugakuruga commented 4 years ago

Another test: reducing the texture scale factor to 2x. This helps lower the frametime fluctuations in all backends, but Vulkan and D3D11 still have heavy stuttering.

unknownbrackets commented 4 years ago

For me, if I change:

            scaler.ScaleAlways((u32 *)writePtr, pixelData, fmt, w, h, scaleFactor);

To:

            uint8_t *rearrange = (uint8_t *)AllocateAlignedMemory(w * scaleFactor * h * scaleFactor * 4, 16);
            scaler.ScaleAlways((u32 *)rearrange, pixelData, fmt, w, h, scaleFactor);
            memcpy(writePtr, rearrange, w * h * 4);
            FreeAlignedMemory(rearrange);

The speed is improved to near OpenGL speeds. To me, this indicates it is definitely something about the memory it's writing to.

Adding VK_MEMORY_PROPERTY_HOST_CACHED_BIT to the texture push buffer only also significantly improved speed and was comparable to OpenGL speeds. The scaling code actually reads from the output buffer during its scaling, which is why cached memory helps here.

For me, Direct3D 11 is comparable in speed to OpenGL already (even with the mapRowPitch part.)

-[Unknown]