hrydgard / ppsspp

A PSP emulator for Android, Windows, Mac and Linux, written in C++. Want to contribute? Join us on Discord at https://discord.gg/5NJB6dD or just send pull requests / issues. For discussion use the forums at forums.ppsspp.org.
https://www.ppsspp.org

Lens flare effects #15923

Open hrydgard opened 1 year ago

hrydgard commented 1 year ago

Lens flare issues, categorized:

CPU peeking into the depth buffer to check coverage

Framebuffer->CLUT tricks

Framebuffer alpha accumulation tricks:

Not yet investigated in detail:

References:

https://github.com/hrydgard/ppsspp/commits/c3bb9437669a4a (old PR for framebuffer CLUTs)

Lens flares are a typically problematic effect on GPUs of the PSP's generation. They are supposed to be drawn only when the sun (or other light source) is visible, but there are no occlusion queries you can use to determine that directly on the GPU. Nor is it practical to copy the texel to an image and then use multitexturing to blend the lens effect texture with the copied texel, since multitexturing is not a thing on this hardware.

So games make use of a variety of dirty tricks.

Let's start with Wipeout Pure, #13344. I started by hacking the interpreter to log CPU reads from VRAM. For some reason there are a whole bunch that happen every frame, but these stand out:

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471bdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471c5f8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471cdf8

06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5ec
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f0
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f4
06:50:095 idle0        W[G3D]: MIPS\MIPSInt.cpp:411 Read from depth buffer? addr=0471d5f8

!!! Observation: These are cached addresses, so the game must be doing a cache invalidate at this location, which might be interesting to catch.

In the EUR version of Wipeout, the lhu instruction doing these reads is at 0888c16c (function starting at 0888C0A8). There are also some additional reads being done by 0881e0c0 (function starting at 0881E098, no idea what it's doing).

It's using lhu instructions (16-bit unsigned loads), and it looks to me like it's sampling a 4x4 grid around the sun's screen position from the depth buffer, skipping every other pixel. The depth buffer is situated at 110000 in VRAM, which starts at 04000000, plus the 600000 deswizzle offset that's needed to linearize the depth buffer in 8888 mode. The game treats a zero value as sky, that is, the sun is not occluded and it will draw the lens flare. As expected, as the sun slides across the image when the camera moves, these addresses, which are read from every frame, change accordingly. The game must be synchronized here since the depth buffer is not double-buffered.
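To make the address arithmetic above concrete, here is a minimal sketch that maps one of the logged addresses back to depth buffer pixel coordinates. The base offsets come from the description above and the 16-bit pixel size matches the lhu loads, but the 512-pixel row stride is an assumption, and `DecodeDepthAddress` and the constant names are made up for illustration (they are not PPSSPP code).

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the post above.
constexpr uint32_t kVramBase = 0x04000000;         // start of VRAM
constexpr uint32_t kDepthOffset = 0x00110000;      // depth buffer offset in VRAM
constexpr uint32_t kDeswizzleOffset = 0x00600000;  // linearized depth mirror

// Assumptions, not confirmed by the post.
constexpr uint32_t kStridePixels = 512;  // assumed row stride in pixels
constexpr uint32_t kBytesPerPixel = 2;   // 16-bit depth, matching lhu

struct DepthCoord {
    uint32_t x;
    uint32_t y;
};

// Hypothetical helper: converts a CPU address inside the linearized depth
// buffer into the pixel it refers to.
inline DepthCoord DecodeDepthAddress(uint32_t addr) {
    const uint32_t base = kVramBase + kDepthOffset + kDeswizzleOffset;
    assert(addr >= base);
    const uint32_t pixel = (addr - base) / kBytesPerPixel;
    return DepthCoord{pixel % kStridePixels, pixel / kStridePixels};
}
```

Under those assumptions, the first logged address, 0471bdec, decodes to x=246, y=47, and the last one, 0471d5f8, to x=252, y=53, so the reads step by two pixels in both directions.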

For this to work correctly, we have to read back the depth buffer every frame to emulated PSP VRAM, which introduces a massive sync point between the GPU and CPU. This is not really desirable (although we should implement it as an option), so I've been thinking about ways to get around it:

Anyway, I think the first step will be to create the correct-ish but slow solution of doing hard-synced readbacks to PSP VRAM. The question is when exactly in the frame we should do these. "When finished rendering the main depth buffer" is presumably the best option, but there's no clear way to detect that. Maybe just do it when the main framebuffer is displayed, or something.

.... To be continued

ghost commented 1 year ago

Arctic Edge https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123170383

NFS Shift https://github.com/hrydgard/ppsspp/issues/11100#issuecomment-1123377372

hrydgard commented 1 year ago

Thanks, added to the list.

QrTa commented 1 year ago

Syphon Filter Logan's Shadow https://github.com/hrydgard/ppsspp/issues/10229#issuecomment-1232151181

71knight commented 1 year ago

I don't know how the AetherSX2 PS2 emulator does it, but it gets accurate, full-speed readback emulation using OpenGL on Android. I can play Need for Speed: Hot Pursuit 2 with accurate readbacks turned on and without underclocking the emulator, and it maintains full speed with no slowdowns, and the sun hides when it's supposed to. And I think the PS2 has double the resolution of the PSP. I do cheat a little, though: I keep all my core frequencies maxed out and the GPU set at 3/4 speed on my rooted phone (SD 855+). The phone doesn't get too hot, about 147° F on average. So I know PPSSPP wouldn't be too demanding with accurate readbacks. I think what helps them is a CPU affinity option that keeps the heaviest threads on the biggest cores of the phone.

ghost commented 1 year ago

AetherSX2 is the PCSX2 mobile port, btw.

hrydgard commented 1 year ago

An alternative to readbacks for the games that peek at the Z-buffer using the CPU, as commented elsewhere by @unknownbrackets, would be to run both the software and hardware renderers side by side. That way we'll always have accurate depth in CPU-accessible memory, at the right time.

This is expensive though, and to make it less so, it would be possible to have the software renderer only render depth buffers, and just ignore color - depth is a lot less complex so I think this would be way faster than running the full software renderer. This wouldn't work for cases where games reinterpret color and Z like Kuroyou, but I don't think that applies to any of these cases.
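As a rough illustration of why depth-only software rendering could be much cheaper, here is a minimal sketch of a rasterizer that writes only depth and ignores color entirely. Everything in it is hypothetical (`Vtx`, `DepthBuffer`, `RasterizeDepthOnly` are not PPSSPP types), it assumes a GEQUAL depth test against a buffer cleared to 0 (consistent with the game treating zero as sky), and it leaves out clipping, fill rules and alpha testing.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
struct Vtx {
    float x, y;   // screen-space position in pixels
    uint16_t z;   // 16-bit depth value
};

struct DepthBuffer {
    int width, height;
    std::vector<uint16_t> z;  // cleared to 0 ("sky")
    DepthBuffer(int w, int h)
        : width(w), height(h), z(static_cast<size_t>(w) * h, 0) {}
};

// Edge function: signed area term telling which side of a->b the point is on.
static float Edge(const Vtx &a, const Vtx &b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterizes one triangle into the depth buffer only: no color, texturing,
// lighting or blending. Depth test is assumed to be GEQUAL. There is no fill
// rule, so pixels exactly on a shared edge may be tested twice.
void RasterizeDepthOnly(DepthBuffer &db, const Vtx &v0, const Vtx &v1, const Vtx &v2) {
    const float area = Edge(v0, v1, v2.x, v2.y);
    if (area == 0.0f)
        return;  // degenerate triangle

    // Bounding box of the triangle, clamped to the buffer.
    const int minX = std::max(0, static_cast<int>(std::floor(std::min({v0.x, v1.x, v2.x}))));
    const int maxX = std::min(db.width - 1, static_cast<int>(std::ceil(std::max({v0.x, v1.x, v2.x}))));
    const int minY = std::max(0, static_cast<int>(std::floor(std::min({v0.y, v1.y, v2.y}))));
    const int maxY = std::min(db.height - 1, static_cast<int>(std::ceil(std::max({v0.y, v1.y, v2.y}))));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const float px = x + 0.5f, py = y + 0.5f;  // sample at pixel center
            // Barycentric weights; dividing by area handles both windings.
            const float b0 = Edge(v1, v2, px, py) / area;
            const float b1 = Edge(v2, v0, px, py) / area;
            const float b2 = Edge(v0, v1, px, py) / area;
            if (b0 < 0.0f || b1 < 0.0f || b2 < 0.0f)
                continue;  // outside the triangle
            const float zf = b0 * v0.z + b1 * v1.z + b2 * v2.z;
            const uint16_t zi = static_cast<uint16_t>(std::min(65535.0f, std::max(0.0f, zf + 0.5f)));
            uint16_t &dst = db.z[static_cast<size_t>(y) * db.width + x];
            if (zi >= dst)
                dst = zi;  // GEQUAL pass: write depth
        }
    }
}
```

The inner loop only interpolates one value per pixel, which is the point: a CPU read like Wipeout's would find depth already sitting in CPU-accessible memory without any GPU readback.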

Also gonna have to look into what PCSX2 does. Maybe AetherSX2 does something special on top, hard to say given it's closed source.

unknownbrackets commented 1 year ago

I will say, the loop to interpolate triangle data is the slowest part of the software renderer now, I think. Texture sampling is still fairly slow as well.

We'd still need to texture (because of alpha tests/color tests), but we could skip alpha blend and logicops. Skipping blending would save time, but I don't think it'd make a huge difference overall.

Maybe we could have a "fast and loose" mode where it ignores color and alpha tests, though, or at least skip sampling/etc. when they're not enabled (which would be safe.) That would also allow us to skip lighting which is quite expensive.

-[Unknown]

hrydgard commented 1 year ago

Yeah, I think we can go very fast and loose for Z-only. Texturing only needs to be done when we know there's alpha. And we could skip filtering and mipmapping, for example.
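The decision being discussed here could be sketched roughly as follows. This is only an illustration of the idea: `DrawState`, `DepthOnlyPlan` and `PlanDepthOnlyDraw` are hypothetical names, and a real implementation would need to look at more state (for example whether the texture actually contains alpha).

```cpp
// Hypothetical per-draw state, reduced to what matters for depth-only output.
struct DrawState {
    bool alphaTestEnabled;
    bool colorTestEnabled;
    bool textureEnabled;
};

// What the depth-only path still has to compute for this draw.
struct DepthOnlyPlan {
    bool sampleTexture;  // texels are needed to evaluate alpha/color tests
    bool computeColor;   // vertex color / lighting is needed for the tests
};

// In "fast and loose" mode, alpha and color tests are ignored outright.
// Otherwise color work is skipped only when neither test is enabled,
// which cannot change which pixels write depth.
inline DepthOnlyPlan PlanDepthOnlyDraw(const DrawState &s, bool fastAndLoose) {
    const bool testsMatter =
        !fastAndLoose && (s.alphaTestEnabled || s.colorTestEnabled);
    return DepthOnlyPlan{testsMatter && s.textureEnabled, testsMatter};
}
```

Blending and logic ops never appear in the plan because they only affect color, so a depth-only pass can always skip them.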

unknownbrackets commented 1 year ago

Right. My biggest concern would be "depth boxes" from alpha testing. For example, some far-away trees or clouds might be drawn in front of the sun, and without alpha testing they would cover the entire thing. If we can safely skip alpha testing, it probably helps the potential speed a lot, because it cuts out many, many things.

We might end up in a place where we're using heuristics to skip alpha testing, though. For example, it's probably mainly an issue with flat Z - models probably don't need alpha testing for depth to be correct.

-[Unknown]

ghost commented 1 year ago

SOCOM: U.S. Navy SEALs Tactical Strike is also affected. https://github.com/hrydgard/ppsspp/issues/15071

UCUS98649.ppdmp.zip

hrydgard commented 1 year ago

Thanks, added to list.

ghost commented 1 year ago

Burnout Dominator's sun flares are glitchy in a recent PPSSPP build.

ULUS10236.zip

hrydgard commented 1 year ago

Yeah, I'll have to take a look at those again.

ghost commented 1 year ago

Resistance Retribution

Software (screenshot)

Vulkan/OpenGL (screenshot)

GE Dump UCES01184.ppdmp.zip

Edit: fixed by the [ReadbackDepth] compatibility setting, but it makes the game slower and makes my opponent invisible :(