simias / rustation

PlayStation emulator in the Rust programming language

Specification of the OpenGL renderer architecture #10

Open simias opened 9 years ago

simias commented 9 years ago

Overview of the PlayStation GPU

GPU Rasterizer

The GPU uses 1 megabyte of video RAM organized as a framebuffer of 512 lines of 2048 bytes. The CPU can upload textures to this buffer using GPU commands (it's not directly memory mapped in the CPU address space); it can also read back portions of the framebuffer using other commands.

The GPU also contains a relatively simple 2D rasterizer capable of drawing lines and triangles (and "quads" which are really just two triangles side-by-side). It supports solid colors, textures (truecolor and paletted), several transparency modes and Gouraud shading. It can also apply a dithering pattern before outputting the 16bit color. The GPU has a small texture cache to speed up rendering of textured primitives.

The rasterizer always outputs 16bits per pixel (1555 RGB, where the MSB is the "mask" bit) so as far as it's concerned the VRAM is a framebuffer of 1024x512 pixels. Therefore all the draw commands use this system of coordinates.
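To make the pixel format concrete, here is a minimal sketch of how a 16bit VRAM word could be split into its fields. The `Pixel` type and its method names are made up for this example, and the channel order (red in the low bits, blue in the high bits) is an assumption that should be checked against the No$ specs.

```rust
/// One VRAM pixel split into its fields (hypothetical helper type).
/// Each colour component is 5 bits wide (0..=31).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pixel {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub mask: bool,
}

impl Pixel {
    /// Decode a raw 16-bit VRAM word.
    /// Assumed layout: bit 15 = mask, bits 14-10 = blue,
    /// bits 9-5 = green, bits 4-0 = red.
    pub fn from_raw(raw: u16) -> Pixel {
        Pixel {
            r: (raw & 0x1f) as u8,
            g: ((raw >> 5) & 0x1f) as u8,
            b: ((raw >> 10) & 0x1f) as u8,
            mask: raw & 0x8000 != 0,
        }
    }

    /// Re-encode the fields into a raw 16-bit VRAM word.
    pub fn to_raw(self) -> u16 {
        let colour = (self.r as u16 & 0x1f)
            | ((self.g as u16 & 0x1f) << 5)
            | ((self.b as u16 & 0x1f) << 10);
        colour | ((self.mask as u16) << 15)
    }
}
```

Since every bit of the word belongs to exactly one field, decoding and re-encoding is lossless, which matters if the CPU reads the framebuffer back.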

Note that the GPU is fully 2D: it's not capable of 3D projection and therefore has no notion of depth (so no depth buffer or anything like that). The PlayStation does 3D projection on the CPU using the Geometry Transformation Engine (GTE) coprocessor. That means for instance that the GPU cannot do perspective-correct texture mapping, which is the source of some easily recognizable PlayStation graphical artifacts.

The coordinate system used by the GPU is simply the 16bit per pixel coordinate in the video RAM, so (0, 0) is the top-left of the framebuffer while (1023, 511) is the bottom right. You can see a list of the GPU draw commands in the No$ specs.
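As a quick illustration of this coordinate system, the sketch below maps an (x, y) position to an index in a flat buffer of 16bit pixels and to a byte offset in the 1MB of VRAM. The function and constant names are hypothetical, they are not taken from rustation's code.

```rust
/// Width of the VRAM in 16-bit pixels.
pub const VRAM_WIDTH: usize = 1024;
/// Height of the VRAM in lines.
pub const VRAM_HEIGHT: usize = 512;

/// Index of pixel (x, y) in a flat buffer of 1024 * 512 `u16` values,
/// or `None` if the coordinate lies outside the VRAM.
pub fn vram_index(x: usize, y: usize) -> Option<usize> {
    if x < VRAM_WIDTH && y < VRAM_HEIGHT {
        Some(y * VRAM_WIDTH + x)
    } else {
        None
    }
}

/// Byte offset of pixel (x, y): each line is 2048 bytes, each pixel 2 bytes.
pub fn vram_byte_offset(x: usize, y: usize) -> Option<usize> {
    vram_index(x, y).map(|index| index * 2)
}
```

The bottom-right pixel (1023, 511) ends up two bytes before the 1MB boundary, which matches the "512 lines of 2048 bytes" description.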

GPU video output

Once a scene has been rendered/uploaded to the framebuffer it needs to be displayed on the TV through the NTSC/PAL analog video output. In order to do this the GPU video output can be configured to select a rectangle in the framebuffer and stream it to the TV.

The size of this window depends on the video timings used. For NTSC it ranges from roughly 256x240 to 640x480 while for PAL it's from 256x288 to 640x576. I say roughly because since it's an analog output you can tweak the timings in many ways, so you can actually "overscan" the output to increase the resolution or crop it further depending on what you're trying to do.

Interestingly even though the rasterizer only outputs 16bits per pixel the video output can be configured to use 24bits per pixel. That's of course mostly useless for graphics generated by the rasterizer but it can be used to display pre-rendered 24bit images, for instance videos decoded with the console's MDEC and uploaded to the GPU VRAM. Another application could be 24bit images dumped directly from the disc and used as static load screens.

Design of the emulated OpenGL renderer

Features

First and foremost I think accuracy should be the main focus. Of course if it was the only objective a software renderer would be better suited but I think with modern OpenGL and its programmable pipeline it should be possible to reach a decent level of accuracy (except maybe for the GPU cache, see below).

OpenGL would also make it easier to implement certain enhancements to the game's graphics compared to the original console, for instance increased internal resolution, texture replacement, normal maps etc...

Later on we could even attempt to salvage the raw 3D coordinates from the GTE and use them to render the 3D scene directly with OpenGL. That would allow us to have higher precision for our vertex coordinates, perspective correct mapping and many other things only possible with a fully 3D scene.

I think it's important to keep those features in mind when designing the basic renderer architecture so that we don't end up breaking everything when we try to implement one of them.

Potential difficulties

As you can see from the previous sections the PlayStation GPU is extremely simple compared to a modern graphic card, however it features some quirks and "exotic" modes which don't fit the OpenGL pipeline very well as far as I can tell.

Textures and palettes

At first I thought the most obvious approach to emulate the GPU video memory would be to use a single 1024x512 texture/FBO (or bigger if we increase the internal resolution) and use it as our Video RAM. There are a few potential issues with that approach however.

Upscaling and filtering

When a game wants to upload textures and palettes to the video RAM it must use one of the "Copy Rectangle" commands saying where the data should end up in the framebuffer (always using the same 1024x512 coordinate system) and then send the data 32bits at a time.
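Here is a rough sketch of what such an upload amounts to on the emulator side: each 32bit word carries two 16bit pixels which are written into the target rectangle in row-major order. The `upload_rect` name is made up, and the ordering of the two halves (low half first) is an assumption to verify against the No$ specs. The sketch also simply refuses rectangles that don't fit in VRAM instead of modelling what the hardware does in that case.

```rust
pub const VRAM_WIDTH: usize = 1024;
pub const VRAM_HEIGHT: usize = 512;

/// Copy `words` into the `w` x `h` rectangle whose top-left corner is (x, y).
/// Assumption: the low 16 bits of each word are the first (leftmost) pixel.
pub fn upload_rect(vram: &mut [u16], x: usize, y: usize, w: usize, h: usize, words: &[u32]) {
    assert_eq!(vram.len(), VRAM_WIDTH * VRAM_HEIGHT);
    // Simplification: out-of-range rectangles are rejected outright.
    assert!(x + w <= VRAM_WIDTH && y + h <= VRAM_HEIGHT);

    // Split every 32-bit word into two 16-bit pixels.
    let pixels = words
        .iter()
        .flat_map(|&word| [word as u16, (word >> 16) as u16]);

    // Enumerate the rectangle left to right, top to bottom.
    let targets = (0..h).flat_map(|row| (0..w).map(move |col| (col, row)));

    for ((col, row), pixel) in targets.zip(pixels) {
        vram[(y + row) * VRAM_WIDTH + (x + col)] = pixel;
    }
}
```

Note that the function has no idea what the pixels mean: it moves raw 16bit values around, which is exactly the problem described next.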

At this point there's no easy way to know what the data contains, it can be a 24bit RGB image from a video, it could be a 16bit "truecolor" texture, it can be a paletted texture, it can be a palette or several of those things at once. We'll only know how to interpret the data when the GPU actually uses it, either through a textured draw command or by the video output configuration if it's just meant to be dumped on the TV screen without further processing.

This seriously limits what we can do with the raw framebuffer data if we don't want to break anything.

For instance, say we have a single big FBO representing the entire framebuffer at an increased resolution (say, 2048x1024). When the CPU attempts to upload data to the GPU we could upscale and optionally filter it and store it in our big buffer. Easy.

Except of course upscaling and filtering palettes and paletted textures won't work as intended, the intermediate pixel values will be meaningless since we basically have a non-linear color space. We cannot risk destroying palettes, therefore we can't really mess with the uploaded data until we know what it's going to be used for. Or at least whatever we do must be reversible if we need to go back to the original value later on.
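To show why this breaks, here is a small sketch comparing the two orders of operation on a toy palette. All the names are hypothetical and the 16bit colours assume red in the low 5 bits. Filtering the indices and then doing the lookup lands on an unrelated palette entry, while doing the lookup first and filtering the colours gives the expected blend.

```rust
/// Midpoint of two values (a stand-in for a bilinear filter tap).
fn average(a: u8, b: u8) -> u8 {
    ((a as u16 + b as u16) / 2) as u8
}

/// Look a texel index up in a palette of 16-bit colours.
pub fn lookup(palette: &[u16], index: u8) -> u16 {
    palette[index as usize]
}

/// Wrong order: filter the palette indices, then do the lookup.
pub fn filter_then_lookup(palette: &[u16], a: u8, b: u8) -> u16 {
    lookup(palette, average(a, b))
}

/// Right order: look both texels up, then filter each 5-bit component.
pub fn lookup_then_filter(palette: &[u16], a: u8, b: u8) -> u16 {
    let ca = lookup(palette, a);
    let cb = lookup(palette, b);
    let mut out = 0u16;
    for shift in [0u16, 5, 10] {
        let x = ((ca >> shift) & 0x1f) as u8;
        let y = ((cb >> shift) & 0x1f) as u8;
        out |= (average(x, y) as u16) << shift;
    }
    out
}
```

With the palette `[black, red, green, white]`, blending a red texel (index 1) with a white one (index 3) through the indices yields index 2, which is pure green, while the correct result is a light red.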

I'm not sure what's the best way to deal with this. Maybe we could have two framebuffers instead: one at the native 1024x512 resolution containing the raw framebuffer data with no fancy enhancements that would be used for paletted textures and a bigger framebuffer containing the rasterizer's output at increased resolution. Keeping the two coherent will be a challenge however and I don't know where 24bit images fit in there. Maybe we could use a completely different rendering mode when the video output is set to 24bit mode so that we can ignore it the rest of the time.

If we want to implement texture replacement we also need to figure out when it should take place. That's a complex subject however, maybe we can leave it for later.

OpenGL texture sampling

Another potential issue if we use a single texture/FBO for the entire video RAM is that we need to be able to render into it while we sample a texture in another location in the same buffer. So we would be rendering to an FBO while it's also bound as a texture.

As far as I know this kind of configuration is not well supported by OpenGL and can quickly lead us into undefined behaviour territory.

I believe that this should be achievable using the GL_ARB_texture_barrier extension which is part of OpenGL 4.5 but maybe we can work around it.

Otherwise we could maybe use two framebuffers and "ping pong" between the two on each frame instead; this way we would write to the current FBO while we use the previous one for input. That could be inaccurate if a game decides to use a polygon rendered during the current frame to texture a subsequent one; I know some games use similar features to create some fancy visual effects.
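A CPU-side toy model of the ping-pong idea makes the limitation visible. The `PingPong` type is purely illustrative (it has nothing to do with actual OpenGL objects): draws read from the previous frame's buffer and write to the current one, so anything written during the current frame is invisible to later draws until the swap.

```rust
/// Two buffers: one is read (previous frame), the other written (current frame).
pub struct PingPong {
    buffers: [Vec<u16>; 2],
    /// Index of the buffer currently being written.
    current: usize,
}

impl PingPong {
    pub fn new(len: usize) -> PingPong {
        PingPong {
            buffers: [vec![0; len], vec![0; len]],
            current: 0,
        }
    }

    /// Returns (read-only previous frame, writable current frame).
    pub fn split(&mut self) -> (&[u16], &mut [u16]) {
        let (first, second) = self.buffers.split_at_mut(1);
        if self.current == 0 {
            (second[0].as_slice(), first[0].as_mut_slice())
        } else {
            (first[0].as_slice(), second[0].as_mut_slice())
        }
    }

    /// End of frame: the buffer we just wrote becomes the input of the next one.
    pub fn swap(&mut self) {
        self.current ^= 1;
    }
}
```

A real implementation would also have to carry the freshly rendered contents over to the other buffer at swap time, otherwise the new "current" buffer starts from stale data. That copy is left out of the sketch.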

Semi-transparency

The PlayStation GPU rasterizer has several pixel blending modes used for semi-transparent primitives (copied from the No$ specs):

  B=Back  (the old pixel read from the image in the frame buffer)
  F=Front (the new halftransparent pixel)
  * 0.5 x B + 0.5 x F    ;aka B/2+F/2
  * 1.0 x B + 1.0 x F    ;aka B+F
  * 1.0 x B - 1.0 x F    ;aka B-F
  * 1.0 x B +0.25 x F    ;aka B+F/4

Unfortunately I don't think the OpenGL blending fixed function is flexible enough to accommodate all these modes without a significant number of hacks. Besides, for accuracy's sake we might want to handle the blending calculations in our own shader code to make sure we don't have weird rounding and saturation discrepancies (if we want to be bit accurate with the real hardware).

For this reason I think it would be better to handle the blending in the fragment shader. Once again this is not generally how things are done in OpenGL as far as I know, but it should be possible at least using the same OpenGL 4.5 GL_ARB_texture_barrier extension mentioned before or by "ping ponging" the buffers.
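For reference, the four modes can be written out in integer arithmetic on 5bit components, which is roughly what the fragment shader would have to reproduce. This is only a sketch: the names are made up, the rounding of the halved and quartered terms is an assumption that would have to be checked against real hardware, and the mask bit of the result is simply left cleared.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlendMode {
    /// B/2 + F/2
    Average,
    /// B + F
    Add,
    /// B - F
    Subtract,
    /// B + F/4
    AddQuarter,
}

/// Blend one 5-bit component; the result saturates to 0..=31.
/// Rounding (truncating division) is an unverified assumption.
pub fn blend_component(mode: BlendMode, back: u8, front: u8) -> u8 {
    let b = back as i16;
    let f = front as i16;
    let value = match mode {
        BlendMode::Average => b / 2 + f / 2,
        BlendMode::Add => b + f,
        BlendMode::Subtract => b - f,
        BlendMode::AddQuarter => b + f / 4,
    };
    value.clamp(0, 31) as u8
}

/// Blend two 15-bit colours component by component (bit 15 is ignored).
pub fn blend_pixel(mode: BlendMode, back: u16, front: u16) -> u16 {
    let mut out = 0u16;
    for shift in [0u16, 5, 10] {
        let b = ((back >> shift) & 0x1f) as u8;
        let f = ((front >> shift) & 0x1f) as u8;
        out |= (blend_component(mode, b, f) as u16) << shift;
    }
    out
}
```

The subtractive and quarter modes are the ones that make fixed-function blending awkward, since each component has to saturate independently.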

Masking

The MSB of the 16bit pixel is used to store a "mask" field. When the GPU renders a primitive it can be configured to set this bit to either zero or one. Another configuration flag can tell the GPU to treat framebuffer pixels with the mask bit set as "read only" and refuse to overwrite them. It effectively works like a simplified stencil test in OpenGL.

The problem if we decide to use a stencil buffer to emulate this masking feature is that we'd potentially need to update the stencil buffer after each primitive, since it's possible for any primitive draw to set the mask bit if the GPU is configured to do so. I don't know if it's possible to meaningfully modify the stencil buffer in an OpenGL fragment shader. Barring that we won't be able to use the stencil test to accurately emulate the masking.

Alternatively we could use the same trick I proposed to handle the semitransparency modes above: we fetch the target FBO pixel in the fragment shader and if the masking is enabled and its MSB is set we don't change its value. If we already handle transparency that way it might be relatively straightforward to add, theoretically.
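The per-pixel logic itself is tiny. Here is a sketch of it following the description above, with hypothetical names; how textured primitives interact with the mask bit is not covered.

```rust
/// The two mask-related GPU configuration flags (hypothetical names).
#[derive(Debug, Clone, Copy)]
pub struct MaskSettings {
    /// Write the mask bit as 1 (true) or 0 (false) on every drawn pixel.
    pub set_mask: bool,
    /// Refuse to overwrite framebuffer pixels whose mask bit is set.
    pub check_mask: bool,
}

/// Returns the value the framebuffer pixel should hold after the GPU
/// attempts to draw `new` over `dest`.
pub fn write_pixel(settings: MaskSettings, dest: u16, new: u16) -> u16 {
    if settings.check_mask && dest & 0x8000 != 0 {
        // Destination is "read only": keep it untouched.
        return dest;
    }
    if settings.set_mask {
        new | 0x8000
    } else {
        new & 0x7fff
    }
}
```

In a fragment shader the same test would run on the destination pixel fetched from the target FBO, right next to the blending code.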

Video output

So far I only talked about rendering things inside the framebuffer; we also have to implement the video output to display the relevant part of the framebuffer on the screen.

The PlayStation video output streams the pixels from the framebuffer directly on the screen without any kind of intermediate buffer. In other words the display is "continuous", if a game wants to implement double buffering it does it by rendering to two different parts of the framebuffer and swapping the video output source position and GPU rasterizer draw offset when it wants to "flip" the buffers.

In order to emulate this properly we'd have to basically render one pixel at a time, keeping track of the output video pixel clock. I don't think that can be done very efficiently in OpenGL, nor does it sound necessary to emulate correctly the vast majority of games.

Instead we could display the entire visible portion of the framebuffer at once at the end of every frame, or maybe at the beginning of the next one. If the game uses double buffering we want to do it right after it swaps its buffers to reduce latency. I'm guessing the swapping would generally take place during the vertical blanking to reduce tearing artifacts, so maybe we could do the frame rendering at the end of the vertical blanking. I need to do more tests to make sure that's really how it works in practice though.

24bit mode

As explained before while the PlayStation rasterizer can only output 16bit RGB to the framebuffer the video output is capable of treating it as 24bits per pixel to display pre-rendered graphics.

When displaying in 24bit mode we'll have to find a way to take our 16bit RGB framebuffer and display it as a 24bit image. I don't know how difficult that is with OpenGL. I guess at worst we could sample two 16bit pixels in the fragment shader and reconstitute the correct 24bit value there. We could also implement 24bit image upscaling and filtering there to avoid having to handle it in the 16bit code. After all 24bit mode is a very limited special case on the original hardware.
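The reconstruction is mostly index arithmetic: 24bit pixel number x occupies bytes 3x to 3x+2 of the line, so it straddles one or two 16bit words. A sketch of that, assuming little-endian words and an R, G, B byte order (both assumptions to double check), with a made-up function name:

```rust
/// Fetch 24-bit pixel number `x` from one VRAM line stored as 16-bit words.
/// Assumptions: the low byte of each word comes first, bytes are R, G, B.
pub fn pixel_24bit(line: &[u16], x: usize) -> (u8, u8, u8) {
    // Read byte number `i` of the line out of the 16-bit words.
    let byte = |i: usize| -> u8 {
        let word = line[i / 2];
        if i % 2 == 0 {
            word as u8
        } else {
            (word >> 8) as u8
        }
    };
    let start = x * 3;
    (byte(start), byte(start + 1), byte(start + 2))
}
```

Even pixels start on a word boundary and odd ones start in the middle of a word, which is why a shader version has to sample two neighbouring 16bit texels.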

Texture cache

The PlayStation GPU has a small texture cache used to speed up the rendering of textured primitives. Normally it only affects the speed at which a primitive is rendered, however if the cache becomes "dirty" (for instance by overwriting a texture while some of it is in the cache) it could potentially change the aspect of the resulting triangle.

I have no idea how to emulate this particular feature in OpenGL. As far as I can tell the only way to emulate it accurately would be to draw each primitive pixel-by-pixel, updating the cache state in the process but that goes against the spirit of the massively parallel GPUs we have today.

Fortunately I believe that for the vast majority of games this cache has no visible effect, so maybe we can ignore it completely or at least put it very low in our priority list.

ADormant commented 9 years ago

https://github.com/simias/rustation/pull/9

MagaTailor commented 9 years ago

I was about to open an issue about OpenGL versions supported - I can't get rustation to work on x86 with OpenGL 1.x, nor on arm (OpenGL 1.x via glshim) or even using Mesa (OpenGL 2.0).

Does this mean OpenGL 3 is mandatory?

ADormant commented 9 years ago

@petevine Currently yes, but you can use Pete's OpenGL 1 and 2 plugins. However, such ancient OpenGL versions are not good for accuracy.

simias commented 9 years ago

Yeah, I'm still not sure which version of OpenGL we'll end up targeting but OpenGL < 3 seems out of the question. It's simply lacking too many features to emulate what we need accurately.

Maybe a very accurate software renderer (based on mednafen's code?) would be nice to have at some point though.

ADormant commented 9 years ago

The mask bit can be emulated by a combination of the stencil buffer and destination alpha: http://arek.bdmonkeys.net/SW/pcsx/ https://github.com/devmiyax/yabause/commit/c03e9033d2e124001e926da6a6fa8519987f3131

@simias By the way, can the output be configured to display 32bit per pixel for images and pre-rendered graphics instead of 24bit?

MagaTailor commented 9 years ago

A pity though, even some not too ancient x86_64 laptops have just hardware OpenGL 2.x.

On a tangent, I've noticed some completely trivial programmes like minesweeper default to v3 but that's probably down to the commonly used rust GL libs.

ADormant commented 9 years ago

It seems it's possible to modify the stencil buffer. https://en.wikibooks.org/wiki/OpenGL_Programming/Stencil_buffer https://www.opengl.org/registry/specs/ARB/shader_stencil_export.txt https://www.opengl.org/registry/specs/AMD/shader_stencil_value_export.txt

simias commented 9 years ago

I honestly didn't know OpenGL 3 and later was still so poorly supported. I mean, 3.0 was released in 2008, 3.3 in 2010...

I expected that since rustation has pretty high CPU requirements due to the lack of dynarec OpenGL 3.3 would be a good "baseline" to target. Looks like I was wrong...

@ADormant no I believe there's only 16 (really 15 since the MSB is the mask bit) and 24bit output on the original console. But the more I think about it the more I think 24bit handling should be a special case, since it can only be used to display pre-rendered images increasing the internal resolution would be pretty pointless. In 24bit mode we could just take the images at the original resolution and upscale/filter them before displaying them, like it's done with 2D consoles.

And regarding the stencil you just made me discover glStencilOp, indeed it seems like the right way to emulate the mask bit without shader trickery.

Alternatively if we're sampling the destination pixel for "manual blending" we could have a stencil buffer always set completely to 0 and a glStencilFunc set to test GL_EQUAL 0. In the fragment shader we extract the mask bit from the target pixel and use it to set gl_FragStencilRefARB which is used for the stencil test. The advantage would be that we won't ever have to touch the stencil buffer or worry about maintaining coherence between the pixel's MSB and the stencil buffer (which might be important if the CPU wants to read back a portion of the framebuffer and expects the mask bits to be set properly). Not sure if that's a good idea though.

ADormant commented 9 years ago

Do you mean 24 bit for pre-rendered backgrounds only or for everything? I ask because I believe Pete's plugins can use 32bit color depth. Probably emulating it through shaders would be more accurate, or maybe the best option would be a combination of stencil buffering and shader blending?

simias commented 9 years ago

Well, I assume that 32bit color depth is really 24bit RGB + 8bit alpha or something similar, 8bit per component (~16.8 million colors). Few computer monitors are able to display more than that anyway, it sounds completely overkill for PSX graphics.

That being said, as far as the emulated renderer is concerned it's possible to increase the color depth arbitrarily, although you'll always be limited by the color depth of the original textures unless you replace them. That's probably what Pete's plugin is referring to with "32bit graphics": it just means that it doesn't accurately truncate the rasterizer's colors to 16bits with dithering like the real console but outputs at a greater color depth. That results in a sharper, less noisy image (for better or worse).

For instance currently rustation's output is using whatever color depth the SDL window is using, so probably RGB888 or similar. The SCE logo gradient as displayed by rustation looks much smoother than the one displayed by a real console (or mednafen PSX) because of the increased color depth.

I think increasing the color depth has basically the same caveats as increasing the internal resolution, we need to be very careful with palettes and paletted textures but it should mostly "just work" with the rest. At least in theory...

ADormant commented 9 years ago

Definitely, when bladesoft increases the color depth it stops using dithering since it's no longer needed. I guess pete and bladesoft are 8 bits per channel, although 10 bits per channel (1.07 billion colors) should be doable too with new monitors, especially 4k ones.

simias commented 9 years ago

It's unfortunately pretty common to say "32bits" when you actually mean 24bit RGB. For instance Windows' display settings used "32bit" to mean 24bit RGB until at least Windows XP.

Of course if you can increase the color depth to 24bits you can go as high as your hardware will allow, but I doubt you'll see a significant difference above 24bits for PlayStation graphics unless you replace the textures or manage to hack in things like HDR.

I think we should still support 16bit+dithering because some games tend to look very "flat" without the dithering noise IMO. Especially untextured polygons.
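To give an idea of what "16bit+dithering" means per component, here is a sketch that adds a position-dependent offset to an 8bit value before dropping it to 5 bits. The 4x4 offset table is the one I remember from the No$ specs, so treat it as an assumption to verify, and the function names are made up.

```rust
/// 4x4 dither offsets, indexed as DITHER[y % 4][x % 4].
/// Assumption: recalled from the No$ specs, not verified on hardware.
const DITHER: [[i16; 4]; 4] = [
    [-4, 0, -3, 1],
    [2, -2, 3, -1],
    [-3, 1, -4, 0],
    [3, -1, 2, -2],
];

/// Convert an 8-bit component to 5 bits with dithering at pixel (x, y).
pub fn dither_to_5bit(value: u8, x: usize, y: usize) -> u8 {
    let offset = DITHER[y % 4][x % 4];
    // Saturate to the 8-bit range before dropping the 3 low bits.
    let dithered = (value as i16 + offset).clamp(0, 255);
    (dithered >> 3) as u8
}

/// Same conversion without dithering: plain truncation.
pub fn truncate_to_5bit(value: u8) -> u8 {
    value >> 3
}
```

With plain truncation a flat 8bit value of 102 becomes 12 everywhere, while with dithering neighbouring pixels alternate between 12 and 13, which is where the characteristic noise on untextured polygons comes from.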

ADormant commented 9 years ago

The so-called 32bit is a variant called RGBA color space. These plugins indeed seem to have 24bit + 8bit alpha https://en.wikipedia.org/wiki/Color_depth#True_color_.2824-bit.29 By the way, is there any problem with implementing MSAA and SSAA in a PSX emulator?

simias commented 9 years ago

Yeah but there's no meaningful alpha on a computer screen, it's only useful for blending so it's really only 24 meaningful bits.

I think multi/supersampling is analogous to increasing the internal resolution, if one works the others should work as well, at least as far as I can see.

ADormant commented 9 years ago

I wonder about hardware accelerated framebuffer emulation and framebuffer effects, is that a problem like with N64 emulation? @simias Also I wonder if it's possible to emulate GTE and MDEC on a GPU? Maybe even merge them into one?

simias commented 9 years ago

I don't know the N64 very well but I believe one of the issues with hardware renderers on this console is that the video memory is shared with the CPU, therefore you have to be super careful about interactions between the CPU and GPU.

On the PSX the VRAM is dedicated to the GPU and the CPU has to go through the GPU registers to access it so the situation is very different. Basically the CPU can't go "behind your back" and mess with the framebuffer without getting notified in the GPU code. I also believe that the N64 GPU is much, much more complicated than the PSX GPU.

Emulating the GTE alone on the GPU would be tricky since it's many small atomic operations, given the cost of sending the data to the GPU and bringing it back it would probably end up being very slow and I don't see how you could batch the commands without assuming a lot of things about what the game is doing. I think it would be better to try and implement it using SIMD instructions to speed it up.

However if GTE "accuracy" (real 3D rendering on the GPU with full precision) is implemented then it should be possible to replace the GTE code with a simple placeholder since the real work (projection, shading, zbuffer etc...) will be done on the GPU. It might harm compatibility however.

Regarding the MDEC I honestly don't know but since it's not generally used "interactively" I'm not sure if it's really worth it. I don't really see what we would gain from a GPU based MDEC except slightly lower CPU consumption while playing videos.

ADormant commented 9 years ago

Well some games seem to use MDEC for textures and perhaps it'd allow for things like FMV dumping/replacement or applying shaders to FMVs.

simias commented 9 years ago

MDEC-decoded images are then uploaded to the GPU just like any other textures, the replacement/filtering could take place there without any MDEC-specific code.

Doing it at the MDEC's output would be annoying because you would have to match the original resolution otherwise it will mess up the DMA-to-RAM transfer. It's better to handle all that on the GPU side IMO.

ADormant commented 9 years ago

About drawing primitives per-pixel Saturn has something called per-pixel priority perhaps it's related? https://github.com/Yabause/yabause/commit/f04d3c5fcdf038e70848e0bac6140911d977fab4 https://github.com/Yabause/yabause/commit/b307da5386fd73950f1acb5c5824fb8c89e49e00 https://github.com/Yabause/yabause/commit/261615bfc412099f0c617e200bb0e098cf8b36e1 https://github.com/Yabause/yabause/commit/c6ffab78439de82e80cf1298b0747440c8f2a364 https://github.com/Yabause/yabause/commit/9148de79a50e3929bd4e6e88f5287d8b77606c0f https://github.com/Yabause/yabause/commit/a8049ed790761bceed05ae7b08904848c72b4e88 https://github.com/Yabause/yabause/commit/f315ccaa0e1e0d590b056ccc6d70b6ca7f545008 https://github.com/Yabause/yabause/commit/e5504923ea24f61e25ba0cef053433fa5b08fd10 https://github.com/Yabause/yabause/commit/6288e2d1429556c72eab7ff87e896f6edf2eda75 https://github.com/Yabause/yabause/commit/3bd23ad318f5e6674301134535993b313a6d5945

simias commented 9 years ago

I'm not really familiar with the Saturn but guessing from the name and those commits it seems more related to how different layers of sprites are rendered (some sprites can have higher "priority" than others and overlap them). The PlayStation doesn't even have real sprites so the concept doesn't carry over (although you could say the mask bit is kind of a priority bit).

For the rest the Saturn architecture is so different from the PSX that I don't really know if there are things from yabause that could be carried over.

ADormant commented 9 years ago

I wonder if this extension can be used to simulate z-buffer? http://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt https://github.com/mirror/pcsxr/commit/9a63e315195a0dcd669d2df1c62142eaf740de7a https://pcsxr.codeplex.com/discussions/264234 http://pcsxr.codeplex.com/SourceControl/changeset/68598 https://github.com/mirror/pcsxr/commit/5ba139a4c3cc15420762bb4a75ca36f7c936213b https://github.com/mirror/pcsxr/commit/9cbd44224d555eb5d9fbd044676e92924407b57b https://github.com/mirror/pcsxr/commit/a603b0692f07a5dc9980b9e76c7cddfe57805c12 https://github.com/mirror/pcsxr/commit/704ab8f7cc436c00c151e5608429737acb2e69ac https://github.com/mirror/pcsxr/commit/3f5662407bc39d8913a4a24927cb953e3d57b56c https://github.com/twinaphex/beetle-psx-libretro/commit/1cf16b357b7ac76a6dd81944ff8a2742fe4de177 https://github.com/twinaphex/beetle-psx-libretro/commit/116bc38466861f33c8dd3c229b140a898ad0f346 https://github.com/twinaphex/beetle-psx-libretro/commit/9fdabb31d864141ac94e1079afcdfc73f0934ea6 @simias

simias commented 9 years ago

What do you mean by zbuffer exactly? The PlayStation doesn't have a z-buffer.

Yamakaky commented 9 years ago

Just to be sure: the GPU has 1M of ram (+ cache). A (fixed?) part of it is used as a framebuffer in which the GPU renders. It then shows the framebuffer on the screen. Am I right?

simias commented 9 years ago

Mostly, but nothing is really fixed. Basically the game has 1MB of video memory to hold everything: textures and framebuffers. This video memory is addressed as a 2D buffer of 1024x512, 16 bits per pixel. All the render commands (draw triangle, draw quad, draw line...) use this system of coordinates. You can give an offset to the GPU that will be added to the vertex coordinates (currently implemented as the offset uniform) and define a drawing area outside of which the GPU won't render (currently implemented with the scissor box).
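In code, the offset and the drawing area boil down to something like the sketch below. The `DrawState` type and its fields are made up for the example (rustation uses a shader uniform and the scissor box instead), and treating both corners of the drawing area as inclusive is an assumption.

```rust
/// Drawing offset and drawing area, in VRAM pixel coordinates.
#[derive(Debug, Clone, Copy)]
pub struct DrawState {
    /// Added to every vertex coordinate.
    pub offset: (i32, i32),
    /// Top-left corner of the drawing area (assumed inclusive).
    pub area_top_left: (i32, i32),
    /// Bottom-right corner of the drawing area (assumed inclusive).
    pub area_bottom_right: (i32, i32),
}

impl DrawState {
    /// Translate a vertex coordinate into a VRAM coordinate.
    pub fn to_vram(&self, vertex: (i32, i32)) -> (i32, i32) {
        (vertex.0 + self.offset.0, vertex.1 + self.offset.1)
    }

    /// True if the VRAM position lies inside the drawing area.
    pub fn is_drawable(&self, position: (i32, i32)) -> bool {
        let (left, top) = self.area_top_left;
        let (right, bottom) = self.area_bottom_right;
        (left..=right).contains(&position.0) && (top..=bottom).contains(&position.1)
    }
}
```

A game that double buffers with two 320x240 framebuffers side by side would, for the second buffer, use an offset of (320, 0) and a drawing area from (320, 0) to (639, 239).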

Here's a VRAM dump of my real console displaying the PAL version of Crash Bandicoot:

[image: crash-pal]

Here's what it looks like on rustation today, running the Japanese (NTSC) version:

[image: crash-opengl]

You can see that the game uses dual buffering so you have two framebuffers at the top of the image. Once it's done rendering one buffer it configures the video output "vram start" coordinate to display this part of the VRAM, then it changes the offset and display area to switch to the other buffer and draw the next frame.

The rest of the VRAM holds textures. Most of them look weird in this image because in order to save space many (most?) games use 4 or 8 bit paletted textures. You can see that for now I completely ignore texture uploads so this part of the VRAM remains blank in rustation.

But note that nothing here is fixed, the VRAM organization is left to the game. For instance here's a dump of the VRAM for Spyro (PAL):

[image: spyro-pal]

You can see that the developers decided to stack their framebuffers on the left of the VRAM.

One last example, Metal Gear Solid (PAL):

[image: mgs-pal]

We can see that the game uses a smaller horizontal resolution in order to stuff more textures in the VRAM. The PlayStation video output supports several horizontal resolutions by changing the speed at which the pixels are sent to the analog output (see https://github.com/simias/rustation/blob/master/src/gpu/mod.rs#L1148-L1178)

Yamakaky commented 9 years ago

Idea: instead of copying the framebuffer, why not copy the textures currently used? It would solve the synchronisation problem between two buffers. It may not be possible to batch commands, as we have to copy the texture each time. We could use a monotonically increasing z coordinate and reorder the commands, but I'm not sure how it would play with transparency.

ADormant commented 9 years ago

@simias I've got a question: is it possible to implement a more universal widescreen hack? The current one used in PS1 emulators doesn't work very well for games with mixed 2D/3D and pre-rendered backgrounds. By the way, something about implementing perspective-correct rendering is here http://problemkaputt.de/psx-spx.htm#gpumisc

[images: widescreen correct, widescreen hack bugs]

simias commented 9 years ago

Heh, that does look pretty bad. The problem is: what would be the right way to render this in widescreen? Scale the background and have black bars on the sides? I guess that could be doable if we could figure out a reliable heuristic to detect backgrounds.

I'm not sure how the widescreen hacks are implemented in other emulators but my first guess would be to modify the GTE to force a different aspect ratio. If that's the case then the 2D graphics that don't go through the GTE remain untouched.

Even for 3D games I'm pretty sure it won't work well 100% of the time, here's a quick hack I made by scaling the X coordinates in the GTE:

Normal "accurate" rendering:

[image: crash]

Rendering with the X coordinate scaled by 2/3 in the GTE (effectively rendering with an ultra-wide 16:8 aspect ratio):

[image: crash-ws]

You can see that the aspect ratio is definitely wider, however the game doesn't expect to render like that so we're seeing missing polygons on the sides.
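The 16:8 figure follows from simple arithmetic: squeezing X by 2/3 fits 3/2 times as much scene into the same width, so a 4:3 picture turns into 6:3, that is 2:1 or 16:8. A small sketch of that calculation, with hypothetical function names:

```rust
/// Greatest common divisor (Euclid's algorithm).
fn gcd(a: u32, b: u32) -> u32 {
    if b == 0 {
        a
    } else {
        gcd(b, a % b)
    }
}

/// Aspect ratio of the scene shown when the X coordinates are scaled by
/// `x_scale` (numerator, denominator) on a display of `native` aspect ratio.
/// The result is reduced to lowest terms.
pub fn effective_aspect(native: (u32, u32), x_scale: (u32, u32)) -> (u32, u32) {
    // Scaling X by n/d shows d/n times as much scene horizontally.
    let width = native.0 * x_scale.1;
    let height = native.1 * x_scale.0;
    let divisor = gcd(width, height);
    (width / divisor, height / divisor)
}
```

By the same reasoning a 16:9 output would need the X coordinates scaled by 3/4 instead of 2/3.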

I don't know if there are better ways to do this but it seems to be a common problem with widescreen hacks:

https://www.youtube.com/watch?v=BKRWonevCmM

Maybe there's a better way to do it but I can't think of any generic way to fix this. Of course if you're willing to use game-specific hacks it should be possible to trick the specific game engine to render at a different aspect ratio, but that's a whole other problem...

ADormant commented 9 years ago

By the way, I wonder if trilinear and anisotropic filtering are possible without a depth buffer? Existing PSX emulators have problems like black outlines and boxes with even basic bilinear texture filtering and need things like multi-pass alpha blending and alpha testing to mitigate those problems. http://www.emulation64.com/guides/17/04/ http://www.emulation64.com/guides/17/06/psx-plugins-lewpy-s-glide-gpu.html/ http://www.emulation64.com/guides/17/03/psx-plugins-lewpy-s-glide-gpu.html/ http://ngemu.com/threads/alpha-multipass-in-petes-ogl-driver.2110/ http://www.fpsece.net/forum2/viewtopic.php?t=3615 https://docs.google.com/spreadsheets/d/1gsjK1WvLV4xpSD5jRupuBiM8XBnfz9C8VNcUp71HP_w/edit#gid=0 About dithering, I think it should be possible to emulate it with a fragment shader but there should be an option to disable it. Some examples of bilinear filtering and transparency/multi-pass alpha blending bugs from Pete's plugins and FPSE (texture barrier and shader blending should be good for these):

[images: black borders black lines 2 black lines 2png black lines boxes bug outlines lines bug 2uetugi lines]

@simias

simias commented 9 years ago

I'm not sure what causes those problems, I've seen things like that while upscaling textures with an alpha channel (where the alpha ends up "leaking" through the filter) but I don't see why that would be a problem on the PlayStation since there's no real alpha channel. I guess we'll have to try and fix the issues as they come up.

The tearing seen on some of your screenshots (like the Grandia start screen) could be due to rounding errors though, I've seen similar glitches when playing with weird resolutions with my current renderer.

ADormant commented 9 years ago

This is what I found about anisotropic/trilinear filtering and perspective correction (which works, as shown by Edgbla) for PS1. http://ngemu.com/threads/perspective-correction-whats-up-with-that.21080/ @simias

simias commented 9 years ago

Perspective correct mapping and anisotropic filtering "simply" require getting the Z coordinates from the GTE somehow. It might be easier to implement it alongside the GTE "accuracy" hack since it's also about getting more data from the GTE to the GPU. Basically what Lewpy says in your link.
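To see why Z is needed, compare the two ways of interpolating a texture coordinate between two vertices. This is the textbook formulation, not code from any emulator, and the function names are made up: the affine version only needs screen-space data, the perspective-correct one interpolates u/z and 1/z and divides.

```rust
/// Affine interpolation: linear in screen space, which is all the
/// PlayStation GPU can do since it never sees any depth.
pub fn affine(u0: f32, u1: f32, t: f32) -> f32 {
    u0 + (u1 - u0) * t
}

/// Perspective-correct interpolation: interpolate u/z and 1/z linearly
/// in screen space, then divide. Requires the depth of both vertices.
pub fn perspective(u0: f32, z0: f32, u1: f32, z1: f32, t: f32) -> f32 {
    let inv_z = affine(1.0 / z0, 1.0 / z1, t);
    let u_over_z = affine(u0 / z0, u1 / z1, t);
    u_over_z / inv_z
}
```

Halfway along an edge going from u = 0 at z = 1 to u = 1 at z = 3, the affine version gives 0.5 while the perspective-correct one gives 0.25. When both vertices are at the same depth the two agree, which is why the artifacts mostly show up on surfaces seen at a steep angle.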

I'm starting to think that the most efficient way to do that might be to completely bypass the GTE and send the raw vertex coordinates directly to the renderer. This is easier said than done though.

There's also an interesting bit of information in your link regarding the black halos you mentioned earlier:

Bilinear filtering does not require any more information than the PSX GPU already gets passed, so it can be enabled. BUT, bilinear filtering does require carefull layout of textures in VRAM, which most PSX games do not do. This means there are glitches when enabling bilinear filtering, such as black halos (due to chroma-keying issues).

I'm not entirely sure I understand what he means by that though.

ADormant commented 9 years ago

I'm not entirely sure I understand what he means by that though.

Pete's plugins have an alpha multi-pass option to mitigate those filtering bugs, whilst Lewpy's plugin has an alpha testing option. By the way, do you know how to dump and replace textures in the PSX GPU? Bladesoft can already do it but has a very low limit on the size of textures. @simias https://www.opengl.org/discussion_boards/showthread.php/128738-Blending-and-alpha-black-border https://www.opengl.org/discussion_boards/showthread.php/176060-Border-of-Alpha-Blended-Textures-get-black https://www.opengl.org/discussion_boards/showthread.php/172705-Blending-textures-hav-a-shadow-on-the-border-Why http://blender.stackexchange.com/questions/31420/how-to-get-rid-of-out-of-frame-black-borders-around-a-scaled-down-movie-overlay

ADormant commented 9 years ago

http://wenku.baidu.com/view/9efeb002cc1755270722080d.html

ADormant commented 9 years ago

@simias I found explanation about this whole black border problem.

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-0.html

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-10.html

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-20.html

http://nehe.gamedev.net/tutorial/masking/15006/

http://www.razyboard.com/system/morethread-problem-with-flickering-border-fix-pete_bernert-41709-457403-0.html

While the basic idea ("how to do something like an alpha-test without texture alpha values" ) is nice (and, of course, it has certain disadvantages), it will not help very much with the "black border" problem.

First, there are two general "border" issues when texture filtering is enabled in psx emulation:

1) the general color interpolation problem:

example:

the original (not filtered) psx texture is something like that:

BBBBBBBBB
BBBBXBBBB
BBBXXXBBB
BBXXXXXBB
BXXXXXXXB
BBBBBBBBB

B=Black (in texture, transparent while drawn), X=some color

now you activate filtering, and the gfx card hardware will interpolate the texel colors while drawing:

BBBBBBBBB
BBBBGBBBB
BBBGXGBBB
BBGXXXGBB
BGGGGGGGB
BBBBBBBBB

Those "G"'s are the interpolated colors of the real texture color (the one you want to see) and the surrounding Black (transparent) colors. The "G"s will not be exactly black, but in the final drawing they will appear as some kind of dark color.

2) the background masking problem

Some games, like FF7, have "background" gfx and some "front" gfx, for example a table (front) in front of a kitchen (back). The main character can walk between the table and the background scene.

How is the background scene and the table done?

background (kitchen), A=some colors, B=solid black (not transparent!)

AAAAAAAAA
AAABBBAAA
AAAABAAAA
AAAABAAAA
AAAAAAAAA

Foreground table, X=Some color, B=Black (transparent)

XXX
BXB
BXB

Now the table will be drawn on top of the background. Without filtering, no problem:

AAAAAAAAA
AAAXXXAAA
AAAAXAAAA
AAAAXAAAA
AAAAAAAAA

But if you do filtering, the shapes will be not 100% correct anymore, the solid black part of the background will be interpolated in the other bkg colors... tada, a black border around the table.

And both problems will also happen with the "mask transparency" trick

simias commented 9 years ago

Ah thank you! That makes sense. If I understand correctly it's because the host GPU's texture filter doesn't know that black texels are supposed to be transparent and interpolates with them, resulting in non-transparent dark values.

Now of course we could (and probably should?) not use the basic OpenGL texture filter and implement it in the fragment shader instead. This way we could special-case black pixels in this case. It'll also let us use more clever filters although without texture cache we'll have to do it for each frame. I don't know if it's a problem for modern graphic cards given the average number of textured polygons on the PlayStation.
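To make the idea of special-casing black texels concrete, here is a minimal CPU-side sketch in Rust of what such a filter could compute for one 2x2 texel footprint. Everything in it is an assumption for illustration: the names `unpack` and `filter_2x2` are made up, the colors stay in raw 5-bit units, and the 0.5 coverage threshold is an arbitrary alpha-test-like cutoff. A real implementation would live in the fragment shader.

```rust
/// Split a 16-bit 1555 texel into its 5-bit (R, G, B) components.
/// Assumed layout: bits 0-4 red, 5-9 green, 10-14 blue, bit 15 mask.
fn unpack(texel: u16) -> [f32; 3] {
    [
        (texel & 0x1f) as f32,
        ((texel >> 5) & 0x1f) as f32,
        ((texel >> 10) & 0x1f) as f32,
    ]
}

/// Bilinear filter over a 2x2 footprint (top-left, top-right,
/// bottom-left, bottom-right) that ignores transparent texels.
/// `fx` and `fy` are the fractional sample position in [0, 1].
/// Returns `None` when the sample should be discarded as transparent.
fn filter_2x2(texels: [u16; 4], fx: f32, fy: f32) -> Option<[f32; 3]> {
    let weights = [
        (1.0 - fx) * (1.0 - fy),
        fx * (1.0 - fy),
        (1.0 - fx) * fy,
        fx * fy,
    ];

    let mut sum = [0.0f32; 3];
    let mut opaque_weight = 0.0f32;

    for (&texel, &weight) in texels.iter().zip(weights.iter()) {
        // A raw value of 0x0000 is the "transparent black" texel:
        // leave it out of the average instead of blending towards black.
        if texel == 0 {
            continue;
        }
        let color = unpack(texel);
        for (s, c) in sum.iter_mut().zip(color.iter()) {
            *s += c * weight;
        }
        opaque_weight += weight;
    }

    // Hypothetical cutoff: mostly-transparent samples are discarded.
    if opaque_weight < 0.5 {
        return None;
    }

    // Renormalize so the surviving texels keep their full brightness.
    Some([
        sum[0] / opaque_weight,
        sum[1] / opaque_weight,
        sum[2] / opaque_weight,
    ])
}
```

With a single red texel next to three transparent ones, a plain bilinear filter would return a darkened red near the edge, while this version returns either the unmodified red or nothing at all, so no dark fringe is produced.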

The problem of superposing several textures (like the table in the kitchen in his example) will need special care though because we can't have any seam between the two bitmaps otherwise the background will leak through.

The good news is that I can now kind-of start a few games so I have a bigger sample size to see how games typically use the GPU.

crash-japan

simias commented 9 years ago

Another thing I'm not sure how to handle is lines. They're the third type of primitive used by the GPU, after quads and triangles. How should we draw them? If we draw them as lines, what happens when we increase the internal resolution: do we just make them thicker? I wonder how the other emulators/plugins handle that.

A good way to test it could be the intro screen for MediEvil:

medievil-mednafen

The raindrops are shaded lines.

ADormant commented 9 years ago

@simias

I wonder how the other emulators/plugins handle that.

Maybe with geometry shaders like Dolphin? By the way, wireframe mode can be useful for debugging. https://github.com/PCSX2/pcsx2/commit/9a2212c86ecee62235fa533e4f53734b49856536 https://github.com/dolphin-emu/dolphin/pull/1735 https://github.com/dolphin-emu/dolphin/pull/1706 https://github.com/dolphin-emu/dolphin/pull/2343 https://github.com/dolphin-emu/dolphin/pull/1439 https://github.com/dolphin-emu/dolphin/pull/1788 https://github.com/dolphin-emu/dolphin/pull/1716 https://github.com/dolphin-emu/dolphin/pull/1612 https://github.com/dolphin-emu/dolphin/pull/1747

simias commented 9 years ago

I hadn't considered geometry shaders for that but it's quite clever. One situation I wasn't sure how to handle was non-standard aspect ratio (like wide screen hacks) where vertical and horizontal lines would have to be drawn with different thickness depending on the angle. I guess that could be solved by converting them to quadrilaterals in the geometry shader.
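As a rough illustration of the line-to-quadrilateral conversion, here is a Rust sketch of the arithmetic a geometry shader could perform. It assumes lines should stay one native pixel thick and that the upscaling factors may differ horizontally and vertically; `line_to_quad` and its parameters are hypothetical names, not part of any existing renderer.

```rust
/// Expand a line given in native VRAM coordinates into four vertices
/// (triangle-strip order) in upscaled output coordinates.
/// `scale` holds the horizontal and vertical upscaling factors.
/// Returns `None` for a zero-length line.
fn line_to_quad(
    p0: (f32, f32),
    p1: (f32, f32),
    scale: (f32, f32),
) -> Option<[(f32, f32); 4]> {
    let (dx, dy) = (p1.0 - p0.0, p1.1 - p0.1);
    let len = (dx * dx + dy * dy).sqrt();
    if len == 0.0 {
        return None;
    }

    // Normal to the line in native space, half a native pixel long.
    let (nx, ny) = (-dy / len * 0.5, dx / len * 0.5);

    // Offsetting in native space and scaling afterwards is what makes
    // the on-screen thickness depend on the angle of the line.
    let to_output = |x: f32, y: f32| (x * scale.0, y * scale.1);

    Some([
        to_output(p0.0 + nx, p0.1 + ny),
        to_output(p0.0 - nx, p0.1 - ny),
        to_output(p1.0 + nx, p1.1 + ny),
        to_output(p1.0 - nx, p1.1 - ny),
    ])
}
```

With `scale = (4.0, 2.0)`, a horizontal line comes out 2 output pixels thick and a vertical one 4 output pixels thick, which is the behaviour wanted for non-square upscaling.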

A potential annoyance is that line primitives would have to be rendered with a different draw call (since as far as I know you can't render two types of primitives in a single draw call?), so every time we switch from one primitive type to another we'll have to insert a new draw call. In a worst case scenario (a series of alternating triangles and lines) that could probably be quite bad.

For opaque primitives this could be solved by enabling the Z-buffer and then drawing the primitives in whichever order we want. This way we could also render everything in the opposite order to the one used by the PlayStation GPU: we receive the primitives from farthest to closest, but in order to limit overdraw we want to render them the other way around (if a primitive is hidden by another there's no need to bother rendering it). Of course semi-transparent primitives will have to be rendered afterwards in the right order to display properly anyway.
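Here is a small Rust sketch of that ordering scheme, under the assumption that each primitive simply gets a depth derived from its submission index. The `Primitive` struct, `plan_frame` and the depth formula are all hypothetical; the point is only to show the two passes.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Primitive {
    id: u32,
    semi_transparent: bool,
}

/// Split one frame's primitives (in PlayStation submission order, i.e.
/// farthest first) into two passes of `(id, depth)` pairs:
/// 1. opaque primitives, nearest first, to limit overdraw;
/// 2. semi-transparent primitives, in their original order.
/// Depth lies in (0, 1]; a smaller value means closer to the viewer.
fn plan_frame(prims: &[Primitive]) -> (Vec<(u32, f32)>, Vec<(u32, f32)>) {
    let n = prims.len() as f32;
    // Later submissions are drawn on top, so they get a smaller depth.
    let depth_of = |i: usize| (prims.len() - i) as f32 / n;

    let opaque = prims
        .iter()
        .enumerate()
        .rev()
        .filter(|(_, p)| !p.semi_transparent)
        .map(|(i, p)| (p.id, depth_of(i)))
        .collect();

    let blended = prims
        .iter()
        .enumerate()
        .filter(|(_, p)| p.semi_transparent)
        .map(|(i, p)| (p.id, depth_of(i)))
        .collect();

    (opaque, blended)
}
```

The second pass would still be depth-tested against the first, so a semi-transparent primitive submitted before an opaque one stays hidden behind it. Within the opaque pass the primitives could also be regrouped by type to cut down on draw calls, since the depth test takes care of the final ordering.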

Regarding wireframe I agree that it's useful (and cool looking) and I already play with it using PolygonMode (not the geometry shader):

spyro-moon-bg-wf

The difficulty is to figure out when to clear the buffer, otherwise the wireframe of each successive frame is drawn on top of the previous one and it becomes messy real fast. Designing the right heuristic to figure out when the image should be erased is the tricky part. For the image of Spyro's background above I just bound a key to clear the framebuffer manually when I wanted to get a fresh image...

ADormant commented 9 years ago

@simias For debugging GL KHR Debug extension is good https://github.com/citra-emu/citra/pull/1196 Dolphin's pulls related to aspect ratio and fullscreen: https://github.com/dolphin-emu/dolphin/pull/1231 https://github.com/dolphin-emu/dolphin/pull/2769 https://github.com/dolphin-emu/dolphin/pull/2765 https://github.com/dolphin-emu/dolphin/pull/2796 https://github.com/dolphin-emu/dolphin/pull/2791 https://github.com/dolphin-emu/dolphin/pull/506 https://github.com/dolphin-emu/dolphin/pull/726 https://github.com/dolphin-emu/dolphin/pull/1688 https://github.com/dolphin-emu/dolphin/pull/1764

Quadrilaterals are a good idea. Quad Rasterizer in gpubladesoft fixed the majority of warping.

Screenshots: megaman legends quad, need for speed quad, spyro quad, spyro quad 2, test drive quad, threads of fate quad, tomba 2 quad rendering, twisted metal 2 quad, twisted metal 3 quad

Yamakaky commented 9 years ago

Not sure, but I think that with glium we don't need GL_KHR_debug. @tomaka

ADormant commented 9 years ago

@simias https://forum.beyond3d.com/threads/psx-vs-n64-graphical-look.56516/page-4

https://www.opengl.org/registry/specs/ARB/point_sprite.txt https://www.opengl.org/registry/specs/ARB/fragment_coord_conventions.txt https://www.opengl.org/registry/specs/ARB/clip_control.txt https://www.opengl.org/registry/specs/ARB/blend_func_extended.txt

ADormant commented 9 years ago

SPU Reverb emulation https://github.com/hrydgard/ppsspp/pull/8116 @simias https://github.com/ogamespec/psxdev

ADormant commented 9 years ago

Multi-primitive graphics rendering in one draw call seems doable but you will probably need newer OpenGL than 3.3 @simias

http://www.songho.ca/opengl/gl_vbo.html http://www.songho.ca/opengl/gl_vertexarray.html http://www.openglsuperbible.com/2013/12/09/vertex-array-performance/ http://in2gpu.com/2014/09/24/render-to-texture-in-opengl/ http://www.cs.kent.edu/~zhao/gpu/lectures/OpenGL_FrameBuffer_Object.pdf http://www.songho.ca/opengl/gl_fbo.html https://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt https://www.opengl.org/registry/specs/ARB/texture_rg.txt https://www.opengl.org/registry/specs/ARB/wgl_render_texture.txt http://www.google.com/patents/US20140098117 http://www.informit.com/articles/article.aspx?p=2033340&seqNum=4 http://stackoverflow.com/questions/27946183/draw-multiple-shapes-in-one-vbo https://www.opengl.org/wiki/Vertex_Rendering https://www.opengl.org/wiki/Vertex_Specification https://www.opengl.org/wiki/Primitive https://www.opengl.org/registry/specs/NV/bindless_multi_draw_indirect.txt https://www.opengl.org/registry/specs/NV/vertex_buffer_unified_memory.txt https://www.opengl.org/registry/specs/ARB/indirect_parameters.txt' https://www.opengl.org/registry/specs/ARB/multi_draw_indirect.txt https://www.opengl.org/registry/specs/ARB/draw_indirect.txt https://www.opengl.org/registry/specs/ARB/multi_bind.txt http://www.g-truc.net/post-0642.html https://www.opengl.org/registry/specs/ARB/bindless_texture.txt https://www.opengl.org/registry/specs/NV/bindless_texture.txt https://www.opengl.org/registry/specs/ARB/sparse_texture.txt https://www.opengl.org/registry/specs/ARB/sample_shading.txt https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt https://www.opengl.org/registry/specs/ARB/stencil_texturing.txt https://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt https://www.opengl.org/registry/specs/ARB/gpu_shader5.txt https://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_raster_multisample.txt 
https://www.opengl.org/registry/specs/NV/conservative_raster.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_primitive_bounding_box.txt This extension sounds good for quad rendering: https://www.opengl.org/registry/specs/NV/fill_rectangle.txt https://www.opengl.org/registry/specs/ARB/occlusion_query2.txt https://www.opengl.org/registry/specs/EXT/polygon_offset.txt https://www.opengl.org/registry/specs/EXT/polygon_offset_clamp.txt https://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt https://www.opengl.org/registry/specs/ARB/seamless_cubemap_per_texture.txt https://www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt https://www.opengl.org/registry/specs/ARB/texture_gather.txt https://www.opengl.org/registry/specs/ARB/arrays_of_arrays.txt https://www.opengl.org/registry/specs/ARB/instanced_arrays.txt https://www.opengl.org/registry/specs/ARB/texture_query_lod.txt https://www.opengl.org/registry/specs/EXT/texture_lod_bias.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_texture_lod.txt https://www.opengl.org/registry/specs/ARB/texture_query_levels.txt https://www.opengl.org/registry/specs/ARB/invalidate_subdata.txt https://www.opengl.org/registry/specs/ARB/copy_image.txt https://www.opengl.org/registry/specs/ARB/clear_texture.txt https://www.opengl.org/registry/specs/ARB/texture_view.txt https://www.opengl.org/registry/specs/ARB/texture_storage.txt https://code.google.com/p/glextensions/wiki/GL_EXT_timer_query https://www.opengl.org/registry/specs/ARB/debug_output.txt https://www.opengl.org/registry/specs/ARB/shader_precision.txt https://www.opengl.org/registry/specs/ARB/shading_language_420pack.txt https://www.opengl.org/registry/specs/ARB/texture_rectangle.txt https://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt https://www.opengl.org/registry/specs/ARB/shader_image_size.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_io_blocks.txt 
https://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt https://www.opengl.org/registry/specs/EXT/shader_image_load_formatted.txt https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt https://www.opengl.org/registry/specs/ARB/enhanced_layouts.txt https://www.opengl.org/registry/specs/ARB/transform_feedback3.txt https://www.opengl.org/registry/specs/ARB/shader_texture_image_samples.txt https://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt https://www.opengl.org/registry/specs/ARB/blend_func_extended.txt https://www.opengl.org/registry/specs/KHR/blend_equation_advanced.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_discard_framebuffer.txt https://www.opengl.org/registry/specs/ARB/shader_subroutine.txt https://www.opengl.org/registry/specs/ARB/shader_draw_parameters.txt https://www.khronos.org/registry/gles/extensions/KHR/texture_compression_astc_hdr.txt https://www.opengl.org/registry/specs/EXT/shader_integer_mix.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_pixel_local_storage.txt https://www.opengl.org/registry/specs/ARB/compressed_texture_pixel_storage.txt https://www.opengl.org/registry/specs/ARB/framebuffer_no_attachments.txt https://www.opengl.org/registry/specs/ARB/explicit_uniform_location.txt https://www.opengl.org/registry/specs/ARB/texture_storage.txt https://www.opengl.org/registry/specs/ARB/texture_storage_multisample.txt https://www.khronos.org/registry/gles/extensions/EXT/EXT_multisampled_render_to_texture.txt https://www.opengl.org/registry/specs/ARB/sampler_objects.txt https://www.opengl.org/registry/specs/ARB/texture_buffer_object.txt https://www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt https://www.opengl.org/registry/specs/INTEL/fragment_shader_ordering.txt https://www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt

simias commented 9 years ago

Heh, you don't have to link the entire OpenGL spec! Also you don't have to mention me every time, I receive a notification every time something is posted anyway.

I think an alternative to quads could be perspective correct rendering with triangles. If we can retrieve the Z coordinate from the GTE then I think we can just map the scene properly without too much texture warping. Maybe quad rendering could be a simpler hack though. Well, first we need to have triangle texturing...
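For reference, this is the difference between the affine interpolation the PlayStation GPU performs and perspective-correct interpolation once a Z value is available for each vertex. It is a generic Rust sketch of the standard formula along a single edge, with made-up function names, not code from any emulator.

```rust
/// Plain linear interpolation, which is what affine mapping does
/// with texture coordinates.
fn affine(a: f32, b: f32, t: f32) -> f32 {
    a + (b - a) * t
}

/// Perspective-correct interpolation of a texture coordinate between
/// two vertices with depths `z0` and `z1`: interpolate u/z and 1/z
/// linearly in screen space, then divide.
fn perspective_correct(u0: f32, z0: f32, u1: f32, z1: f32, t: f32) -> f32 {
    let inv_z = affine(1.0 / z0, 1.0 / z1, t);
    let u_over_z = affine(u0 / z0, u1 / z1, t);
    u_over_z / inv_z
}
```

Halfway along an edge going from `u = 0` at `z = 1` to `u = 64` at `z = 4`, affine mapping samples `u = 32` while the perspective-correct value is `12.8`. When both vertices have the same depth the two results are identical, which is why flat, screen-aligned surfaces look fine without any Z information.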

i30817 commented 9 years ago

How would the GPU memory frame and texture uploads interact with hypothetical widescreen hack support? I know it breaks games in general (without other per-game hacks for menus etc.), but if you can increase the view camera width, increase the size of the memory 'region' that holds the frames and hopefully convert coordinates of the texture memory zone to the new enlarged frame zone, it might work.

Or it might not be like that. I don't know how mednafen is doing it (i heard it now has widescreen support like PCSX2).

edit: nvm i didn't see the discussion above. I guess a generic method works fine for primitive renderers without culling that just use the camera, but when the typical 2d optimized screen rendering that occurs on the psx expects the image to end they mostly stop drawing. It would need per game hacks to enlarge the game area i guess.

ADormant commented 9 years ago

With regard to quad rasterization and perspective correction, the best option would be to have both at the same time, like gpubladesoft is supposed to have, because warping caused by affine texture mapping occurs mostly during movement, as shown here: https://www.youtube.com/watch?v=inFqJvEGGYc whereas triangle warping in PSX games is permanent, as shown in the clips above.

simias commented 9 years ago

Here's how the widescreen hack is implemented in beetle psx (a fork of mednafen PSX):

https://github.com/libretro/beetle-psx-libretro/blob/master/mednafen/psx/gte.cpp#L1035

It's a simple modification to the aspect ratio of the screen projection in the GTE.
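To illustrate the principle, here is a simplified floating-point model of the horizontal part of the GTE perspective projection with such a hack applied. This is a sketch under assumptions, not the linked code: the real GTE works in fixed point with saturation, `project_x` is a made-up name, and the 3/4 factor is simply the ratio between 4:3 and 16:9.

```rust
/// Simplified model of the GTE screen projection for the X axis:
/// `SX = OFX + X * (H / Z)`, where `h` is the projection plane
/// distance and `ofx` the screen offset.
/// With `widescreen` set, the horizontal scale is reduced so that more
/// of the scene fits in the same number of framebuffer pixels.
fn project_x(x: f32, z: f32, h: f32, ofx: f32, widescreen: bool) -> f32 {
    // Assumed factor: (4 / 3) / (16 / 9) = 0.75.
    let aspect = if widescreen { 0.75 } else { 1.0 };
    ofx + x * (h / z) * aspect
}
```

For example `project_x(100.0, 200.0, 400.0, 160.0, false)` gives `360.0`, which is off-screen for a 320 pixel wide display, while the same vertex lands at `310.0` with the hack enabled. The image in VRAM is therefore horizontally squeezed and has to be stretched back at display time.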

Here's what it looks like in rustation today, first without the hack (Spyro french disc):

spyro-pal

And with the GTE widescreen hack:

spyro-pal-widescreen

For this game it seems to render quite well: there doesn't appear to be too much missing geometry on the sides of the screen, in the regions that wouldn't normally be displayed. Crash Bandicoot doesn't fare so well:

crash-widescreen

Anyway, if you look at the two Spyro screenshots you can see that the actual resolution of the image in VRAM doesn't change, you have to rescale the image to get the desired aspect ratio in the backend.

You can also see that the size of the big gray rectangle doesn't change between the two images, that's because it's the 2D "Spyro" logo and it probably doesn't go through the GTE at all, it's drawn on top of the 3D image. It's similar to the backgrounds in RE, I assume. If you display the resulting image in widescreen the logo will look stretched.

Would it be possible to increase the image resolution in VRAM instead? Well that would be tricky. Remember that Rustation doesn't handle textures and the framebuffer is really supposed to look like this:

spyro-pal

With the right side holding all the texture data. Obviously if you widen the rendered image you will overwrite the textures which is not... ideal. With some very HLE tricks it might be possible to get it to work but it sounds tricky. It's probably not worth the hassle.

A better way might be to identify draw commands that didn't go through the GTE and scale those in the renderer instead.
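A very rough Rust sketch of what that could look like, assuming the renderer somehow knows for each vertex whether it came out of the GTE. The `from_gte` flag, the `Vertex` struct and the choice of squeezing around a `center_x` are all hypothetical; how to actually track that information is the hard part and is not addressed here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Vertex {
    x: f32,
    y: f32,
    /// Hypothetical flag: true if the position was produced by the GTE
    /// and therefore already has the widescreen factor applied.
    from_gte: bool,
}

/// Squeeze 2D (non-GTE) vertices horizontally around `center_x` by the
/// same assumed 3/4 factor as the GTE hack, so that they regain their
/// intended proportions once the image is stretched to 16:9.
fn compensate_2d(v: Vertex, center_x: f32) -> Vertex {
    if v.from_gte {
        v
    } else {
        Vertex {
            x: center_x + (v.x - center_x) * 0.75,
            ..v
        }
    }
}
```

Note that this only fixes the proportions of 2D elements: a full-screen 2D background would no longer cover the sides of the widened image.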

simias commented 9 years ago

I agree that quad rendering and perspective-correct mapping should probably both be implemented. Quad rendering is probably much more straightforward (no need to mess with the GTE).

ADormant commented 9 years ago

Interlaced mode emulation. http://www.psxdev.net/forum/viewtopic.php?f=51&t=520

http://www.psxdev.net/forum/viewtopic.php?f=62&t=472&start=20

http://www.psxdev.net/forum/viewtopic.php?f=41&t=563

http://www.psxdev.net/forum/viewtopic.php?f=51&t=561

http://www.razyboard.com/system/morethread-horizontalvertical-display-range-of-psx-games-pete_bernert-41709-1247566-0.html

http://www.razyboard.com/system/morethread-same-words-about-psx-gpu-emulation-pete_bernert-41709-246724-0.html

http://www.razyboard.com/system/morethread-television-overscan-emulation-pete_bernert-41709-1569155-0.html

http://www.psxdev.net/forum/viewtopic.php?f=51&t=453

http://www.psxdev.net/forum/viewtopic.php?f=51&t=455

https://developer.nvidia.com/content/transparency-or-translucency-rendering

http://wiki.redump.org/index.php?title=PlayStation_1:_LibCrypt_protection

ADormant commented 9 years ago

PSX doesn't render quads but triangle strips with 4 vertices http://www.psxdev.net/forum/viewtopic.php?f=51&t=627 https://github.com/hrydgard/ppsspp/pull/8129/files https://www.opengl.org/registry/specs/ARB/sampler_objects.txt
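In other words, a 4-vertex command can be turned into two triangles that share the middle two vertices. A minimal Rust sketch, with a hypothetical helper name:

```rust
/// Split the four vertices of a PlayStation "quad" command into the two
/// triangles of a triangle strip: (v0, v1, v2) and (v1, v2, v3).
fn quad_to_triangles<V: Copy>(q: [V; 4]) -> [[V; 3]; 2] {
    [[q[0], q[1], q[2]], [q[1], q[2], q[3]]]
}
```

Because each triangle is textured independently with affine mapping, the shared edge between them (v1 to v2) is where the warping shows up.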