simias commented 9 years ago

Overview of the PlayStation GPU

GPU Rasterizer

The GPU uses 1 megabyte of video RAM organized as a framebuffer of 512 lines of 2048 bytes. The CPU can upload textures to this buffer using GPU commands (it's not directly memory mapped in the CPU address space) and it can also read back portions of the framebuffer using other commands.

The GPU also contains a relatively simple 2D rasterizer capable of drawing lines and triangles (and "quads" which are really just two triangles side-by-side). It supports solid colors, textures (truecolor and paletted), several transparency modes and Gouraud shading. It can also apply a dithering pattern before outputting the 16bit color. The GPU has a small texture cache to speed up rendering of textured primitives.

The rasterizer always outputs 16bits per pixel (1555 RGB, where the MSB is the "mask" bit) so as far as it's concerned the VRAM is a framebuffer of 1024x512 pixels. Therefore all the draw commands use this system of coordinates.

Note that the GPU is fully 2D, it's not capable of 3D projection and therefore has no notion of depth (so no depth buffer or anything like that). The PlayStation does 3D projection on the CPU using the Geometry Transform Engine (GTE) coprocessor. That means for instance that the GPU cannot do perspective-correct texture mapping, which is the source of some easily recognizable PlayStation graphical artifacts.

The coordinate system used by the GPU is simply the 16bit per pixel coordinate in the video RAM, so (0, 0) is the top-left of the framebuffer while (1023, 511) is the bottom right. You can see a list of the GPU draw commands in the No$ specs.
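To make the layout concrete, here is a minimal sketch (not code from rustation) of how a raw VRAM word could be split into its 1555 components and how an (x, y) coordinate maps to an index in a flat array. The names `Pixel`, `decode_pixel` and `vram_index` are made up for illustration, and the component order (red in the low bits) is an assumption based on the No$ specs.

```rust
/// VRAM dimensions in 16-bit pixels, as described above.
const VRAM_WIDTH: usize = 1024;
const VRAM_HEIGHT: usize = 512;

/// One decoded 1555 pixel: 5 bits per color component plus the mask bit.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
    mask: bool,
}

/// Split a raw 16-bit VRAM word into its components.
/// Assumption: red sits in the low 5 bits, then green, then blue,
/// and the MSB is the mask bit.
fn decode_pixel(raw: u16) -> Pixel {
    Pixel {
        r: (raw & 0x1f) as u8,
        g: ((raw >> 5) & 0x1f) as u8,
        b: ((raw >> 10) & 0x1f) as u8,
        mask: raw & 0x8000 != 0,
    }
}

/// Linear index of pixel (x, y) in a flat `1024 * 512` array of `u16`,
/// with (0, 0) at the top-left.
fn vram_index(x: usize, y: usize) -> usize {
    assert!(x < VRAM_WIDTH && y < VRAM_HEIGHT);
    y * VRAM_WIDTH + x
}
```

With this layout `vram_index(1023, 511)` is the last element of the array, which matches the bottom-right corner mentioned above.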

GPU video output

Once a scene has been rendered/uploaded to the framebuffer it needs to be displayed on the TV through the NTSC/PAL analog video output. In order to do this the GPU video output can be configured to select a rectangle in the framebuffer and stream it to the TV.

The size of this window depends on the video timings used. For NTSC it ranges from roughly 256x240 to 640x480 while for PAL it's from 256x288 to 640x576. I say roughly because since it's an analog output you can tweak the timings in many ways, so you can actually "overscan" the output to increase the resolution or crop it further depending on what you're trying to do.

Interestingly even though the rasterizer only outputs 16bits per pixel the video output can be configured to use 24bits per pixel. That's of course mostly useless for graphics generated by the rasterizer but it can be used to display pre-rendered 24bit images, for instance videos decoded with the console's MDEC and uploaded to the GPU VRAM. Another application could be 24bit images dumped directly from the disc and used as static load screens.

Design of the emulated OpenGL renderer

Features

First and foremost I think accuracy should be the main focus. Of course if it was the only objective a software renderer would be better suited but I think with modern OpenGL and its programmable pipeline it should be possible to reach a decent level of accuracy (except maybe for the GPU cache, see below).

OpenGL would also make it easier to implement certain enhancements to the game's graphics compared to the original console, for instance increased internal resolution, texture replacement, normal maps etc...

Later on we could even attempt to salvage the raw 3D coordinates from the GTE and use them to render the 3D scene directly with OpenGL. That would allow us to have higher precision for our vertex coordinates, perspective correct mapping and many other things only possible with a fully 3D scene.

I think it's important to keep those features in mind when designing the basic renderer architecture so that we don't end up breaking everything when we try to implement one of them.

Potential difficulties

As you can see from the previous sections the PlayStation GPU is extremely simple compared to a modern graphics card, however it features some quirks and "exotic" modes which don't fit the OpenGL pipeline very well as far as I can tell.

Textures and palettes

At first I thought the most obvious approach to emulate the GPU video memory would be to use a single 1024x512 texture/FBO (or bigger if we increase the internal resolution) and use it as our Video RAM. There are a few potential issues with that approach however.

Upscaling and filtering

When a game wants to upload textures and palettes to the video RAM it must use one of the "Copy Rectangle" commands saying where the data should end up in the framebuffer (always using the same 1024x512 coordinate system) and then send the data 32bits at a time.
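As an illustration, such an upload could be modelled on the CPU side roughly like this. It's only a sketch: `upload_rect` is a hypothetical helper, and it assumes each 32bit word carries two pixels with the low halfword first and that coordinates wrap around the VRAM edges.

```rust
const VRAM_WIDTH: usize = 1024;
const VRAM_HEIGHT: usize = 512;

/// Copy a rectangle of 16-bit pixels into VRAM from a stream of 32-bit words.
/// Assumptions: each word carries two pixels, low halfword first, and
/// coordinates wrap around the VRAM edges.
fn upload_rect(vram: &mut [u16], x: usize, y: usize, w: usize, h: usize, words: &[u32]) {
    assert_eq!(vram.len(), VRAM_WIDTH * VRAM_HEIGHT);

    // Unpack the word stream into individual 16-bit pixels.
    let pixels = words
        .iter()
        .flat_map(|&word| [word as u16, (word >> 16) as u16]);

    // Walk the rectangle row by row. If the pixel count is odd the last
    // halfword of the stream is padding and is simply never consumed.
    let coords = (0..h).flat_map(|row| (0..w).map(move |col| (col, row)));
    for ((col, row), pixel) in coords.zip(pixels) {
        let px = (x + col) % VRAM_WIDTH;
        let py = (y + row) % VRAM_HEIGHT;
        vram[py * VRAM_WIDTH + px] = pixel;
    }
}
```

Note that nothing in this function knows whether the pixels are colors, palette indices or 24bit data, which is exactly the problem discussed next.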

At this point there's no easy way to know what the data contains, it can be a 24bit RGB image from a video, it could be a 16bit "truecolor" texture, it can be a paletted texture, it can be a palette or several of those things at once. We'll only know how to interpret the data when the GPU actually uses it, either through a textured draw command or by the video output configuration if it's just meant to be dumped on the TV screen without further processing.

This seriously limits what we can do with the raw framebuffer data if we don't want to break anything.

For instance, suppose we have a single big FBO representing the entire framebuffer at an increased resolution (say, 2048x1024). When the CPU attempts to upload data to the GPU we could upscale and optionally filter it and store it in our big buffer. Easy.

Except of course upscaling and filtering palettes and paletted textures won't work as intended, the intermediate pixel values will be meaningless since we basically have a non-linear color space. We cannot risk destroying palettes, therefore we can't really mess with the uploaded data until we know what it's going to be used for. Or at least whatever we do must be reversible if we need to go back to the original value later on.
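Here is a small sketch of why this breaks for 4bit paletted textures. The function names are hypothetical, and the packing (four palette indices per 16bit word, leftmost texel in the low nibble) is an assumption taken from the No$ specs.

```rust
/// Fetch the 4-bit palette index of texel `u` from a row of VRAM words.
/// Assumption: in 4bpp mode each 16-bit word packs four indices,
/// leftmost texel in the low nibble.
fn texel_index_4bpp(row: &[u16], u: usize) -> usize {
    let word = row[u / 4];
    ((word >> ((u % 4) * 4)) & 0xf) as usize
}

/// Resolve a texel to its final 16-bit color through the palette (CLUT).
fn sample_4bpp(row: &[u16], clut: &[u16; 16], u: usize) -> u16 {
    clut[texel_index_4bpp(row, u)]
}

/// What a naive filter would do to the raw upload: average two neighbouring
/// values without knowing that they are indices rather than colors.
fn naive_filtered_index(a: usize, b: usize) -> usize {
    (a + b) / 2
}
```

If indices 2 and 14 both map to shades of red, averaging them yields index 8, which can map to any color at all. The filtering has to happen after the palette lookup, not on the uploaded data.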

I'm not sure what the best way to deal with this is. Maybe we could have two framebuffers instead: one at the native 1024x512 resolution containing the raw framebuffer data with no fancy enhancements that would be used for paletted textures and a bigger framebuffer containing the rasterizer's output at increased resolution. Keeping the two coherent will be a challenge however and I don't know where 24bit images fit in there. Maybe we could use a completely different rendering mode when the video output is set to 24bit mode so that we can ignore it the rest of the time.

If we want to implement texture replacement we also need to figure out when it should take place. That's a complex subject however, maybe we can leave it for later.

OpenGL texture sampling

Another potential issue if we use a single texture/FBO for the entire video RAM is that we need to be able to render into it while we sample a texture in another location in the same buffer. So we would be rendering to an FBO while it's also bound as a texture.

As far as I know this kind of configuration is not well supported by OpenGL and can quickly lead us into undefined behaviour territory.

I believe that this should be achievable using the GL_ARB_texture_barrier extension which is part of OpenGL 4.5 but maybe we can work around it.

Otherwise we could maybe use two framebuffers and "ping pong" between them every frame instead; this way we would write to the current FBO while we use the previous one for input. That could be inaccurate if a game decides to use a polygon rendered during the current frame to texture a subsequent one; I know some games use similar features to create some fancy visual effects.

Semi-transparency

The PlayStation GPU rasterizer has several pixel blending modes used for semi-transparent primitives (copied from the No$ specs):

  B=Back  (the old pixel read from the image in the frame buffer)
  F=Front (the new halftransparent pixel)
  * 0.5 x B + 0.5 x F    ;aka B/2+F/2
  * 1.0 x B + 1.0 x F    ;aka B+F
  * 1.0 x B - 1.0 x F    ;aka B-F
  * 1.0 x B +0.25 x F    ;aka B+F/4

Unfortunately I don't think the OpenGL blending fixed function is flexible enough to accommodate all these modes without a significant number of hacks. Besides, for accuracy's sake we might want to handle the blending calculations in our own shader code to make sure we don't have weird rounding and saturation discrepancies (if we want to be bit accurate with the real hardware).

For this reason I think it would be better to handle the blending in the fragment shader. Once again this is not generally how things are done in OpenGL as far as I know, but it should be possible at least using the same OpenGL 4.5 GL_ARB_texture_barrier extension mentioned before or by "ping ponging" the buffers.
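For reference, here is what the four modes look like as plain integer code on 5bit components, which is roughly what the shader would have to reproduce. This is a sketch under assumptions: the names are made up, and the truncating division and the clamping to 0..=31 are guesses that would need to be checked against real hardware.

```rust
/// The four semi-transparency modes listed above.
#[derive(Clone, Copy)]
enum BlendMode {
    Average,    // B/2 + F/2
    Add,        // B + F
    Subtract,   // B - F
    AddQuarter, // B + F/4
}

/// Blend one 5-bit color component, clamping the result to 0..=31.
/// Assumption: truncating integer division, which may not match the
/// rounding of the real hardware.
fn blend_component(mode: BlendMode, back: u8, front: u8) -> u8 {
    let (b, f) = (back as i32, front as i32);
    let out = match mode {
        BlendMode::Average => b / 2 + f / 2,
        BlendMode::Add => b + f,
        BlendMode::Subtract => b - f,
        BlendMode::AddQuarter => b + f / 4,
    };
    out.clamp(0, 31) as u8
}

/// Blend two 1555 pixels component by component. The mask bit of the
/// front pixel is passed through unchanged in this sketch.
fn blend_pixel(mode: BlendMode, back: u16, front: u16) -> u16 {
    let mut out = front & 0x8000;
    for shift in [0, 5, 10] {
        let b = ((back >> shift) & 0x1f) as u8;
        let f = ((front >> shift) & 0x1f) as u8;
        out |= (blend_component(mode, b, f) as u16) << shift;
    }
    out
}
```

Note how `Average` on two full-intensity components gives 30 rather than 31 with this rounding; that's the kind of discrepancy that matters if we aim for bit accuracy.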

Masking

The MSB of the 16bit pixel is used to store a "mask" field. When the GPU renders a primitive it can be configured to set this bit to either zero or one. Another configuration flag can tell the GPU to treat framebuffer pixels with the mask bit set as "read only" and refuse to overwrite them. It effectively works like a simplified stencil test in OpenGL.

The problem if we decide to use a stencil buffer to emulate this masking feature is that we'd potentially need to update the stencil buffer after each primitive, since it's possible for any primitive draw to set the mask bit if the GPU is configured to do so. I don't know if it's possible to meaningfully modify the stencil buffer in an OpenGL fragment shader. Barring that we won't be able to use the stencil test to accurately emulate the masking.

Alternatively we could use the same trick I proposed to handle the semitransparency modes above: we fetch the target FBO pixel in the fragment shader and if the masking is enabled and its MSB is set we don't change its value. If we already handle transparency that way it might be relatively straightforward to add, theoretically.
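The per-pixel logic the shader would need is small. A sketch in plain Rust, with hypothetical names for the two configuration flags:

```rust
/// Mask-related GPU state, named here for illustration only.
#[derive(Clone, Copy)]
struct MaskSettings {
    /// "Mask test": refuse to overwrite pixels whose mask bit is set.
    check_mask: bool,
    /// "Mask set": force the mask bit to 1 on every pixel written.
    force_set_mask: bool,
}

const MASK_BIT: u16 = 0x8000;

/// Return the value the framebuffer pixel holds after attempting to
/// write `src` over `dst`.
fn write_pixel(dst: u16, src: u16, settings: MaskSettings) -> u16 {
    if settings.check_mask && dst & MASK_BIT != 0 {
        // The target pixel is protected: keep the old value.
        return dst;
    }
    if settings.force_set_mask {
        src | MASK_BIT
    } else {
        src
    }
}
```

The hard part is not this logic but getting access to `dst` from inside the fragment shader.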

Video output

So far I only talked about rendering things inside the framebuffer, we also have to implement the video output to display the relevant part of the framebuffer on the screen.

The PlayStation video output streams the pixels from the framebuffer directly on the screen without any kind of intermediate buffer. In other words the display is "continuous", if a game wants to implement double buffering it does it by rendering to two different parts of the framebuffer and swapping the video output source position and GPU rasterizer draw offset when it wants to "flip" the buffers.

In order to emulate this properly we'd have to basically render one pixel at a time, keeping track of the output video pixel clock. I don't think that can be done very efficiently in OpenGL, nor does it sound necessary to correctly emulate the vast majority of games.

Instead we could display the entire visible portion of the framebuffer at once at the end of every frame or maybe at the beginning of the next one. If the game uses double buffering we want to do it right after it swaps its buffers to reduce latency. I'm guessing the swapping would generally take place during the vertical blanking to reduce tearing artifacts, so maybe we could do the frame rendering at the end of the vertical blanking. I need to do more tests to make sure that's really how it works in practice though.

24bit mode

As explained before while the PlayStation rasterizer can only output 16bit RGB to the framebuffer the video output is capable of treating it as 24bits per pixel to display pre-rendered graphics.

When displaying in 24bit mode we'll have to find a way to take our 16bit RGB framebuffer and display it as a 24bit image. I don't know how difficult that is with OpenGL. I guess at worst we could sample two 16bit pixels in the fragment shader and reconstitute the correct 24bit value there. We could also implement 24bit image upscaling and filtering there to avoid having to handle it in the 16bit code. After all 24bit mode is a very limited special case on the original hardware.
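The reconstruction itself could look something like the following sketch (the fragment shader would do the equivalent with two texture fetches). `pixel_24bpp` is a hypothetical helper; it assumes pixels are packed as consecutive R, G, B bytes and that each VRAM word is little-endian, so three 16bit words hold two 24bit pixels.

```rust
/// Rebuild the 24-bit pixel at column `x` of a display line stored as
/// 16-bit words.
/// Assumptions: pixels are packed as consecutive R, G, B bytes and each
/// VRAM word is little-endian, so three words hold two pixels.
fn pixel_24bpp(line: &[u16], x: usize) -> (u8, u8, u8) {
    // Fetch one byte of the line, treating it as a little-endian byte stream.
    let byte = |offset: usize| -> u8 {
        let word = line[offset / 2];
        if offset % 2 == 0 {
            word as u8
        } else {
            (word >> 8) as u8
        }
    };
    let base = x * 3;
    (byte(base), byte(base + 1), byte(base + 2))
}
```

Every other pixel straddles a word boundary, which is why a single 16bit fetch is never enough.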

Texture cache

The PlayStation GPU has a small texture cache used to speed up the rendering of textured primitives. Normally it only affects the speed at which a primitive is rendered, however if the cache becomes "dirty" (for instance by overwriting a texture while some of it is in the cache) it could potentially change the appearance of the resulting triangle.

I have no idea how to emulate this particular feature in OpenGL. As far as I can tell the only way to emulate it accurately would be to draw each primitive pixel-by-pixel, updating the cache state in the process but that goes against the spirit of the massively parallel GPUs we have today.

Fortunately I believe that for the vast majority of games we can completely ignore this cache, or at least put it very low in our priority list.

simias commented 9 years ago

Do we really want to emulate real interlacing? Interlaced video is always a pain...

As Pete says in one of your links:

the Peops soft gpu plugin emulates interlaced gfx in a very simply way: exactly as non-interlaced gfx (well, in interlaced mode the screen will get updated on every emulated vsync, but that's all). There is no need to do some semi-clever frame mixing, etc... the PC can display a native height of 512 (or 480) pixels without problems, eh?

That's basically what I had in mind, I don't really think it's worth going beyond that unless a game relies on interlaced video for some weird visual effect. Other than that interlaced video is just a pain to deal with in general, you'll probably have to deinterlace it later on anyway unless you're outputting it to a real TV screen.

Although as far as the actual rendering is concerned this bit is interesting:

I wrote some test code, ran it on my SCPH 7002 unit, and had a look at the resulting VRAM dumps. Looks like the PS1 uses the odd/even bit in the status to decide which rows to render to, and that bit is only valid during the active area of the display (i.e. outside VSYNC area) as I learned some week ago while messing with the root counters.

I assume that's when you disable the "render to display" in the GPU config. It sounds annoying to emulate (and probably not really necessary?) but it's worth keeping in mind.

PSX doesn't render quads but triangle strips with 4 vertices

Yes that's how it's currently implemented, although I don't use OpenGL strips to render them so I duplicate the shared vertices:

https://github.com/simias/rustation/blob/master/src/gpu/opengl/mod.rs#L174-L177
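In isolation the idea is just this (a generic sketch, not the actual rustation code; `quad_to_triangles` is a made-up name):

```rust
/// Expand a 4-vertex quad (given in strip order) into two independent
/// triangles. The two vertices shared by both triangles are duplicated.
fn quad_to_triangles<V: Copy>(quad: [V; 4]) -> [V; 6] {
    let [v0, v1, v2, v3] = quad;
    [v0, v1, v2, v1, v2, v3]
}
```

The winding order of the second triangle differs from what a real strip would produce, but that shouldn't matter here as long as face culling stays disabled.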

This diagram is nice, I wonder where it comes from:

psx-fb

ADormant commented 9 years ago

Framebuffer effects
http://www.razyboard.com/system/morethread-framebuffer-effects-in-opengl-es-pete_bernert-41709-6144722-0.html
http://greenimp.epsxe.com/GreenImp-ePSXeDoc.html
http://ngemu.com/threads/petes-opengl2-psx-gpu-v2-2-released.42118/
http://ngemu.com/threads/truth-about-texture-windows.24327/

Using Z-buffer for MaskBit emulation
http://www.razyboard.com/system/morethread-gpu-implementation-hints-pete_bernert-41709-852833-0.html
http://ngemu.com/threads/a-crazy-idea-for-psx-video-plugins.26213/
http://ngemu.com/threads/perspective-correction-in-gpu-plugins.24724/
http://ngemu.com/threads/opengl-semi-transparent-graphics-problem.175265/
http://www.razyboard.com/system/morethread-unfiltered-framebuffer-updates-problem-pete_bernert-41709-5521845-0.html
http://www.razyboard.com/system/morethread-opengl-plugins-bugs-improper-effects-in-some-games-pete_bernert-41709-4084063-0.html

Regarding Interlaced mode the best option may be adding an option to disable and enable it like with dithering?

simias commented 9 years ago

Mmh, you just made me realize that when the mask bit is not forced to 1 for draw commands then its value is taken from the texture (if the poly is textured, obviously). So we can not know its new value until we're in the fragment shader.

This is annoying because I'm not sure it's possible to set the stencil value in the fragment shader. In your link I see that arekkusu says he uses the stencil + the alpha but that sounds lame. And Pete abuses the Z-buffer for that but I really don't think it's worth bothering with that (we have greater plans for the Z-buffer anyway).

So maybe the stencil is not the right way to go for that, maybe we could only use the alpha and discard the fragments in the shader depending on the mode. That might not be great for performance but at least it should be relatively straightforward.

ADormant commented 9 years ago

https://www.opengl.org/wiki/Early_Fragment_Test https://www.opengl.org/wiki/Per-Sample_Processing https://www.opengl.org/wiki/Rendering_Pipeline_Overview

Fragment shaders are not able to set the stencil data for a fragment, but they do have control over the color and depth values.

simias commented 9 years ago

Yeah I think it might be simpler to store the framebuffer as GL_RGB5_A1 and use the alpha bit as a mask bit. Or something like that. We want to use non-normalized integer operations as much as possible for accuracy, I'm not sure if GL_RGB5_A1 can be used non-normalized.

Alternatively we could use something like GL_R16UI to have a single "raw" 16bit per pixel and we would do all the color handling in the shaders. That's pretty much what we'll do anyway. Might require more special-casing if we want to support increased color depth though.

ADormant commented 9 years ago

For higher color depth I think the option would be to implement some shader that automatically decodes/converts paletted textures into true color/deep color textures. I think Dolphin and PPSSPP are doing it that way, perhaps GSDX too. A Texture Buffer Object can be used for this. https://www.opengl.org/wiki/Buffer_Texture

https://github.com/hrydgard/ppsspp/blob/master/GPU/Common/TextureScalerCommon.cpp

https://github.com/dolphin-emu/dolphin/pull/2085/files https://github.com/dolphin-emu/dolphin/pull/2059/files

simias commented 9 years ago

That would require a texture cache though, right?

As a first approach I thought the filtering/depth conversion could be done on the fly in the fragment shader for each render (wasteful, but for simple nearest/bilinear probably not too bad) and then an actual texture cache would come later and allow things like texture replacement and fancier shaders.

ADormant commented 9 years ago

Probably yes but do you mean PS1 specific texture cache or just a general OpenGL texture cache? General texture cache should greatly improve performance as well. Seems like PSX can use triple-buffering. http://www.razyboard.com/system/morethread-native-resolution-pete_bernert-41709-4996857-0.htm

Mmm... I will try an easy explanation: you can imagine the whole PSX framebuffer RAM as a 1024x512 rectangle area (with 15 bit color depth).

Now in this area the PSX can define smaller rectangles, which will be used as backbuffer, frontbuffer, or even for triple-buffering (and everything which is not used as such a "display area" will be filled with texture data and color table data).

The "front buffer" rectangle is the one you will see on your TV screen, while the gpu is rendering in the "back buffer" (or triple) area.

This screen area rectangle typically has a width of 256, 320, 368, 384, 512 or 640 pixel. The height can be anything up to 512 pixel.

simias commented 9 years ago

Multiple buffering shouldn't change much in this case since I want to manipulate the entire VRAM as a single texture. Things like offscreen rendering should Just Work unless I'm missing something.

A good texture cache would improve the performance without a doubt but it's pretty tricky to get right. That's why I'd rather start without a cache, just sampling the VRAM buffer for textures and palettes.

simias commented 9 years ago

I've added line drawing support in https://github.com/simias/rustation/commit/f6514ed5ad1459e7170c1d019dccff09f76ad986

I added a queue for draw commands, now we can queue triangles and lines (and possibly other things later) and it will use as many draw commands as necessary to render all the primitives in the scene. Of course if a game interleaves many triangle and line drawing commands we'll end up using many draw commands to render the entire scene which will hurt perfs.
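The batching boils down to grouping consecutive primitives of the same kind. Here is a simplified sketch of the idea, not the actual code from the commit; the types are made up for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PrimitiveKind {
    Triangle,
    Line,
}

/// One draw call: a primitive kind and how many queued primitives it covers.
#[derive(Debug, PartialEq, Eq)]
struct DrawCall {
    kind: PrimitiveKind,
    count: usize,
}

/// Group consecutive primitives of the same kind so that the original
/// draw order is preserved.
fn batch(queue: &[PrimitiveKind]) -> Vec<DrawCall> {
    let mut calls: Vec<DrawCall> = Vec::new();
    for &kind in queue {
        // Extend the current batch if the kind hasn't changed.
        if let Some(last) = calls.last_mut() {
            if last.kind == kind {
                last.count += 1;
                continue;
            }
        }
        // Otherwise start a new draw call.
        calls.push(DrawCall { kind, count: 1 });
    }
    calls
}
```

A queue of triangle, triangle, line, triangle ends up as three draw calls, which shows why heavy interleaving is the worst case.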

When the internal resolution is increased I change the OpenGL LineWidth so that the lines don't appear smaller than they ought to be. Of course that only works if the horizontal and vertical upscaling factors are the same (otherwise the line width depends on the angle). I can't see much use for that anyway besides widescreen hacks.

Now the rain is rendered in the Medievil start screen (using shaded lines):

medievil-rain-upscale

Here's the BIOS using lines to draw its UI:

psx-bios-upscale

ADormant commented 9 years ago

I've added line drawing

Was it done with geometry shaders?

simias commented 9 years ago

No, just using the builtin OpenGL line primitive: https://github.com/simias/rustation/blob/master/src/gpu/opengl/mod.rs#L190-L193

Geometry shaders might be useful if we want to emulate the PlayStation line drawing algorithm exactly but I don't know if it's really significant. For horizontal and vertical lines used in menus it shouldn't change anything at least.

For lines with other angles there might be a difference unless OpenGL happens to use the exact same line drawing algorithm as the PSX GPU (unlikely). It remains to be seen whether it causes problems in practice. I doubt people will be able to tell the difference in Medievil's rain!

simias commented 8 years ago

I've started implementing the two pass renderer where the commands are drawn to a framebuffer texture before they're displayed to the screen. It's in the gpu_rewrite branch.

Here's what it looks like in Medievil:

medievil

At the bottom right I overlay the entire framebuffer texture for debugging purposes.

By default I use a 16bit ARGB1555 texture since that's what the real console uses, that's why you can see this extreme banding in the gradients (I don't implement dithering yet).

Here's the same scene with 2x internal resolution, 32bit color depth and widescreen hack:

medievil-2x-ws

simias commented 8 years ago

I think I've figured out how to handle semi-transparency correctly for textured polys, unfortunately it means rendering those polygons in two passes.

The tricky part is that semi-transparent polygons can contain fully opaque, semi-transparent and fully transparent pixels. I think the solution is to render those polygons twice, once only with the opaque pixels and then a second time with the blending equation set properly and rendering only the semi-transparent pixels.

I don't know how bad it will be performance-wise but since this is only for semi-transparent polys I'm hoping it won't be too bad.

The alternative is to handle all the blending in the fragment shader but that means relying on OpenGL extensions and probably reducing portability.

simias commented 8 years ago

Texture uploading is implemented and seems to work fine in the few games I'm able to boot up.

Here's Spyro:

spyro-pal-vram

We can see that it's pretty close to the VRAM dump I posted above:

spyro-pal

Here's Crash Bandicoot, it uploads the loading screen directly in the displayed framebuffer so it already works as expected:

crash-textures

simias commented 8 years ago

I've implemented a very basic texture mapping shader in the textures branch. It's quite ugly but at least it shows that doing all the work in the fragment shader seems to work. There's no support for semi-transparency or texture blending.

crash

spyro-start

einhander

Those screenshots were made with 2x internal resolution.

simias commented 8 years ago

Fixed texture blending. Now it's starting to look decent.

crash-blending

spyro-texblend

simias commented 8 years ago

@ADormant I tried the quad rasterization thing with mixed results.

Basically the thing works well when the quad represents something that's supposed to be rectangular in the game (since we can basically guess the perspective correction in this situation) but if the quad draws something that's supposed to be a random quadrilateral then the algorithm ends up stretching the texture weirdly.

Here's an example:

quad-mapping

If you look at the book stands the texture does look better in the corrected version, however it also appears slightly stretched. The pixels at the top look bigger than the pixels at the bottom. That's because the stand is not rectangular, it's wider at the top than at the base.

And the less rectangular the quad the more obvious it becomes. Here's an almost triangular quad at the top of a tomb in medievil:

quad-correct4

It's also visible in the snowy mountain tops in Spyro:

spyro-quad-corrected

So depending on the situation it might look better or worse but in the end the right way to do it is probably to use the actual Z coordinate used by the GTE, this way it'll work for all shapes including triangles without those aberrations.

ADormant commented 8 years ago

Does that stretching occur with gpubladesoft's quad rendering too? Either way it would still be good to have this option.

simias commented 8 years ago

No idea, I've never tried it. Maybe it uses a different algorithm.

I agree that it could still be an option, it's not a whole lot of code anyway.

ADormant commented 8 years ago

Perhaps a heuristic can be implemented which detects the shape of objects to prevent stretching? After you finish quad rendering I wonder if you could copy the GTE accuracy implementation from here https://pcsxr.codeplex.com/discussions/264234 (it is reportedly better than the version implemented in PCSXR and has some sort of depth data, a partial depth buffer?) and CPU overclocking from here https://github.com/SonofUgly/PCSX-Reloaded/commit/3f11d29f31ca02575aeedf073e87ffee933effb0 https://pcsxr.codeplex.com/discussions/647809

simias commented 8 years ago

Well the problem is that quad mapping uses the shape of the object to guess the perspective correction to apply but there's no other cue to know if an object is a rectangle seen with a perspective or just something that's not rectangular.

Here's a page that describes the algorithm I'm using: http://www.reedbeta.com/blog/2012/05/26/quadrilateral-interpolation-part-1/

Maybe I could reject quads that look too "un-rectangular" to avoid the extreme stretching seen in the Medievil screenshot above. I'll do more testing once I get more 3D games to work.

simias commented 8 years ago

I think I figured out how to handle both semi-transparency and masking in all cases, I'm making a note here before I forget. There are many corner cases to consider but I really hope this covers everything:

For opaque polygons

If "mask test" mode is enabled we can use the alpha blending to discard masked fragments (RGB blend factors: GL_ONE_MINUS_DST_ALPHA, GL_DST_ALPHA). Otherwise we output the texel alpha as usual using GL_ONE, GL_ZERO.
If "mask set" mode is enabled we can update the target alpha/mask bit by using GL_ONE, GL_ZERO as alpha blend factors and forcing the texel alpha to 1.0. Otherwise we use the regular texel alpha value.

For semi-transparent polygons

This is where the fun begins.

The tricky part here is that we need to do actual alpha blending for the semi-transparency to work properly but then it means that we can't abuse it for mask testing. Instead we can use the stencil to emulate the masking.

We need to render semi-transparent polygons in two passes: first the opaque pixels in the texture (where the alpha is not set) are rendered like opaque polygons. We can probably use the same shader and draw them along. The semi-transparent texels are ignored (i.e. discarded in the fragment shader). Non-textured semi-transparent polygons can skip this step since they're completely semi-transparent.
Multi pass rendering can be implemented using the Z-buffer and giving each polygon an arbitrary Z-value decreasing for every new primitive. This could be reused if we later implement a real Z buffer using the GTE Z-value.
Then we must draw the semi-transparent texels (opaque texels are discarded this time around). We can use the following blending equations to emulate all the modes:

Playstation semi-transparency mode	OpenGL RGB blending equation	OpenGL blending parameters	Constant alpha
dst / 2 + src / 2	ADD	CONSTANT_ALPHA, CONSTANT_ALPHA	0.5
dst + src	ADD	ONE, ONE	Don't care
dst - src	REVERSE_SUBTRACT	ONE, ONE	Don't care
dst + src / 4	ADD	CONSTANT_ALPHA, ONE	0.25

The constant alpha is set with glBlendColor.

If "mask test" mode is enabled this time we can't use the blending equation to mask the pixels since the function is used to emulate the semi-transparency. Instead before we render the semi-transparent texels we can create a stencil buffer set for all pixels in the framebuffer where the alpha bit is set.
Once this stencil is built we can enable the stencil test and render the semi-transparent texels with the stencil test enabled.
If the "mask set" mode is enabled we can just tell OpenGL to update the stencil value for each pixel written to the framebuffer
If "mask set" mode is not enabled we have to be careful since the stencil value should only be set when drawing textured polygons. Since we only draw semi-transparent texels in this pass we know that the mask bit will always be 1 for textured polygons, however for monochrome/shaded polygons it will be 0. I think the only way to handle that is to change the glStencilOpSeparate to either update or keep the previous value in the stencil buffer depending on the type of polygon we're about to draw. We must also write the correct 1.0 or 0.0 value to the destination alpha to make sure that the alpha/mask bit remains coherent with the stencil for subsequent commands.
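The blending table above can be written down as data and checked numerically. In the sketch below the enums are stand-ins for the corresponding OpenGL constants (no GL binding is used), `apply` mimics what the fixed-function blender computes for one normalized color component, and all names are made up for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SemiTransparency {
    Average,    // dst / 2 + src / 2
    Add,        // dst + src
    Subtract,   // dst - src
    AddQuarter, // dst + src / 4
}

/// Stand-ins for GL_FUNC_ADD and GL_FUNC_REVERSE_SUBTRACT.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Equation {
    Add,
    ReverseSubtract,
}

/// Stand-ins for GL_ONE and GL_CONSTANT_ALPHA.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Factor {
    One,
    ConstantAlpha,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct BlendState {
    equation: Equation,
    src_factor: Factor,
    dst_factor: Factor,
    /// Value that would be passed to glBlendColor (0.0 when unused).
    constant_alpha: f32,
}

/// Blending setup for each PlayStation mode, following the table above.
fn blend_state(mode: SemiTransparency) -> BlendState {
    let (equation, src_factor, dst_factor, constant_alpha) = match mode {
        SemiTransparency::Average => {
            (Equation::Add, Factor::ConstantAlpha, Factor::ConstantAlpha, 0.5)
        }
        SemiTransparency::Add => (Equation::Add, Factor::One, Factor::One, 0.0),
        SemiTransparency::Subtract => {
            (Equation::ReverseSubtract, Factor::One, Factor::One, 0.0)
        }
        SemiTransparency::AddQuarter => {
            (Equation::Add, Factor::ConstantAlpha, Factor::One, 0.25)
        }
    };
    BlendState { equation, src_factor, dst_factor, constant_alpha }
}

/// Evaluate the blend for one normalized component, clamped to [0, 1].
fn apply(state: BlendState, src: f32, dst: f32) -> f32 {
    let weight = |factor: Factor| match factor {
        Factor::One => 1.0,
        Factor::ConstantAlpha => state.constant_alpha,
    };
    let s = src * weight(state.src_factor);
    let d = dst * weight(state.dst_factor);
    let out = match state.equation {
        Equation::Add => s + d,
        Equation::ReverseSubtract => d - s,
    };
    out.clamp(0.0, 1.0)
}
```

This only models the color math; the stencil and destination alpha handling described above would sit on top of it.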

Overall that means that drawing semi-transparent polygons can turn out to be quite expensive since we potentially need two passes + a bunch of juggling with the stencil buffer and the semi-transparency modes (in the worst case they could change between each draw command).

Since semi-transparent primitives have to be drawn in-order we can't batch similar primitives together so in the worst case we might end up having to use a lot of draw commands to render the scene.

This might not be as bad as it sounds though:

Semi-transparent polygons also take longer to draw on the real console so game devs had to be careful not to overuse them
The worst case is when drawing semi-transparent polygons with the "mask test" mode set and interleaving many monochrome and textured polygons. That sounds specific enough to be uncommon (although who knows...).
By enabling the Z-buffer and rendering all the opaque polygons first we might hopefully hide some of the semi-transparent polygons which will be drawn last.
We can also draw opaque primitives front-to-back and reduce overdraw dramatically for the entire scene which should speed the rendering quite significantly for most games (no need to render hidden pixels). The real console doesn't have a Z-buffer so it draws everything using the painter's algorithm which results in a tremendous amount of overdraw.
However we have to be careful to draw semi-transparent primitives in the right order since the end result is order-dependent.

ADormant commented 8 years ago

Algorithms
http://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/perspective-correct-interpolation-vertex-attributes
http://www.cs.cornell.edu/courses/cs4620/2012fa/lectures/notes.pdf
https://www.comp.nus.edu.sg/~lowkl/publications/lowk_persp_interp_techrep.pdf
http://web.cs.ucdavis.edu/~amenta/s12/perspectiveCorrect.pdf
http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/
http://www.inf.ufrgs.br/~oliveira/pubs_files/PG01_Adaptive_subdivision.pdf
http://ngemu.com/threads/edgblas-gpubladesoft.144037/page-7
https://www.particleincell.com/2012/quad-interpolation/
http://www.iquilezles.org/www/articles/ibilinear/ibilinear.htm
http://math.stackexchange.com/questions/13404/mapping-irregular-quadrilateral-to-a-rectangle
http://stackoverflow.com/questions/26332165/projective-interpolation-of-textures-in-2d-trapeziums-with-opengl
http://vcg.isti.cnr.it/publications/papers/quadrendering.pdf
https://www.inf.ethz.ch/personal/dpanozzo/papers/Demystifying-2015.pdf
http://dl.acm.org/citation.cfm?id=1058131
http://graphics.cs.williams.edu/papers/ClipJGT11/McGuire-Clipping.pdf
http://www.mathworks.com/matlabcentral/answers/222379-how-to-create-patches-of-quadrilaterals-4-vertices-1-at-a-time-and-render-them-all-at-once
http://stackoverflow.com/questions/7532867/pixel-shader-to-project-a-texture-to-an-arbitary-quadrilateral
http://www.cs.cmu.edu/afs/cs/academic/class/15462-f10/www/lec_slides/a1-jensen.pdf
http://help.autodesk.com/view/ACD/2015/ENU/?guid=GUID-253D0647-1CEF-4183-8776-9B48C7000304
http://pc2.iam.fmph.uniba.sk/amuc/_contributed/algo2005/vanecek-svitak-kolingerova-skala.pdf
ftp://ftp.sgi.com/sgi/opengl/contrib/mjk/tips/projtex/distortion.txt
https://en.wikipedia.org/wiki/Multivariate_interpolation
http://stackoverflow.com/questions/26345156/perspective-correct-shader-rendering
http://stackoverflow.com/questions/12414708/correct-glsl-affine-texture-mapping
http://web.eecs.umich.edu/~sugih/courses/eecs487/lectures/24-TextureMapping.pdf

simias commented 8 years ago

Great! Thank you. I've been googling for alternative quad mapping algorithms without much success. I think bilinear interpolation will solve the stretching I had, however it won't give the right perspective effect for things like wall spans. I'll give it a try.

ADormant commented 8 years ago

simias commented 8 years ago

Dithering:

dithering

When increasing the internal res the dithering pattern keeps the same size (pixel-wise) so it appears to be getting smaller as the resolution increases:

dithering-2x

It would be easy to scale the dithering as well but that makes it even more obvious so I'm going to leave it that way for the moment.
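To make the two options concrete, here is a minimal sketch of how the dither offset could be looked up in either mode. The 4x4 table is the one usually quoted for the PlayStation GPU and should be treated as an assumption here; the names `DITHER_TABLE`, `dither_offset` and `dither_channel` are hypothetical, and `upscale` (the internal resolution factor) is assumed to be at least 1.

```rust
/// 4x4 signed dither offsets, indexed as [y & 3][x & 3].
/// Assumption: values as commonly documented for the PlayStation GPU.
const DITHER_TABLE: [[i16; 4]; 4] = [
    [-4, 0, -3, 1],
    [2, -2, 3, -1],
    [-3, 1, -4, 0],
    [3, -1, 2, -2],
];

/// Dither offset for the pixel at (x, y) in upscaled framebuffer coordinates.
///
/// With `scale_pattern == false` the pattern keeps its size in pixels, so it
/// looks smaller as the resolution grows. With `scale_pattern == true` each
/// table cell covers `upscale` x `upscale` pixels instead.
pub fn dither_offset(x: u32, y: u32, upscale: u32, scale_pattern: bool) -> i16 {
    let (px, py) = if scale_pattern {
        (x / upscale, y / upscale)
    } else {
        (x, y)
    };
    DITHER_TABLE[(py & 3) as usize][(px & 3) as usize]
}

/// Apply a dither offset to an 8bit channel, clamp, and truncate to 5 bits.
pub fn dither_channel(value: u8, offset: i16) -> u8 {
    let v = (value as i16 + offset).clamp(0, 255);
    (v >> 3) as u8
}
```

At 2x, the pixel at (1, 0) gets offset 0 with the unscaled pattern but -4 with the scaled one, since it then falls inside the first table cell.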

simias commented 8 years ago

I've added 24bit display mode support, Spyro used it to display the "Universal" logo at the start. Until now it looked like this, with the 24bit pixels rendered as 16bits:

spyro-no24bpp

Now it renders correctly (I hope):

spyro-24bpp

Since in 24bit mode I have to rebuild the pixel values from the 16bit framebuffer texture there's no linear filtering implemented. It's not really an issue I think, filtering of the output should be handled by the frontend anyway.
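As a rough illustration of what "rebuilding the pixel values" means, here is a sketch on the CPU side. It assumes that in 24bit mode the pixels are packed as consecutive R, G, B bytes, so that two pixels span three little-endian 16bit VRAM words; `pixel_24bpp` is a hypothetical helper, not the actual shader code.

```rust
/// Rebuild one 24bit RGB pixel from a line of 16bit VRAM words.
///
/// `line` is one framebuffer line as 16bit words and `x` is the pixel index
/// in the 24bit display. Returns `None` if the pixel runs past the line.
/// Assumption: bytes are stored R, G, B with the low byte of each word first.
pub fn pixel_24bpp(line: &[u16], x: usize) -> Option<[u8; 3]> {
    // Each pixel is 3 bytes wide, so pixel `x` starts at byte `3 * x`.
    let first = x * 3;
    let mut rgb = [0u8; 3];

    for (i, c) in rgb.iter_mut().enumerate() {
        let byte = first + i;
        let word = *line.get(byte / 2)?;

        // Even bytes live in the low half of the word, odd ones in the high half.
        *c = if byte % 2 == 0 {
            word as u8
        } else {
            (word >> 8) as u8
        };
    }

    Some(rgb)
}
```

Because a pixel can straddle two 16bit words, neighbouring texels of the 16bit texture don't correspond to neighbouring 24bit pixels, which is why the hardware's linear filtering can't be used directly.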

ADormant commented 8 years ago

Is there an option to disable dithering? Moreover, dithering should be automatically disabled in 24bit rendering mode.

simias commented 8 years ago

There will be one, currently I don't have any configuration system implemented but I'm trying to code in a way that won't make it hard to add it in later.

i30817 commented 8 years ago

For configuration systems, remember that people will want to change configuration per game. A two-level config system that works like

[GAME_SPECIFIC_OPTION] ? return [GAME_SPECIFIC_OPTION] : [GLOBAL_OPTION]

would likely be ideal (even if at first everything is a global option because there's no way to change game-specific ones). Whatever data structure you use to represent the two levels, it makes sense to be able to feed either level to the same GUI (activated by different buttons), so you can share code of course.

This is pretty important, especially for key configuration and key macros. For some reason not many emulators handle the controller side of this very well. Dolphin, for instance, has per-game options but not per-game controller config options (yet), so it's harder than it seems to unify if you don't plan for it.
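The two-level lookup described above could be sketched like this in Rust. Everything here (`Config`, `set_global`, `set_for_game`, `get`, string keys and values, the game identifier) is a hypothetical illustration, not an existing API in rustation or RetroArch.

```rust
use std::collections::HashMap;

/// Two-level configuration: per-game overrides on top of global defaults.
#[derive(Default)]
pub struct Config {
    global: HashMap<String, String>,
    per_game: HashMap<String, HashMap<String, String>>,
}

impl Config {
    /// Set an option that applies to every game.
    pub fn set_global(&mut self, key: &str, value: &str) {
        self.global.insert(key.to_string(), value.to_string());
    }

    /// Set an option that only applies to `game`.
    pub fn set_for_game(&mut self, game: &str, key: &str, value: &str) {
        self.per_game
            .entry(game.to_string())
            .or_default()
            .insert(key.to_string(), value.to_string());
    }

    /// Game-specific value if there is one, global value otherwise.
    pub fn get(&self, game: &str, key: &str) -> Option<&str> {
        self.per_game
            .get(game)
            .and_then(|overrides| overrides.get(key))
            .or_else(|| self.global.get(key))
            .map(String::as_str)
    }
}
```

Since both levels are plain key/value maps, the same editing code (or GUI) can be pointed at either one.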

simias commented 8 years ago

Can RA manage that? Sounds like it would best be handled by the frontend.

i30817 commented 8 years ago

Ehh, there are more design issues, like for example when to save a config change to disk and to which file, default options, which options are 'safe' and which ones require a game reset, when to apply changes, and stuff like that. I doubt that there is a ready-to-use solution with a '2 levels' approach in a widget system.

I don't know rust so maybe there is though.

edit: also a per game option system (with corresponding per game config files) would be useful for 'hidden' hacks too if they're supposed to apply to more than one game but not all.

edit2: Oh, by RA you mean RetroArch? Maybe so, maybe so. I don't really know how that works, but it would be weird if it didn't allow per-game options... If you want to reuse the interface they use for that, sure, but I suspect you'd end up reimplementing all of it eventually if you want your own GUI.

simias commented 8 years ago

I plan on implementing the libretro API and then use RetroArch as the frontend. I have enough work with the core, I don't want to reinvent the frontend.

ADormant commented 8 years ago

Extensions https://github.com/hrydgard/ppsspp/pull/8270/files

ADormant commented 8 years ago

@simias https://www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt https://github.com/Emu-Docs/Emu-Docs/tree/master/PlayStation

ADormant commented 8 years ago

@simias GlideN64 widescreen hack https://github.com/gonetz/GLideN64/commit/2525b86f49dde4eb6be4ba6d6a30f792f0b28a18 https://github.com/gonetz/GLideN64/commit/c9d486222b4baee22132243fa1ac478c31baa9d9

simias commented 8 years ago

I'm using GL_R16UI for the "raw" VRAM now, this way I can do everything with integers in the shader. The "out" buffer is either GL_RGB5_A1 (accurate 16bit mode with dithering) or GL_RGBA8 (enhanced 32bit mode without dithering).
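For reference, the integer decoding this enables looks roughly like the following, written as Rust rather than GLSL. The bit layout (red in the low 5 bits, then green, then blue, mask bit on top) is my understanding of the 1555 format and should be checked against the No$ specs; `Pixel`, `decode_1555` and `expand5` are hypothetical names.

```rust
/// One decoded 1555 pixel with 5bit color components.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Pixel {
    pub r: u8,
    pub g: u8,
    pub b: u8,
    pub mask: bool,
}

/// Split a raw 16bit VRAM word into its components using only integer ops.
/// Assumption: bits 0-4 red, 5-9 green, 10-14 blue, bit 15 mask.
pub fn decode_1555(raw: u16) -> Pixel {
    Pixel {
        r: (raw & 0x1f) as u8,
        g: ((raw >> 5) & 0x1f) as u8,
        b: ((raw >> 10) & 0x1f) as u8,
        mask: (raw >> 15) != 0,
    }
}

/// Expand a 5bit component to 8 bits by replicating the top bits,
/// so that 0 maps to 0 and 31 maps to 255.
pub fn expand5(c: u8) -> u8 {
    (c << 3) | (c >> 2)
}
```

`expand5` is the kind of conversion needed when writing to the GL_RGBA8 output instead of GL_RGB5_A1.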

I'm going to implement the GTE widescreen hack soon, it's pretty straightforward. It'll only work with fully 3D games though.

simias commented 8 years ago

I've prototyped the GTE accuracy/subpixel precision in mednafen/beetle: https://github.com/simias/beetle-psx-libretro/commits/subpixel_accuracy

subpixel

I based the code on the simple implementation from PCSX-R. Unfortunately it's a bit too simplistic to work well in all cases.

The main issue is that it works by associating the native (low precision) x/y coordinates with the extended precision coordinates in the GTE. Then when the GPU has to render a triangle it can look up the cache to find a vertex with the same x/y coordinates and use the high-precision values instead.

Unfortunately if two different vertices happen to share the same native position we can't know which one matches the extended precision data. In the gif above you can observe that on the score at the bottom left, it warps when "subpixel precision" is enabled because it aligns with some of the ground vertices.

And it would be even worse if we used the z-coordinates for texture mapping because they could differ wildly for two vertices sharing the same on-screen position. Then the mapping could potentially be completely wrong.
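The ambiguity is easy to see in a stripped-down model of such a cache. This is only a sketch of the idea, not the PCSX-R or beetle code; `PreciseCache` and its methods are hypothetical, and a `HashMap` keyed on the native position stands in for whatever lookup table the real implementation uses.

```rust
use std::collections::HashMap;

/// Maps native (integer) screen coordinates to the extended precision
/// coordinates the GTE computed for them.
#[derive(Default)]
pub struct PreciseCache {
    map: HashMap<(i16, i16), (f32, f32)>,
}

impl PreciseCache {
    /// Called on the GTE side whenever a vertex is projected.
    /// A later vertex with the same native position replaces the earlier one.
    pub fn store(&mut self, native: (i16, i16), precise: (f32, f32)) {
        self.map.insert(native, precise);
    }

    /// Called on the GPU side for each vertex of a primitive.
    /// Falls back to the native coordinates when nothing is cached.
    pub fn lookup(&self, native: (i16, i16)) -> (f32, f32) {
        self.map
            .get(&native)
            .copied()
            .unwrap_or((native.0 as f32, native.1 as f32))
    }
}
```

If a ground vertex and a HUD vertex both land on (10, 20), whichever was stored last wins, and the other one is drawn with coordinates that were never meant for it.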

I've tried implementing a more costly but hopefully more accurate solution but I don't think it will be fast enough to be usable in mednafen. I think I'm going to postpone that until Rustation is in a better shape.

I wonder if ePSXe has a more clever implementation or if it does it like PCSX-R.

ADormant commented 8 years ago

@simias Interesting post about gte accuracy http://ngemu.com/threads/peteopengl2tweak-tweaker-for-peteopengl2-plugin-w-gte-accuracy-hack.160319/page-40

simias commented 8 years ago

Yeah I've stumbled upon the same issue when implementing increased GTE precision in beetle/mednafen as I mentioned in my previous post. iCatButler has the right diagnosis, the lookup table is too hacky to work properly all the time.

A potentially better solution would be to store the increased resolution data alongside the regular PSX precision in a hidden cache and always keep them paired, from the GTE all the way towards the GPU.

Of course this would be more costly since you'd have to check if there's some GTE data for each RAM and DMA access.

I tried implementing that in mednafen and almost got it to work but then I noticed that many games don't send the GTE data straight to RAM but rather pass through a CPU register first. In this case we have to keep the increased precision data paired with the CPU registers and I was worried that it would slow things down too much.
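Here is a very reduced model of that pairing, to show why the CPU registers have to carry the extra data too. It is a sketch under my own assumptions and not what the patch below does line by line: `Tagged`, `Machine` and the method names are hypothetical, and only a GTE read, a store and one ALU operation are modelled.

```rust
/// A 32bit value plus the optional extended precision vertex it came from.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tagged {
    pub value: u32,
    pub precise: Option<(f32, f32)>,
}

const EMPTY: Tagged = Tagged { value: 0, precise: None };

pub struct Machine {
    regs: [Tagged; 32],
    ram: Vec<Tagged>,
}

impl Machine {
    pub fn new(ram_words: usize) -> Machine {
        Machine { regs: [EMPTY; 32], ram: vec![EMPTY; ram_words] }
    }

    /// Move a projected vertex from the GTE into a CPU register,
    /// keeping the high precision data attached.
    pub fn gte_read(&mut self, reg: usize, value: u32, precise: (f32, f32)) {
        self.regs[reg] = Tagged { value, precise: Some(precise) };
    }

    /// Store a register to RAM: the tag travels with the value.
    pub fn store_word(&mut self, addr: usize, reg: usize) {
        self.ram[addr / 4] = self.regs[reg];
    }

    /// Any arithmetic produces a new value, so the tag is dropped.
    pub fn add_imm(&mut self, dst: usize, src: usize, imm: u32) {
        let value = self.regs[src].value.wrapping_add(imm);
        self.regs[dst] = Tagged { value, precise: None };
    }

    /// What the GPU would get when the word is sent to it (e.g. by DMA).
    pub fn vertex_at(&self, addr: usize) -> Tagged {
        self.ram[addr / 4]
    }
}
```

The cost is visible even in this toy version: every register move and every memory access now copies and checks the extra field.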

Here's the patch I came up with: https://github.com/simias/beetle-psx-libretro/commit/2b5c6f623affe5fffa1ddcb6894dbaf25d91d43d

I guess I could try to finish the CPU part to see if it works in practice.

ADormant commented 8 years ago

I tried the quad rasterization thing with mixed results.

@simias could you post a link to the code of your quad mapping attempt?

simias commented 8 years ago

It's in the quad_mapping branch: https://github.com/simias/rustation/commit/4b0b5f28b01c3cc56c8f5789b06fcd276153dbc5

But after our discussion I think I took the wrong approach, I should have tried to implement bilinear mapping instead of guessing the perspective correction.

simias commented 8 years ago

I've tried so many that I'm not sure which one you're talking about :)

simias commented 8 years ago

Doesn't ring a bell and I can't find any mention of "heuristic" in my commits. Do you remember what it did? You're talking about beetle/mednafen right? Not rustation.

By upscaling you mean increasing the internal resolution, right? The only version that worked well is the one that's currently committed upstream in beetle/mednafen. It's got a heuristic for upscaling 2D elements correctly at 2x (without any seams).

ADormant commented 8 years ago

@simias How good is this GTE precision? https://github.com/tapcio/PeteOpenGL2Tweak/tree/PrecisionRAM https://github.com/tapcio/PeteOpenGL2Tweak/commit/e20e2d4618616735ce1848636d047bad7934f9e1

simias commented 8 years ago

That's interesting but I'm not sure I understand what it's doing exactly. It looks like it might be similar to what I'm trying to implement in the "subpixel" branch. Basically instead of forwarding subpixel data directly from the GTE to the GPU you pair it with the 32bit "native" data and pass it along to the RAM, DMA etc... Of course it adds a performance cost to all RAM access and adds an overhead to all CPU register operations so it's rather slow, but it should be less hacky and buggy than the naive hack that PCSX-R uses.

But that's in theory because I still can't get it to work at the moment even though I got all the pipeline ready, I'm currently debugging to figure out where the subpixel data gets lost.

At any rate if that's what this guy is doing I don't think it'll work without tweaking the CPU itself to be subpixel-data aware since many games seem to like to pass vertex position data through CPU registers.

ADormant commented 8 years ago

@simias @tapcio Well perhaps you two can exchange information about this problem. https://github.com/tapcio/PeteOpenGL2Tweak/issues/4

simias commented 8 years ago

Sure, why not. I'll try to create a new issue with my current discoveries. But first I need to fix my debugger...

Nucleoprotein commented 8 years ago

Yes, this tries to work in the way you described, but it's really hard to implement such a thing in the PSEmu Pro architecture. I also don't know why I get some incorrect data, but this may be happening because of the hackish way I get the source address of the data.

Can the GPU access CPU registers? In PCSX-R and PEOPS OpenGL 1.78 I don't see any way for the plugin to get vertex data other than a DMA transfer: all polygons are drawn in GPUwriteDataMem, which is called from GPUdmaChain.

simias / rustation

Specification of the OpenGL renderer architecture #10
