skmp / refsw-offline

An offline renderer for HOLLY2 CORE scene dumps

Guesswork: How does the hardware work? #1

Open skmp opened 1 year ago

skmp commented 1 year ago

Resources & notes

Very interesting analysis by RTL Engineering on PS2 and CLX2 on YouTube (link directly to the Dreamcast part)

Overall unknowns

ISP

We know it processes '32' pixels at once. The question is how that happens, interpolation-wise. Does it do a barycentric test x32? Or is it a deep interpolation pipeline with up to 32 pipes to fill?
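To make the "barycentric test x32" option concrete, here is a minimal sketch of 32 edge-function tests producing a 32-bit coverage mask. It only illustrates the idea and says nothing about what the hardware actually does: the 32x1 row layout, the pixel-centre sampling, and the names `Vec2`, `edge` and `coverage_row32` are all assumptions made for the example.

```cpp
#include <cstdint>

struct Vec2 { float x, y; };  // hypothetical screen-space vertex

// Edge function: its sign tells on which side of the edge a->b the point p lies.
inline float edge(Vec2 a, Vec2 b, Vec2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Tests the 32 pixels (x0 .. x0+31, y) against one triangle.
// Bit i of the result is set when the centre of pixel x0+i is inside.
// Both windings are accepted (assumption: no culling at this stage).
uint32_t coverage_row32(Vec2 v0, Vec2 v1, Vec2 v2, int x0, int y) {
    uint32_t mask = 0;
    for (int i = 0; i < 32; ++i) {
        Vec2 p{ x0 + i + 0.5f, y + 0.5f };
        float e0 = edge(v0, v1, p);
        float e1 = edge(v1, v2, p);
        float e2 = edge(v2, v0, p);
        bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                      (e0 <= 0 && e1 <= 0 && e2 <= 0);
        if (inside) mask |= 1u << i;
    }
    return mask;
}
```

The loop body is independent per pixel, so in hardware it could be 32 parallel testers or one tester fed by a 32-deep pipeline; the sketch does not distinguish between the two.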

Documentation and practice seem to support that the depth buffer is indeed 32-bit float (or at least 31-bit, no sign). There seem to be 2 stencil bits used to calculate the ModVols for shadows.
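One reason a sign-less float depth is convenient: for non-negative, non-NaN IEEE-754 floats, comparing the raw bit patterns as unsigned integers gives the same ordering as comparing the values. A small sketch of that property follows; the helper names `float_bits`, `depth_key` and `depth_greater` are made up for illustration, and whether the hardware compares this way is not known.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its raw IEEE-754 bit pattern.
inline uint32_t float_bits(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Depth key with the sign bit dropped, leaving 31 significant bits.
inline uint32_t depth_key(float z) {
    return float_bits(z) & 0x7FFFFFFFu;
}

// For non-negative, non-NaN inputs this matches (a > b) on the float values,
// but only needs an integer comparator.
inline bool depth_greater(float a, float b) {
    return depth_key(a) > depth_key(b);
}
```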

TSP

We know it processes lists of pixels RLE encoded from the ISP buffer. It does one pixel at a time, though again, it is a question of how it calculates the pixel parameters.
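As a rough picture of what "lists of pixels RLE encoded" could look like, the sketch below run-length encodes a row of per-pixel tags into (tag, length) pairs. The `TagRun` layout and the function name are hypothetical; the real encoding used between ISP and TSP is not documented here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical run record: one tag and how many consecutive pixels carry it.
struct TagRun {
    uint32_t tag;
    uint32_t length;
};

// Collapse consecutive identical tags into runs.
std::vector<TagRun> rle_encode_tags(const std::vector<uint32_t>& tags) {
    std::vector<TagRun> runs;
    for (uint32_t t : tags) {
        if (!runs.empty() && runs.back().tag == t) {
            ++runs.back().length;      // extend the current run
        } else {
            runs.push_back({t, 1});    // start a new run
        }
    }
    return runs;
}
```

With this shape, the TSP would only need to fetch triangle parameters once per run rather than once per pixel, which fits the "one pixel at a time" description.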

It is widely accepted that all parameters are perspective correct in their calculations (up to: 4 uv coords, 16 colors, though only 2/8 at the same time).

Best case throughput with bilinear filtering is 1 pixel per clock, probably deeply pipelined and buffered as well.

OIT

My opinion is that a version of layer peeling is used, both for PT (front to back) and TR (back to front), as there are several references to such a process in the documentation. The RTL Engineering video disagrees. Also, the calculations the video gives for the micro depth buffers would make things not fit.
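To illustrate what layer peeling means for a single pixel, the sketch below extracts fragments back to front by repeatedly picking the farthest fragment that is still in front of the last peeled layer. This is the generic depth-peeling idea under stated assumptions, not the hardware algorithm: it assumes a larger depth value means closer to the viewer, it collapses fragments of equal depth into one layer, and `peel_back_to_front` is a made-up name.

```cpp
#include <limits>
#include <vector>

// Returns the depths of one pixel's fragments in back-to-front peel order.
// Assumption: larger depth value = closer to the viewer.
std::vector<float> peel_back_to_front(const std::vector<float>& fragment_depths) {
    std::vector<float> order;
    float last = -std::numeric_limits<float>::infinity();  // "reference depth"
    for (;;) {
        bool found = false;
        float best = 0.0f;
        // One pass over all fragments: keep the farthest one nearer than `last`.
        for (float d : fragment_depths) {
            if (d > last && (!found || d < best)) {
                best = d;
                found = true;
            }
        }
        if (!found) break;   // nothing left to peel
        order.push_back(best);
        last = best;         // next pass peels in front of this layer
    }
    return order;
}
```

Each layer costs a full pass over the fragments, so the number of passes grows with depth complexity. Front-to-back peeling is the same loop with the comparisons reversed.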

TSP feedback for PT

After the pixels (a pixel?) are (is?) rendered, there is some backchannel to notify which failed the alpha tests. Those are then marked as 'to render again' for the next layer peel.

Likely, the same feedback mechanism is re-used to mark which triangles no longer have visible pixels, for discard processing.

Notes from skmp

Interesting analysis, and the reliance on die imaging is intriguing. Some notes I took during the Dreamcast part of the video:

((This comment has some errata, check the replies below for some clarifications))

[references]

refsw: https://github.com/skmp/reicast-emulator/blob/alpha/libswirl/rend/soft/refrend_base.cpp, https://github.com/skmp/reicast-emulator/blob/alpha/libswirl/rend/soft/refsw.cpp, https://github.com/skmp/reicast-emulator/blob/alpha/libswirl/rend/soft/refsw_pixel.cpp

lxdream's ta: https://github.com/skmp/reicast-emulator/blob/alpha/libswirl/gpl/lxdream/tacore.cpp

A few errata in the above notes

It seems like palettes can do direct argb8888 and YUV gets decoded to argb8888, however there seems to not be a direct argb8888 format for textures. Fun oddity I forgot :3.

The ISP also might have up to 512(!?) vertices cached, used during PT (alpha-test/auto sort) and TR (alpha-blend/auto sort) modes. There are registers to partition this cache.

PT mode seems to render from front to back, while TR seems to render from back to front.

Double checked, there is clear reference in various places for a 'reference depth buffer'. Not sure how the span sorter might be involved there - though that'd be trippy.

There are mentions of 'discard processing' in some places, which seems to mean that fully processed polygons get removed from the Layer Peel. I wonder how that works: does it modify the display lists in memory during rendering to keep track of that?

(will merge at a later time)

Clipping

As quads are supported natively, clipping against one plane turns a triangle into a quad. I need to look at the worst case for 4 planes, though .. likely more complex.
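The triangle-to-quad case can be checked with a single Sutherland-Hodgman step against one plane. The sketch below clips a polygon against the half-plane x >= xmin. It is the generic algorithm, shown here in 2D for brevity, and `Point2` and `clip_against_xmin` are names chosen for the example, not anything taken from the hardware or from refsw.

```cpp
#include <cstddef>
#include <vector>

struct Point2 { float x, y; };  // hypothetical screen-space vertex

// One Sutherland-Hodgman step: keep the part of a convex polygon with x >= xmin.
std::vector<Point2> clip_against_xmin(const std::vector<Point2>& poly, float xmin) {
    std::vector<Point2> out;
    for (std::size_t i = 0; i < poly.size(); ++i) {
        Point2 cur = poly[i];
        Point2 nxt = poly[(i + 1) % poly.size()];
        bool cur_in = cur.x >= xmin;
        bool nxt_in = nxt.x >= xmin;
        if (cur_in) out.push_back(cur);          // keep inside vertices
        if (cur_in != nxt_in) {                  // edge crosses the plane
            float t = (xmin - cur.x) / (nxt.x - cur.x);
            out.push_back({xmin, cur.y + t * (nxt.y - cur.y)});
        }
    }
    return out;
}
```

A triangle with one vertex outside comes back with 4 vertices (a quad), and with two vertices outside it stays a triangle. Each half-plane can add at most one vertex to a convex polygon, so four planes give at most 7 vertices, which no longer fits a single quad.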

An idea would be that CORE itself uses 0..31 coordinates for pixels, doing clipping before and rendering them as quads. Would certainly make the math easier.

skmp commented 1 year ago

Testable things

skmp commented 1 year ago

ISP_BACKGND_D indicates that the frame-buffer is truncating the last 4 bits of mantissa, leading to a (?) s1e8m19 format (?).

ISP_BACKGND_T indicates that the tags are 29 bits wide. For arrays, a separate tag address is used for each render.

skmp commented 1 year ago

The ISP/TSP parameter caches seemingly exist in the hope of re-using triangle parameters across tiles. The RTL video gives 8KB for the ISP cache and 4KB for the TSP cache. Roughly, that'd mean (5 * 4 prms + 8 * 4 cols + 1 * 4 Z + 4 * 4 UVs) = 72 bytes per entry, so ~56 triangles in the TSP cache worst case, 102 best case (no 2 volume modvols).

That means expectations for the TSP are a minimum of ~10 to 20 pixels per triangle, which at 640x480 is ~30k to 15k triangles / frame, or ~1.8M to 0.9M per second @ 60 fps, which is close to the peak performance of DOA2LE.
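The back-of-envelope numbers above can be reproduced as follows. The entry layout (5 parameter words, 8 colour words, 1 Z word, 4 UV words, 4 bytes each) and the 4KB cache size are the guesses from this thread, not confirmed figures, and the per-frame counts assume every screen pixel is covered exactly once (no overdraw).

```cpp
#include <cstdint>

constexpr uint32_t kWordBytes = 4;

// Worst-case TSP cache entry, per the guess in this thread.
constexpr uint32_t kEntryBytes =
    5 * kWordBytes +   // prms
    8 * kWordBytes +   // cols
    1 * kWordBytes +   // Z
    4 * kWordBytes;    // UVs

constexpr uint32_t kTspCacheBytes = 4 * 1024;  // figure quoted from the RTL video
constexpr uint32_t kWorstCaseEntries = kTspCacheBytes / kEntryBytes;

constexpr uint32_t kScreenPixels = 640 * 480;

// Triangles per frame if every triangle covers `pixels_per_triangle` pixels.
constexpr uint32_t triangles_per_frame(uint32_t pixels_per_triangle) {
    return kScreenPixels / pixels_per_triangle;
}

constexpr uint32_t triangles_per_second(uint32_t pixels_per_triangle, uint32_t fps = 60) {
    return triangles_per_frame(pixels_per_triangle) * fps;
}
```

`kEntryBytes` evaluates to 72 and `kWorstCaseEntries` to 56; `triangles_per_frame(10)` and `triangles_per_frame(20)` give 30720 and 15360, and the per-second figures are 1843200 and 921600.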

Also note, the cache doesn't get polluted by triangles that fit in only one bin (or that won't make use of it? Not sure if TA has smarts for direction).