tobspr / RenderPipeline

Physically Based Shading and Deferred Rendering for the Panda3D game engine
https://github.com/tobspr/RenderPipeline/wiki

Discussion Thread: Questions about the process and Suggestions for Performance #46

Closed · moonshineTheleocat closed this 7 years ago

moonshineTheleocat commented 8 years ago

I have just a few questions about how the Render Pipeline does its global illumination.

The first is less a question than a set of recommendations for improving performance.

I noticed right away that you are using an enormous resolution, 256^3, which amounts to 16,777,216 voxels and a correspondingly large amount of memory. Sparseness does make it more compact, but that resolution is still massive, traversing it isn't trivial, and it stays fixed as geometry moves further from the camera.

Why not use several cascades of much smaller textures?

Example: if we use about 128 bytes per voxel (assuming isotropic voxels), and we have four grids of 32^3 voxels each, the memory usage drops to roughly 16.8 megabytes.
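For reference, a quick sanity check of that figure (the 128 bytes per voxel and the grid sizes are the assumptions from above):

```python
# Back-of-the-envelope memory comparison: one dense 256^3 grid versus
# four 32^3 cascades, at the 128 bytes/voxel assumed above.
BYTES_PER_VOXEL = 128

def grid_bytes(resolution, num_grids=1):
    """Memory for num_grids dense grids of resolution^3 voxels."""
    return resolution ** 3 * BYTES_PER_VOXEL * num_grids

print(f"256^3 grid:     {grid_bytes(256) / 1e6:7.1f} MB")    # ~2147.5 MB
print(f"4 x 32^3 grids: {grid_bytes(32, 4) / 1e6:7.1f} MB")  # ~16.8 MB
```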

When you voxelize, render at a slightly higher resolution, then downsample into each of the cascades. If a model is too small to fill some volume of a voxel, it should be ignored.

Higher resolution would be focused on areas closer to the camera, with the cascades growing exponentially larger with distance (better quality up close, less in the back).
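As a sketch of that layout (all sizes here are illustrative, not values from the pipeline), each cascade could double in world extent while keeping a fixed voxel resolution:

```python
# Sketch: exponentially growing cascade extents centered on the camera.
def cascade_bounds(cam_pos, base_size=8.0, num_cascades=4, resolution=32):
    """Yield (min_corner, max_corner, voxel_size) per cascade.
    Each cascade doubles in world size; voxel resolution stays fixed,
    so voxels get coarser with distance from the camera."""
    for i in range(num_cascades):
        size = base_size * (2 ** i)      # 8, 16, 32, 64 world units
        voxel = size / resolution
        # Snap the center to the cascade's voxel size so the grid
        # moves in integer voxel steps when the camera moves.
        center = tuple(round(c / voxel) * voxel for c in cam_pos)
        half = size / 2.0
        yield (tuple(c - half for c in center),
               tuple(c + half for c in center),
               voxel)

for lo, hi, v in cascade_bounds((10.3, 2.7, 5.1)):
    print(lo, hi, f"voxel={v:.3f}")
```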

tobspr commented 8 years ago

Cascades are actually something I wanted to implement, but just haven't gotten to yet. You are totally right about the memory usage: such a huge voxel grid takes a lot of memory and is also slow to sample. Smaller voxel grids would be much faster (also thanks to better cache behavior).

I do plan on removing the specular traced component of the vxgi plugin, though. It suffers from blockiness at low grid resolutions, and localized cubemaps (combined with screen-space reflections) provide much better results (and sharper reflections).

I also have plans to provide a global illumination solution using spherical harmonic probes distributed over the level, which may well replace the VXGI solution, depending on how well it works out.

moonshineTheleocat commented 8 years ago

I thought about using the probe approach. My initial thought was to use indexed low-res cube maps for the probes: one holding just diffuse, the second holding the UV coordinates of the light map. The light map would be generated by building a low-res voxel grid and creating a convex hull from it. When you light the scene, you cook the lighting into the light maps in real time; the cubemaps then sample texels based on their stored UV coordinates and relight their diffuse.

Theoretically, this would be significantly cheaper than voxels, and could run on older hardware at reasonable frame rates.
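If I'm reading the indirection right, the per-texel relighting would look roughly like this; a minimal Python sketch where all names, and the scalar lightmap values, are illustrative:

```python
# Sketch of the probe relighting described above: each probe texel stores
# a diffuse albedo plus the lightmap UV of the surface it sees, so
# relighting is one lightmap fetch per texel.
def relight_probe(diffuse_texels, uv_texels, lightmap):
    """diffuse_texels / uv_texels: parallel lists for one probe face.
    lightmap: 2D array of freshly cooked scalar radiance values."""
    h, w = len(lightmap), len(lightmap[0])
    lit = []
    for albedo, (u, v) in zip(diffuse_texels, uv_texels):
        # Nearest-neighbour lightmap fetch; a real version would filter.
        x = min(int(u * w), w - 1)
        y = min(int(v * h), h - 1)
        radiance = lightmap[y][x]
        lit.append(tuple(a * radiance for a in albedo))
    return lit
```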

This would be similar to what Enlighten does, but I've found more reasons not to do that. Mostly that for larger worlds you'll need a massive light map (basically a megatexture); otherwise the UV coordinates won't correctly point to the desired texture. I could use a sparse technique, but the next paragraph tells you why that's a problem. A few other caveats: only lights and static objects have an influence on the bounce color. This matches the real world, where really only high-intensity lights (like the sun) cause significant color bounce, so that's less of a problem. Dynamic objects will have to sample from the probes after they have been Gaussian-blurred. Emission will probably not be supported without attaching a point light.

I thought about using the blue and alpha bits to hold cell IDs, for a total of 65,535 IDs. But I don't know enough about shaders to do a lookup by name, if that's even possible. UNT's graphics class is based on OpenGL 1.3, and while it's easy to transfer to newer versions of OpenGL and DirectX, there aren't a lot of resources on shader authoring.
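The bit-packing part at least is simple; a sketch of splitting a 16-bit cell ID across two 8-bit channels (Python for clarity, but the same integer math works in GLSL):

```python
# Sketch: pack a 16-bit cell ID into two 8-bit texture channels
# (blue = low byte, alpha = high byte), giving 65,536 possible values.
def pack_id(cell_id):
    assert 0 <= cell_id <= 0xFFFF
    blue = cell_id & 0xFF          # low 8 bits
    alpha = (cell_id >> 8) & 0xFF  # high 8 bits
    return blue, alpha

def unpack_id(blue, alpha):
    return (alpha << 8) | blue

assert unpack_id(*pack_id(54321)) == 54321
```

In a shader the unpacked ID would then be used as an integer index into a texture array or buffer; there is no lookup "by name", but indexing achieves the same thing.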

tobspr commented 8 years ago

Well, when using probes I would basically divide the level into regions and spawn a fixed number of probes in each region.

To update the probes, I'd probably just update one probe per frame; time of day usually does not change that fast, so it should not be noticeable. When updating, I would render a cubemap from the position of the probe containing a capture of the level lighting, that is, sun and other lights (I already have that for environment probes). Then I'd filter that cubemap to generate second-order spherical harmonics.
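The projection step might look roughly like the following sketch, assuming "second order" means the first two SH bands (the loop has the same shape for nine coefficients); sample generation and cubemap reading are omitted:

```python
import math

# Sketch: project directional radiance samples (e.g. cubemap texels)
# onto the first two spherical harmonic bands (4 coefficients).
def sh_basis(d):
    x, y, z = d
    return (
        0.282095,      # Y_0^0
        0.488603 * y,  # Y_1^-1
        0.488603 * z,  # Y_1^0
        0.488603 * x,  # Y_1^1
    )

def project_sh(samples):
    """samples: (unit_direction, radiance) pairs, assumed to cover
    the sphere roughly uniformly."""
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for d, radiance in samples:
        for i, b in enumerate(sh_basis(d)):
            coeffs[i] += radiance * b
    # Monte-Carlo normalization: 4*pi steradians over the sample count.
    norm = 4.0 * math.pi / len(samples)
    return [c * norm for c in coeffs]
```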

To apply the probes, taking the n closest probes and blending them is probably sufficient. I thought about a buffer texture storing the probe data (the second-order spherical harmonics). This would make it possible to store a lot of probes: assuming 8 bytes per probe and a maximum storage of 24 MB, it would be possible to store 3,145,728 probes, quite enough for the beginning.
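The blending could be as simple as an inverse-distance weighting of the stored coefficients; a sketch, where the (position, coefficients) layout is my assumption:

```python
import math

# Sketch: blend the SH coefficients of the n closest probes, weighted
# by inverse distance to the shaded point.
def blend_probes(point, probes, n=4):
    """probes: list of (position, coeffs) pairs."""
    def dist(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(point, p)))
    nearest = sorted(probes, key=lambda pr: dist(pr[0]))[:n]
    weights = [1.0 / max(dist(pos), 1e-4) for pos, _ in nearest]
    total = sum(weights)
    blended = [0.0] * len(nearest[0][1])
    for (pos, coeffs), w in zip(nearest, weights):
        for i, c in enumerate(coeffs):
            blended[i] += c * (w / total)
    return blended
```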

moonshineTheleocat commented 8 years ago

Honestly... you can update more than one probe at a time with the deferred irradiance volume method. If you break the level up into cells and can find a way to patch the texture lookups, you can update the probes per cell, with each cell having its own set of probes.

The resolution needed for a good result is not high, for either the light map or the probes.

http://codeflow.org/entries/2012/aug/25/webgl-deferred-irradiance-volumes/

The voxel cone tracing method is said (according to Crytek) to be faster than LPV, and produces comparable or better results.

That method was actually able to run on the PS3 and Xbox, and has been used in a shipped PS4 game at 30 fps with a lot going on.

The optimizations will require more memory, but combined with cascades it's a net win. The remaining issue is the N cones traced per pixel, which means people who try to push 4K resolutions will run into serious problems. Because irradiance is low frequency, you can trace at a downsampled resolution to cut back on the number of cones.
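To put rough numbers on that (the cones-per-pixel count is an illustrative assumption):

```python
# Rough cone-count comparison: tracing irradiance at half resolution
# cuts the number of cone traces per frame by 4x.
CONES_PER_PIXEL = 6  # illustrative

def traces(width, height, scale=1.0):
    return int(width * scale) * int(height * scale) * CONES_PER_PIXEL

full = traces(3840, 2160)       # 4K, full resolution
half = traces(3840, 2160, 0.5)  # trace at half resolution, upsample after
print(f"full res: {full:,} cones/frame, half res: {half:,} cones/frame")
```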

Then... the last bit that needs optimization is the voxelization process. Assuming the camera is not moving too fast, you can voxelize distant geometry a few chunks at a time and cache the results. The cache probably shouldn't be a real voxel grid, however, as that takes up a lot of memory; probably just a format that stores dominant triangle IDs and the dominant material per voxel. So instead of all the information that gets dumped into a voxel, you have a dictionary that is cheap to store in memory and trivial to voxelize (no overdraw: you know exactly which triangles to rasterize to reproduce the same shape, and where to pull data from). It also means you won't have to precompute everything, as long as you are able to discard data you no longer need to keep.
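A sketch of what such a cache entry could look like; the exact fields are my reading of the proposal, not a fixed format:

```python
from dataclasses import dataclass, field

# Sketch of the compact cache entry described above: instead of a full
# voxel payload, keep only what is needed to re-rasterize the voxel.
@dataclass
class VoxelCacheEntry:
    dominant_triangle_ids: list = field(default_factory=list)  # triangles to re-rasterize
    dominant_material_id: int = 0                               # material to pull data from

# A sparse dict keyed by integer voxel coordinates keeps memory
# proportional to occupied voxels only.
cache: dict[tuple[int, int, int], VoxelCacheEntry] = {}
cache[(12, 4, 7)] = VoxelCacheEntry([101, 102], dominant_material_id=3)
```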

Issue is... you need to limit how often you update the cascades. That should technically be fine, since they are axis-aligned and integer-stepped: as long as none of the static world moves, and the camera does not move too quickly (with 6 cascades, updating one per frame), an update can be delayed a few frames. If the camera is too fast, you can hide it with motion blur.

This honestly depends, though. If you need bounce lighting on dynamic objects, you can update the closest cascade every frame but move it only every six frames: if possible, subtract all dynamic objects and revoxelize them at their current positions each frame, and on the sixth frame update the position of the cascade along with the static geometry.
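Put together, the schedule might look like this sketch, where the cascade methods are placeholders for whatever the engine provides:

```python
# Sketch of the schedule described above: the closest cascade gets its
# dynamic objects revoxelized every frame but is only moved (and its
# static geometry rebuilt) every sixth frame; the remaining cascades
# take turns, one per frame.
MOVE_PERIOD = 6

def update_cascades(frame, cascades, camera_pos):
    closest = cascades[0]
    if frame % MOVE_PERIOD == 0:
        closest.snap_to(camera_pos)   # move in integer voxel steps
        closest.revoxelize_static()   # rebuild static geometry
    closest.clear_dynamic()           # remove last frame's dynamic voxels
    closest.revoxelize_dynamic()      # re-insert at current transforms
    # Round-robin over the outer cascades: one static rebuild per frame.
    outer = 1 + frame % (len(cascades) - 1)
    cascades[outer].snap_to(camera_pos)
    cascades[outer].revoxelize_static()
```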

That split second hopefully won't be too aggravating... or noticeable.

tobspr commented 8 years ago

Well, what you say is correct. What I planned to implement was cascades with temporal sampling, that is, updating one cascade per frame, caching the others, then sampling the voxel grid over multiple frames (using a temporal filter). This ensures there are no visible transitions when lighting changes abruptly.
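The temporal filter itself can be a simple exponential blend toward a history value; a minimal sketch with an illustrative blend factor:

```python
# Sketch of the temporal filter mentioned above: exponentially blend the
# current frame's voxel sample toward a history value, so abrupt lighting
# changes fade in over a few frames instead of popping.
def temporal_blend(history, current, alpha=0.2):
    """Per-channel lerp: alpha is the weight of the new sample."""
    return tuple(h + (c - h) * alpha for h, c in zip(history, current))

history = (0.0, 0.0, 0.0)
for _ in range(5):  # a light abruptly switches on
    history = temporal_blend(history, (1.0, 1.0, 1.0))
print(history)      # converges toward (1, 1, 1) over several frames
```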

When changing the camera rotation abruptly, one would probably update the last cascade with higher priority so no lighting information is missing.

While all of this is planned, it's not very high on my priority list; there are other things (rectangle/tube area lights) I want to implement first.

moonshineTheleocat commented 8 years ago

Sadly, the most optimal solution may be implementation-dependent.

For my case, I'm in the process of writing a design document for my own engine, which targets games in the style of Baldur's Gate: a mostly downward-facing camera that can zoom in close for cinematic shots, or zoom out far to show the entire map.

The optimal setup for my case would be for each cascade to be located near the edge of one side, close to the middle, and pushed slightly inwards toward the center.

The voxel volumes themselves will need to be raised to give more resolution to the downward projection, with some breathing room at the top.

Though one thought I had would be to put the cascades on stilts, calculated on the CPU side by four raycasts set some depth into the innermost cascades. The idea is that if all four corners of a cascade touch the ground, or a stopping plane, the cascade is not allowed to go any deeper; the camera can then move up and down within the volume.
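A sketch of that clamping logic, assuming a Z-up world and a hypothetical raycast_down query supplied by the engine:

```python
# Sketch of the "stilts" idea: cast four rays down from the cascade's
# corners; if all of them hit ground (or a stopping plane), clamp the
# cascade so it sinks no deeper. raycast_down(x, y) is a placeholder
# returning the ground height under (x, y), or None on a miss.
def clamp_cascade_bottom(center, half_size, raycast_down):
    corners = [
        (center[0] + sx * half_size, center[1] + sy * half_size)
        for sx in (-1, 1) for sy in (-1, 1)
    ]
    hits = [raycast_down(x, y) for x, y in corners]
    if all(h is not None for h in hits):
        floor = max(hits)  # highest ground point under any corner
        bottom = center[2] - half_size
        if bottom < floor:
            # Lift the cascade so its bottom rests on the ground.
            center = (center[0], center[1], floor + half_size)
    return center

# Flat ground at z=0: a cascade sunk below it gets pushed back up.
print(clamp_cascade_bottom((0.0, 0.0, 3.0), 4.0, lambda x, y: 0.0))
```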