mrdoob / three.js

JavaScript 3D Library.
https://threejs.org/
MIT License

Proposal: Decoupling WebGLRenderer from ThreeJS Scene graph #4221

Closed bhouston closed 4 years ago

bhouston commented 10 years ago

Hey all, I've been a bit absent from contributing recently, maybe that can change a bit...

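This might seem like a radical idea, and I admit that I am a little worried that I am proposing to take this on myself, but I think it is something that needs to be done. Let me explain.

First, there is this idea of shader batching, which is a means of getting the fastest performance possible from a GPU. It involves reordering the scene data so that it is grouped by shader to minimize state calls:

http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D

I think ThreeJS needs this for maximum speed.

The second issue is the high complexity of the WebGLRenderer class. While it is easy to use, it is actually fairly complex internally. I think that we could separate it out so that the GPU-renderable Mesh, Particle and Line data isn't built and maintained in the renderer, but rather in separate classes. Thus there would be WebGLMesh, WebGLParticles and WebGLLine classes that are generally self-contained rather than being implicit classes within the WebGLRenderer.

The third issue is that I would like to separate out the traversal of the ThreeJS scene graph from WebGLRenderer. If we are going to allow reordering of the renderable items, I think it is best to make the traversal of the ThreeJS Scene a separate class. Thus we could have a standard, separate WebGLSceneTraverser that goes over the scene, converts the data to WebGLMesh, WebGLParticles and WebGLLine, and submits these to the shader batching core. This would also allow people to write things that do not use the standard ThreeJS Scene graph in the standard way: they could write their own algorithm that submits WebGLMeshes, WebGLParticles and WebGLLines to the renderer in any fashion they want.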

Overall, I think this could speed things up (via shader batching), reduce the complexity of the WebGLRenderer class (by separating out the WebGLMesh, WebGLParticles and WebGLLines as well as the SceneTraverser), and increase the usefulness of ThreeJS's WebGLRenderer outside of strict ThreeJS Scene rendering.

I will have some time over the Christmas break to make an attempt at implementing this if it could possibly be desired. I think that these changes can be limited to just the WebGL components so that the other renderers can stay the same. I think I can even keep the main interface to the new WebGLRendererX class the same as the other renderers.

I've written a shader batcher before as well as a scene traverser, just not for WebGL, rather DirectX 10. Thus I do have a workable design that I can merge together with ThreeJS.

Let me know if I shouldn't attempt this or what other improvements I should try to work in if you think this is a good idea.

WestLangley commented 10 years ago

Welcome back, @bhouston! +1

arodic commented 10 years ago

Great idea! And good to have you back :)

cecilemuller commented 10 years ago

Welcome back as well and that would be great! :-)

Another benefit is that the scenegraph could be manipulated on the server-side even if it's not capable of rendering WebGL, which could come in handy for running some automated tests with Travis CI

ghost commented 10 years ago

Would there not be concerns over essentially offloading parallel workload complexity into the CPU space? Could this delay not be substantial enough, in the case of shader sets complex enough to warrant this, to entirely negate any benefit, while also delaying any other cycle-sharing tasks at the same time?

Interesting idea though.

-----Original Message----- From: "Ben Houston" notifications@github.com Sent: 18/12/2013 1:19 AM To: "mrdoob/three.js" three.js@noreply.github.com Subject: [three.js] Proposal: Decoupling WebGLRenderer from ThreeJS Scenegraph (#4221)

Hey all, I've been a bit absent from contributing recently, maybe that can change a bit...

This may seem like a radical idea and I admit that I am a little worried that I am proposing that I take this one, but I think it is something that needs to be done. Let me explain.

First, there is this idea of shader batching, it is a means of getting the fastest performance possible from a GPU. It involves the reordering of scene data so that you group it by shaders to minimize state calls: http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D I think ThreeJS needs this for maximum speed.

The second issue is the high complexity of the WebGLRenderer class. While it is easy to use, it is actually fairly complex internally. I think that we could separate it out so that the GPU renderable Mesh, Particle and Line data isn't built and maintained in the renderer, but rather as separate classes. Thus there is WebGLMesh, WebGLParticles and WebGLLine that are self-contained generally rather than being implicit classes within the WebGLRenderer.

The third issue is that I would like to separate out the traversal of the ThreeJS scene graph from WebGLRenderer. If we are going to allow reordering of the renderable items, I think it is best to separate out the traversal of the ThreeJS Scene to be a separate class. Thus we could have a standard separate WebGLSceneTraverser that basically goes over the scene and converts the data to WebGLMesh, WebGLParticles and WebGLLine and submits these to the shader batching core. This last issue would allow for people to write things that do not use the standard ThreeJS Scene graph in the standard way, they could write their own algorithm that submits WebGLMeshes, WebGLParticles and WebGLLines to the renderer in any fashion they want.

Overall, I think this could speed things up (via shader batching), reduce the complexity of the WebGLRenderer class (by separating out the WebGLMesh, WebGLParticles and WebGLLines as well as the SceneTraverser), and increase the usefulness of ThreeJS's WebGLRenderer outside of strict ThreeJS Scene rendering.

I will have some time over the Christmas break to make an attempt at implementing this if it could possibly be desired. I think that these changes can be limited to just the WebGL components so that the other renderers can stay the same. I think I can even keep the main interface to the new WebGLRendererX class the same as the other renderers.

I've written a shader batcher before as well as a scene traverser, just not for WebGL, rather DirectX 10. Thus I do have a workable design that I can merge together with ThreeJS. Let me know if I shouldn't attempt this or what other improvements I should try to work in if you think this is a good idea.

bhouston commented 10 years ago

@MJCD- wrote:

Would there not be concerns over essentially offloading parallel workload complexity into the CPU space? Could this delay not be substantial enough, in the case of shader sets complex enough to warrant this, to entirely negate any benefit, while also delaying any other cycle-sharing tasks at the same time?

The way that I do this is to just have a hash table where one collects each use of a shader. Then one proceeds linearly through the hash table, rendering each shader's geometry in order. Thus this isn't Z-sorted, it is just shader grouped. I think that hash table inserts followed by a linear traversal are minimal computation compared to everything else.

So the costs are not actually high in my opinion.

Also remember the costs are in switching shaders / GPU state, not shader complexity itself.
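The hash-table grouping described above can be sketched in a few lines of plain JavaScript. This is only an illustration, not code from the proposal: the `programId` and `name` fields and both function names are hypothetical, and a `Map` stands in for the hash table.

```javascript
// Group draw items by shader program, then walk the groups linearly.
// `programId` and `name` are hypothetical fields used only for illustration.
function groupByProgram(items) {
  const groups = new Map(); // a Map keeps keys in insertion order
  for (const item of items) {
    let list = groups.get(item.programId);
    if (list === undefined) {
      list = [];
      groups.set(item.programId, list);
    }
    list.push(item);
  }
  return groups;
}

// Returns the names in the order they would be drawn: one program at a time.
function drawOrder(items) {
  const order = [];
  for (const list of groupByProgram(items).values()) {
    for (const item of list) order.push(item.name);
  }
  return order;
}

const items = [
  { name: "a", programId: "phong" },
  { name: "b", programId: "basic" },
  { name: "c", programId: "phong" },
  { name: "d", programId: "basic" },
];
console.log(drawOrder(items).join(",")); // a,c,b,d
```

Each item costs one hash insert and one visit, so the work grows linearly with the number of items, which is the point being made about CPU cost.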

WestLangley commented 10 years ago

@bhouston

Then one proceeds linearly through the hash table rendering each shader's geometry in order.

... and do you intend to do so with a single draw call for all geometries sharing a shader? ( assuming they are all meshes, in this case )

Also remember the costs are in switching shaders / GPU state

... and number of draw calls -- or is that not an issue?

bhouston commented 10 years ago

... and do you intend to do so with a single draw call for all geometries sharing a shader? ( assuming they are all meshes, in this case )

I can't share draw calls usually because that would involve having to merge the data buffers of different meshes, which is hard to do automatically in a way that is guaranteed to be efficient across frames. So the number of draw calls will stay the same as it is now.

... and number of draw calls -- or is that not an issue?

This should not affect the number of draw calls (no more and no less), just the order in which they are done so that those sharing shaders are done sequentially avoiding unnecessary shader changes.
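To make the distinction concrete, here is a small sketch that counts program binds for the same set of draw calls in two different orders. The program names and the helper `countProgramSwitches` are made up for the example; a real renderer would compare program objects rather than strings.

```javascript
// Count how often the bound program changes while issuing draws in order.
// The first draw always counts as one bind.
function countProgramSwitches(programIds) {
  let switches = 0;
  let current = null;
  for (const id of programIds) {
    if (id !== current) {
      switches += 1;
      current = id;
    }
  }
  return switches;
}

// Hypothetical scene-graph order versus the same draws grouped by program.
const sceneOrder = ["phong", "basic", "phong", "basic", "phong"];
const batchedOrder = [...sceneOrder].sort();

console.log(sceneOrder.length, countProgramSwitches(sceneOrder));     // 5 5
console.log(batchedOrder.length, countProgramSwitches(batchedOrder)); // 5 2
```

The draw call count is 5 in both cases; only the number of program changes drops.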

To minimize draw calls one needs to merge together geometry so that there are fewer buffers to load. This can be done with both the current architecture and the new architecture in the same fashion.

What really should be done to minimize draw calls is a "compile ThreeJS Scene" function that takes a static ThreeJS Scene and replaces it with a set of large BufferGeometries that are shader batch optimized. You then use these to draw the static elements as fast as possible while using a real ThreeJS Scene for the dynamic interactive elements. Or does this already exist somewhere in the ThreeJS code base?

I would like to reduce the number of unique shaders created in Three.JS though by making things more variable driven.

Even though it is about a decade old, this is still a very relevant document on the benefits of batching: http://ce.u-sys.org/Veranstaltungen/Interaktive%20Computergraphik%20(Stamminger)/papers/BatchBatchBatch.pdf

bhouston commented 10 years ago

Another good presentation (also a decade old) on the costs of state changes; it says they are equivalent in cost to a draw call:

ftp://download.nvidia.com/developer/presentations/GDC_2004/Dx9Optimization.pdf

WestLangley commented 10 years ago

Merging geometries is only a solution if the geometries are static (relative to each other).

Currently, if you have 1000 meshes, each sharing a single SphereGeometry and a single material, it results in 1000 draw calls.

If you replace the geometry with a non-indexed BufferGeometry, it still results in 1000 draw calls.

I was hoping this proposal was to modify WebGLRenderer so that it would reduce to a single draw call in this case.

Yes, the cost of state changes, the cost of draw calls, and implementation complexity, would be the deciding factors.

bhouston commented 10 years ago

@WestLangley If the sphere is actually shared, then it would be an instanced geometry case. Instanced geometry is a bit different from precompiling a scene into shader batches. I am not prepared to handle automatic instancing at this time as it is a different problem.

But if I can modify your example so that there were 1000 small meshes in a ThreeJS Scene graph, each sharing the same shader settings but with different vertices/faces, a precompilation step could merge these into a minimal set of BufferGeometries, far fewer than 1000, with each one limited in size only by the index/vertex buffer sizes, often 65K. Then one could use this simpler scene to render it with a minimal number of draw calls.

So the idea of a precompiler for static ThreeJS Scenes is still a useful idea to minimize draw calls.
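A minimal sketch of such a merge step follows, assuming all input meshes share one material and their positions are already in a common (world) space. The input shape `{ positions, indices }` and the function name `mergeStaticMeshes` are hypothetical, and the 65,536 vertex cap reflects what 16-bit indices can address.

```javascript
const MAX_VERTICES = 65536; // 16-bit indices address vertices 0..65535

// meshes: [{ positions: number[] (xyz triples), indices: number[] }]
// Returns chunks of merged data, each small enough for Uint16 indices.
function mergeStaticMeshes(meshes, maxVertices = MAX_VERTICES) {
  const chunks = [];
  let positions = [];
  let indices = [];

  const flush = () => {
    if (positions.length === 0) return;
    chunks.push({
      positions: new Float32Array(positions),
      indices: new Uint16Array(indices),
    });
    positions = [];
    indices = [];
  };

  for (const mesh of meshes) {
    const vertexCount = mesh.positions.length / 3;
    if (vertexCount > maxVertices) throw new Error("mesh too large for one chunk");
    // Start a new chunk if this mesh would overflow the current one.
    if (positions.length / 3 + vertexCount > maxVertices) flush();
    const base = positions.length / 3; // index offset for this mesh
    for (const p of mesh.positions) positions.push(p);
    for (const i of mesh.indices) indices.push(base + i);
  }
  flush();
  return chunks;
}

const tri = { positions: [0, 1, 0, -1, -1, 0, 1, -1, 0], indices: [0, 1, 2] };
const merged = mergeStaticMeshes([tri, tri]);
console.log(merged.length, Array.from(merged[0].indices).join(",")); // 1 0,1,2,3,4,5
```

Two triangles become one chunk, so they could be drawn with a single call instead of two. The merge only stays valid while the meshes do not move relative to each other.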

WestLangley commented 10 years ago

@bhouston OK. Remember, non-indexed buffer geometries do not suffer from the 65K limit.

@mrdoob suggested elsewhere that we could perhaps create BufferGeometry and IndexedBufferGeometry to prevent confusion. I think this is a good idea.

mrdoob commented 10 years ago

This sounds good to me :) Feel free to give it a shot!

arodic commented 10 years ago

Ben, since the shader batching does not include z-sorting, would you lose the benefit of drawing opaque geometry front to back?

Also, does that mean you don't have to sort the scene graph every frame but only when you add/remove objects/materials?

gero3 commented 10 years ago

The problem is transparent objects. We should have two lists, one for opaque objects and one for transparent objects, which can be sorted differently.

bhouston commented 10 years ago

@gero3 wrote "The problem is transparent objects. We should have two lists, one for opaque objects and one for transparent objects, which can be sorted differently."

exactly. :) That is how my current shader batcher works.
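As a rough sketch of the two-list idea (not the actual batcher referred to here), the split could look like the following. The `transparent`, `programId` and `depth` fields are assumptions for the example, with `depth` meaning distance from the camera.

```javascript
// Split items into an opaque list (grouped by program) and a
// transparent list (sorted back to front so blending composes correctly).
function buildRenderLists(items) {
  const opaque = [];
  const transparent = [];
  for (const item of items) {
    (item.transparent ? transparent : opaque).push(item);
  }
  opaque.sort((a, b) => a.programId - b.programId);
  transparent.sort((a, b) => b.depth - a.depth); // farthest first
  return { opaque, transparent };
}

const lists = buildRenderLists([
  { name: "a", transparent: false, programId: 2, depth: 4 },
  { name: "b", transparent: true, programId: 1, depth: 2 },
  { name: "c", transparent: false, programId: 1, depth: 7 },
  { name: "d", transparent: true, programId: 1, depth: 9 },
]);
console.log(lists.opaque.map((i) => i.name).join(","));      // c,a
console.log(lists.transparent.map((i) => i.name).join(",")); // d,b
```

The opaque list would be rendered first, then the transparent one.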

bhouston commented 10 years ago

@WestLangley: I hope to make the ordering of the jobs a module so that you can (1) shader batch, (2) z-sort, or (3) z-sort within each batch. That should be really simple to change because the rest of the engine should just support taking in these batch items in whatever order.

You wrote: "Also, does that mean you don't have to sort the scene graph every frame but only when you add/remove objects/materials?"

Shader batchers generally do the batching on a per-frame basis (I'm talking about rendering all the items that use the same shader sequentially, not the merging of geometry). But it is simple to batch things together: just go through the list and insert it all into a hash table where each shader maps onto a list of batch items to render. Thus it is faster than a z-sort. To add z-sorting, just sort each shader's item list in z-order. That would be faster than traditional full-scene z-ordering because each shader-specific list would be shorter.
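One way to picture the three interchangeable orderings is as plain functions that take the batch items and return them in draw order. The names below (`orderings`, `programId`, `depth`) are hypothetical and only sketch the idea; they are not the interface from this thread.

```javascript
// Three interchangeable ordering strategies over the same batch items.
const orderings = {
  // (1) shader batch: group by program only
  byProgram: (items) => [...items].sort((a, b) => a.programId - b.programId),

  // (2) plain z-sort, front to back
  byDepth: (items) => [...items].sort((a, b) => a.depth - b.depth),

  // (3) hash by program, then z-sort each (shorter) per-program list
  byProgramThenDepth: (items) => {
    const groups = new Map();
    for (const item of items) {
      if (!groups.has(item.programId)) groups.set(item.programId, []);
      groups.get(item.programId).push(item);
    }
    const out = [];
    for (const list of groups.values()) {
      list.sort((a, b) => a.depth - b.depth);
      out.push(...list);
    }
    return out;
  },
};

const batchItems = [
  { name: "a", programId: 2, depth: 5 },
  { name: "b", programId: 1, depth: 9 },
  { name: "c", programId: 2, depth: 1 },
  { name: "d", programId: 1, depth: 3 },
];
const names = (list) => list.map((i) => i.name).join(",");
console.log(names(orderings.byProgram(batchItems)));          // b,d,a,c
console.log(names(orderings.byDepth(batchItems)));            // c,d,a,b
console.log(names(orderings.byProgramThenDepth(batchItems))); // c,a,d,b
```

Because the rest of the pipeline only consumes an ordered list, swapping strategies is a one-line change at the call site. Note that the depth-based orderings still have to be recomputed whenever the camera or an object moves.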

WestLangley commented 10 years ago

You wrote: "Also, does that mean you don't have to sort the scene graph every frame but only when you add/remove objects/materials?"

@bhouston That question was not asked by me, actually.

You also have to sort whenever the camera or an object moves.

zz85 commented 10 years ago

@bhouston interesting, do give it a shot! :) However, I'm also reading that renderstate changes are not as expensive as extra draw calls, so reducing the number of draw calls may matter more, which brings up #4160

zz85 commented 10 years ago

Would be really nice to see how much improvement this can bring in numbers too, since the originally referenced article didn't ..

bhouston commented 10 years ago

So I've started writing it, will continue later this week. Maybe one could call it LowLevelWebGLRenderer. Here is the interface so far:

var vertexName = THREE.AttributeName.Request( "vertices", gl.FLOAT, 3 );
var vertices = new THREE.Attribute( vertexName, 3 );
vertices.set( 
    [
         0.0,  1.0,  0.0,
        -1.0, -1.0,  0.0,
         1.0, -1.0,  0.0
    ] );

var indices3Name = THREE.AttributeName.Request( "indices3", gl.UNSIGNED_SHORT, 3 );
var faces = new THREE.Attribute( indices3Name, 1 );
faces.set(
    [ 0, 1, 2 ] 
    );

var mesh = new THREE.AttributeSet();
mesh.addAttribute( faces );
mesh.addAttribute( vertices );

var meshBuffers = new THREE.BufferSet( gl, mesh );
meshBuffers.update();

var batchItem = new THREE.MeshBatchItem();
batchItem.bufferSet = meshBuffers;

var dispatcher = new THREE.BatchDispatcher();
dispatcher.registerRenderer( new THREE.MeshBatchRenderer() );

dispatcher.enqueueItem( batchItem );
dispatcher.render();

I understand the above probably isn't very clear at this time, but I think it is relatively simple as an interface.

Stuff left to do:

bhouston commented 10 years ago

So I've added uniform support and multi-pass rendering:

var meshAttributes = new THREE.AttributeSet();

var vertexName = THREE.Attribute.Request( "vertices", gl.FLOAT, 3 );
var vertices = new THREE.Attribute( vertexName, 9 );  // should one use the number of elements, or items?
vertices.set( 
    [
         0.0,  1.0,  0.0,
        -1.0, -1.0,  0.0,
         1.0, -1.0,  0.0
    ] );
// This also works:
vertices.set(
    [
        new THREE.Vector3( 0, 1, 0 ),
        new THREE.Vector3( -1, -1, 0 ),
        new THREE.Vector3( 1, -1, 0 )
    ] );

var indices3Name = THREE.Attribute.Request( "indices3", gl.UNSIGNED_SHORT, 3 );
var faces = new THREE.Attribute( indices3Name, 3 ); // using number of elements rather than items
faces.set(
    [ 0, 1, 2 ] 
    );

meshAttributes.addAttribute( faces );
meshAttributes.addAttribute( vertices );

var meshUniforms = new THREE.UniformSet();

var diffuseColor = THREE.Uniform.Request( "diffuseColor", gl.FLOAT, 3, false );
meshUniforms.set( diffuseColor, new THREE.Color( 1, 0, 0 ) );

var ambientColor = THREE.Uniform.Request( "ambientColor", gl.FLOAT, 3, false );
meshUniforms.set( ambientColor, new THREE.Color( 0.1, 0.1, 0.1 ) );

var meshBuffers = new THREE.BufferSet( gl, meshAttributes );
meshBuffers.update();

var batchItem = new THREE.BatchItem();
batchItem.type = 0; // TODO: add a unique program/rendering method identifier scheme.
batchItem.bufferSet = meshBuffers;
batchItem.uniforms = meshUniforms;

var lightUniforms = new THREE.UniformSet();

var lightPositions = THREE.Uniform.Request( "lightPositions", gl.FLOAT, 3, true );

lightUniforms.set( lightPositions,
    [
        new THREE.Vector3( 100, 0, 0 ),
        new THREE.Vector3( 100, 100, 0 )
    ] );

var renderers = // TODO: Figure out what to use here.
renderers.addRenderer( new THREE.MeshBatchRenderer() );

var transparencyPass = new THREE.RenderPass();
transparencyPass.enqueueItem( batchItem ); // technically this object isn't transparent

var mainPass = new THREE.RenderPass();
mainPass.enqueueItem( batchItem );

// render solid objects first sorted by program
mainPass.sortByProgram(); // could be done on batch item insertion rather than here.
mainPass.render( renderers, lightUniforms );

// render transparent objects second, sorted by z-depth
transparencyPass.sortByZDepth(); // could be done on batch item insertion rather than here.
transparencyPass.render( renderers, lightUniforms );

TODOs:

bhouston commented 10 years ago

Here is the basic shadow map setup without PCF or Cascading Shadow Maps:

Basically I'm using the same type of batch items and render passes to do the shadows, thus allowing for the same sorting mechanisms to be deployed.

var meshAttributes = new THREE.AttributeSet();

var vertexName = THREE.Attribute.Request( "vertices", gl.FLOAT, 3 );
var vertices = new THREE.Attribute( vertexName, 9 );  // should one use the number of elements, or items?
vertices.set( 
    [
         0.0,  1.0,  0.0,
        -1.0, -1.0,  0.0,
         1.0, -1.0,  0.0
    ] );
// This also works:
vertices.set(
    [
        new THREE.Vector3( 0, 1, 0 ),
        new THREE.Vector3( -1, -1, 0 ),
        new THREE.Vector3( 1, -1, 0 )
    ] );

var indices3Name = THREE.Attribute.Request( "indices3", gl.UNSIGNED_SHORT, 3 );
var faces = new THREE.Attribute( indices3Name, 3 ); // using number of elements rather than items
faces.set(
    [ 0, 1, 2 ] 
    );

meshAttributes.addAttribute( faces );
meshAttributes.addAttribute( vertices );

var meshUniforms = new THREE.UniformSet();

var diffuseColor = THREE.Uniform.Request( "diffuseColor", gl.FLOAT, 3, false );
meshUniforms.set( diffuseColor, new THREE.Color( 1, 0, 0 ) );

var ambientColor = THREE.Uniform.Request( "ambientColor", gl.FLOAT, 3, false );
meshUniforms.set( ambientColor, new THREE.Color( 0.1, 0.1, 0.1 ) );

var meshBuffers = new THREE.BufferSet( gl, meshAttributes );
meshBuffers.update();

var batchItem = new THREE.BatchItem();
batchItem.type = 0; // TODO: add a unique program/rendering method identifier scheme.
// Q: No need to list if it is a morphing or a skinned character, that is introspectable from bufferSet contents?  Is that efficient.
batchItem.primitiveType = gl.TRIANGLES; // Q: Do I need this?  Or does the programType + bufferSet contents fully determine this?
batchItem.buffers = meshBuffers;
batchItem.uniforms = meshUniforms;

var renderers = // TODO: Figure out what to use here.
renderers.addRenderer( new THREE.MeshBatchRenderer() );

var transparencyPass = new THREE.RenderPass();
var mainPass = new THREE.RenderPass();

var lightPositionsName = THREE.Uniform.Request( "lightPositions", gl.FLOAT, 3, true );
var shadowMapRenderPasses = [];
var lightPositions = [];

// traverse scene lights and setup global uniforms and shadow maps
scene.traverse( function( light ) {
    if( !( light instanceof THREE.Light ) ) return; // skip non-light nodes
    if( light.shadowMap ) {
        var shadowUniforms = new THREE.UniformSet();
        shadowUniforms.set( lightPositionsName, light.position );
        shadowMapRenderPasses.push( {
            renderPass: new THREE.RenderPass(),
            uniforms: shadowUniforms,
            renderTarget: new THREE.WebGLRenderTarget( light.shadowMapWidth, light.shadowMapHeight )
            } );
    }
    lightPositions.push( light.position );
} );

var lightUniforms = new THREE.UniformSet();
lightUniforms.set( lightPositionsName, lightPositions );

// this is the main scene render traversal code simplified.
scene.traverse( function( mesh ) {
    if( !( mesh instanceof THREE.Mesh ) ) return; // skip non-mesh nodes
    var batchItem = null; // TODO: smart conversion and caching code here.
    for( var i = 0; i < shadowMapRenderPasses.length; i ++ ) {
        shadowMapRenderPasses[ i ].renderPass.enqueue( batchItem );
    }
    mainPass.enqueue( batchItem );
    transparencyPass.enqueue( batchItem ); // technically this object isn't transparent
} );

// render shadow map depth buffers
for( var i = 0; i < shadowMapRenderPasses.length; i ++ ) {
    var shadowMapRenderPass = shadowMapRenderPasses[ i ];
    shadowMapRenderPass.renderPass.render( depthRenderers, shadowMapRenderPass.uniforms, shadowMapRenderPass.renderTarget );
}

// render solid objects first sorted by program
mainPass.sortByProgram(); // could be done on batch item insertion rather than here.
mainPass.render( renderers, lightUniforms );

// render transparent objects second, sorted by z-depth
transparencyPass.sortByZDepth(); // could be done on batch item insertion rather than here.
transparencyPass.render( renderers, lightUniforms );

bhouston commented 10 years ago

Conversion of BufferGeometry to AttributeSet was easy, as AttributeSet is really a subset of BufferGeometry (it is just the set of named attributes, as its name implies):


// cached attribute names.
var indexName = THREE.AttributeName.Request( "index", gl.UNSIGNED_INT, 1 );
var positionName = THREE.AttributeName.Request( "position", gl.FLOAT, 3 );
var uvName = THREE.AttributeName.Request( "uv", gl.FLOAT, 2 );
var normalName = THREE.AttributeName.Request( "normal", gl.FLOAT, 3 );

// TODO: Research how multiple UVs are handled in WebGLRenderer.

// convert a BufferGeometry to a new AttributeSet or update an existing one.
var bufferGeometryToAttributeSet = function( bufferGeometry, optionalTarget ) {

    var attributeSet = optionalTarget || new THREE.AttributeSet();

    if( bufferGeometry.attributes[ "index" ] ) {
        attributeSet.set( // sets dirty on this attribute if exists.
            indexName,
            bufferGeometry.attributes['index'].array
            );
    }
    else {
        attributeSet.remove( indexName );

    }

    if( bufferGeometry.attributes[ "position" ] ) {
        attributeSet.set( // sets dirty on this attribute if exists.
            positionName,
            bufferGeometry.attributes['position'].array
            );
    }
    else {
        attributeSet.remove( positionName );
    }

    if( bufferGeometry.attributes[ "normal" ] ) {
        attributeSet.set( // sets dirty on this attribute if exists.
            normalName,
            bufferGeometry.attributes['normal'].array
            );
    }
    else {
        attributeSet.remove( normalName );
    }

    if( bufferGeometry.attributes[ "uv" ] ) {
        attributeSet.set( // sets dirty on this attribute if exists.
            uvName,
            bufferGeometry.attributes['uv'].array
            );
    }
    else {
        attributeSet.remove( uvName );
    }

    // TODO: Handle arbitrary additional parameters.

    return attributeSet;
};

bhouston commented 10 years ago

Here is a working low-level batching WebGL Renderer:

https://gist.github.com/bhouston/8303744

And here is an example that uses it from a ThreeJS Scene graph:

https://gist.github.com/bhouston/8303776

I've implemented support for arbitrary uniforms, attributes and programs, which are the basics. Next up is converting existing ThreeJS materials to this, but I would like to do #4271 first.

I know this is a big change and thus it will naturally be hard to get acceptance for it regardless of its merits. It is a flexible and blazingly fast approach though. It is highly modular as well because of its separation of concerns.

ghost commented 10 years ago

I'd love to see this come into the mainline code. When I worked at xbox this sort of batching was considered standard practice and I've seen what massive performance improvements it can provide and what big worlds it can enable. Nice work, Ben - much appreciated.

safetydank commented 10 years ago

Good stuff. Have you considered using sort-based draw call bucketing for batching? With a suitable key generation interface it enables the client to select the best batching criteria for their app.

http://realtimecollisiondetection.net/blog/?p=86

bhouston commented 10 years ago

@safetydank That is planned, for sure. And really it is just a few lines of code in this design. It is the key generation that I need to get going, and it needs to be based on the shaders. I am trying to get a shader system where the name of the shader and its defines are explicit so I can create a key based on it. The rest of the sort should be easy to do based on the uniforms if we want to go that far.

safetydank commented 10 years ago

I think it should be sufficient to create a key based on a hash of the combined shader source?

Key based sorting handles draw call ordering details other than shaders, too. It can be used to order the transparent/opaque objects and even z-ordering, all in a single sort pass. If you're planning to do this it's good to nail these details up front, can be more work to retrofit them to a different system.
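A small sketch of such a key, in the spirit of the linked article but not taken from it: the bit layout, the field names (`transparent`, `programId`, `depth01`) and the 12/16-bit widths are all assumptions chosen for the example. Opaque items sort first, grouped by program and then front to back; transparent items sort after them, back to front.

```javascript
// Map a normalized depth in [0, 1] to a 16-bit integer.
function quantizeDepth(depth01) {
  const clamped = Math.min(1, Math.max(0, depth01));
  return Math.round(clamped * 0xffff);
}

// Pack layer, program and depth into one integer (29 bits used).
// Opaque:      [0][program:12][depth:16]          -> by program, front to back
// Transparent: [1][0xffff - depth:16][program:12] -> back to front
function makeSortKey(item) {
  const program = item.programId & 0xfff;
  const depth = quantizeDepth(item.depth01);
  if (item.transparent) {
    return (1 << 28) | ((0xffff - depth) << 12) | program;
  }
  return (program << 16) | depth;
}

function sortByKey(items) {
  return items
    .map((item) => ({ item, key: makeSortKey(item) }))
    .sort((a, b) => a.key - b.key)
    .map((entry) => entry.item);
}

const drawItems = [
  { name: "A", transparent: false, programId: 2, depth01: 0.5 },
  { name: "B", transparent: true, programId: 1, depth01: 0.2 },
  { name: "C", transparent: false, programId: 1, depth01: 0.9 },
  { name: "D", transparent: true, programId: 1, depth01: 0.8 },
  { name: "E", transparent: false, programId: 2, depth01: 0.1 },
];
console.log(sortByKey(drawItems).map((i) => i.name).join(",")); // C,E,A,D,B
```

Changing the batching criteria then only means changing how the key is packed; the sort itself stays the same.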

mrdoob commented 10 years ago

This all sounds great to me.

However, I wonder if it wouldn't be better to implement this in WebGLRenderer3 and slowly try to reach parity with the current WebGLRenderer?

safetydank commented 10 years ago

Has there been any progress on this?

mrdoob commented 10 years ago

Tiny bit: #4438

titansoftime commented 9 years ago

I've gotta ask.. what's up with all this? Is the current dev build working towards this?

tschw commented 9 years ago

#6963 discusses the separation of materials, programs and shaders.

Also, I once sketched out

/**
 * @typedef {{
 *     elementCount: !number,
 *     indexBufferOffset: !number,
 *     vertexBufferOffset: ?number,
 *     sceneGraphNode: !THREE.Object3D,
 *     geometry: !THREE.IGeometry,
 *     material: !THREE.Material
 * }}
 */
THREE.Renderable;

as input for the renderer. The frequency of the referenced entities (node, geometry and material) could be counted and multiplied by the time for switching the corresponding state. The results in ascending order give the optimal priorities for the sorting criteria. In practice, we may want to use IDs instead of object references, another indirection for the drawcall info, and/or map (parts of) it into a typed array.
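One possible reading of that estimate, sketched in JavaScript: count the distinct entities referenced per criterion, multiply by a per-switch cost, and list the criteria in ascending order of the product. The function name `rankSortCriteria` is hypothetical, the renderables use string IDs instead of object references, and the cost figures are made-up placeholders, not measurements.

```javascript
// For each criterion, estimate cost = (distinct entities) * (cost per switch),
// then return the criteria in ascending order of that estimate.
function rankSortCriteria(renderables, switchCost) {
  const estimates = Object.keys(switchCost).map((criterion) => {
    const distinct = new Set(renderables.map((r) => r[criterion])).size;
    return {
      criterion,
      distinct,
      estimatedCost: distinct * switchCost[criterion],
    };
  });
  return estimates.sort((a, b) => a.estimatedCost - b.estimatedCost);
}

// Placeholder costs in arbitrary units; real values would have to be measured.
const assumedSwitchCost = { material: 50, geometry: 10, sceneGraphNode: 2 };

const renderables = [
  { sceneGraphNode: "n1", geometry: "g1", material: "m1" },
  { sceneGraphNode: "n2", geometry: "g2", material: "m1" },
  { sceneGraphNode: "n3", geometry: "g3", material: "m2" },
];

const ranking = rankSortCriteria(renderables, assumedSwitchCost);
console.log(ranking.map((e) => e.criterion + ":" + e.estimatedCost).join(" "));
// sceneGraphNode:6 geometry:30 material:100
```

With real timings, the resulting order could feed directly into how a sort key is packed.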