Usnul commented 6 years ago

Optimizations to animation engine

Currently animation engine chokes on some 500 bones being animated simultaneously, resulting in a very high CPU usage.

Motivation

It is not uncommon to see 3-5 characters at the same time with 500+ bones each in modern games, with current CPU demand such fidelity is not achievable, instead you have to compromise to about 15 bones per character in order to achieve decent performance.

born as a result of this discussion: #13807

looeee commented 6 years ago

I'm not sure how to progress with this issue unless you (or somebody else) has specific suggestions for improvement.

Did you profile to see where the bottlenecks are before suggesting this? Or would you be willing to do so now?

Or are there other JS animation systems that you are comparing against that have better performance than our one?

Usnul commented 6 years ago

Well, off the top of my head, I know a bit about how the existing animation system is structured, there are a lot of objects, and a lot of object references. This could be reduced by using a less comprehensive but a more cache-friendly data structure for bones and transitions. An optimized can be added, to take existing animations and try to reduce computational complexity. One example: we want to animate v3 linearly from 0s to 10s, starting v3(0,1,2) end v3(3,4,5). Here are two different representations:

function updateV3_x(t){
  v3.x = lerp( 0,3, inverseLerp(t,0,10) )
}
function updateV3_x(t){
  v3.y = lerp( 1,4, inverseLerp(t,0,10) )
}
function updateV3_x(t){
  v3.z = lerp( 2,5, inverseLerp(t,0,10) )
}

function update(t){
   updateV3_x(t);
   updateV3_y(t);
   updateV3_z(t);
}

or

function update(t){
  const f = inverseLerp(t,0,10);
  v3.x = lerp( 0, 3, f );
  v3.y = lerp( 1, 4, f );
  v3.z = lerp( 2, 5, f );
}

second case has fewer instructions, and fewer jumps.

Another point is to put animations to sleep based on their visibility, no point animating what can't be seen. Or if there is a usecase for that - put it behind a flag, let animation system only do as much work as is necessary, instead of animating all those meshes that are not visible.

answering specific questions.

Did you profile to see where the bottlenecks are before suggesting this? Or would you be willing to do so now?

No, I haven't specifically profiled the animation engine. I have profiled my entire system, and I saw that a big chunk of CPU is spent in animation system, not exactly where in the animation system.

Or are there other JS animation systems that you are comparing against that have better performance than our one?

If you would consider them to be comparable - shader-based particle engines. They are pretty much entirely aimed at animation. Other than that - I don't think that comparison is easy to draw, seeing as there aren't that many mature 3d libraries in JS. So no. As an engineer - I see some areas of improvement, some of which are listed here.

donmccurdy commented 6 years ago

I think we'll need to find examples of high-quality animated models to profile. Comparison to skinning and other keyframe-based animation in non-JS engines like Unity and Unreal would be fair game as well. 👍

looeee commented 6 years ago

function update(t){
   updateV3_x(t);
   updateV3_y(t);
   updateV3_z(t);
}

vs

function update(t){
  const f = inverseLerp(t,0,10);
  v3.x = lerp( 0, 3, f );
  v3.y = lerp( 1, 4, f );
  v3.z = lerp( 2, 5, f );
}

This is exactly the kind of thing that you need to profile before assuming that it's a bottle neck - especially since these kind of obvious optimisations may well be done automatically by the JS engine.

I have profiled my entire system, and I saw that a big chunk of CPU is spent in animation system, not exactly where in the animation system.

I suspect that is not unusual - one of the biggest CPU bound operations is going to be calculating animations. Not saying that we can't make improvements, but we should do more profiling and comparison against other systems before making the statement that our system is inefficient.

Usnul commented 6 years ago

@looeee I fear that I come off as saying that our existing thing is bad and terrible. "I believe it can be made a lot better" - that is what I believe. I have about 200+ meshes with 15 bones each or so, at any given time the camera has at most about 12 of them in view. Three.js doesn't recognize that fact, as a result it will animate everything, even those things that are not in the view, if i use the animation system normally; that's up to 6% of all animations that need to be updated each frame, this essentially means that for my usecase alone, having animation sleep on meshes that are not visible gives 94% CPU saving.

< rant > I'm sorry that I come not with numbers and figures that prove deficiencies in the engine. I come with an observation, and a proposal to follow that up. I know that's lazy, I can offer to share my own code base for that - but first there needs to be some interest at all in the community. I feel somewhat like a guy in the forest, shouting about animation engine, and all i hear in response is the rustling of the wind through the tree branches.

If the "community" believes that there is not problem in the animation engine, and any perceived inefficiencies could only be addressed for marginal performance gains - I would rest my case and close the issue. I don't want to push my own agenda when no one else shares it here. < /rant >

looeee commented 6 years ago

The suggestion to sleep animations for anything offscreen sounds like it would be a useful. I can't speak for anyone else in the community, but I would support at least investigating whether that can be added in a reasonably simple manner.

I can offer to share my own code base for that

Are you saying that you have already implemented this in your app?

rustling of the wind through the tree branches

It's probably not disinterest that you hear. People have limited time and are busy working on other things. Generally, changes will happen because somebody spearheads them. In other words, if you want to see these changes then you need to start writing code and making PRs.

Usnul commented 6 years ago

Are you saying that you have already implemented this in your app?

Yes, i do. Visibility queries are very important for performance across my game.

looeee commented 6 years ago

That's great! Then I'm looking forward to exploring your PR 😉

donmccurdy commented 6 years ago

+1 to the above — we are interested but have limited time, and it takes a lot of work to (1) reproduce a problem in a "realistic" scenario, (2) diagnose what is wrong, (3) compare pros/cons of alternatives, (4) implement a PR. All of 1–4 take time, so the more of these steps that others can help with, the more likely we are able to spend our time on the remaining parts. I would even say 1–3 are often even more helpful than the PR itself. 🙂

Usnul commented 6 years ago

@looeee Ha, i appreciate your enthusiasm. I wanted to prompt a discussion on the topic and get a feel for whether or not community/Ricardo wants to more the library in a particular direction.

For my pipeline, making animation sleep outside of active camera frustums - takes either integration of my ECS engine with a few subsystems other than Animation, or at the very least having a Spatial Index and Visibility set calculation code. I have both, but these are assembled into my engine, where three.js is entirely unaware of these parts.

In simple terms, it's like this:

have a spatial index for (at least) animated meshes

each frame:

get active cameras
extract frustums
query animated meshes from spatial index; for each mesh:
retrieve lastUpdatedTimestamp for animation
update animation with stepSize = currentTime - lastUpdatedTimestamp
set lastUpdatedTimestamp = currentTime

looeee commented 6 years ago

Steps 4-6 would probably be fairly simple to implement. But for the rest I see a couple of issues:

frustum culling, whatever method we use for it, would have to be done twice each frame - once prior to animating to determine what needs to be animated, and once after to determine what needs to be rendered. In many cases, especially simpler scenes, the overhead of another frustum cull may be greater than the savings from sleeping animations. This is likely especially true if we switch to spatial indexing. Not such a big deal if we leave it switched off by default though.
The other, bigger, issue is that I'm having difficulty convincing myself that this will work at all in a wide range of scenarios. We cannot know in advance how far an animated object will move each frame, so whenever we put one to sleep we'll need to test it again at the start of the next frame to see whether it has moved back into view. How would this test work without actually updating the object and recalculating it's bounding sphere, or in the case of octrees, moving it between boxes?

Usnul commented 6 years ago

How would this test work without actually updating the object and recalculating it's bounding sphere, or in the case of octrees, moving it between boxes?

this is a very good question question. It's more or less a question of "how can we determine bounds of an animated mesh?". There are a few possible answers to that, one is to scrub through frames and compute maximum bounding box and use that, so working with an over-estimation. There are a lot of caveats to that. Animating bounding box is also possible, pre-process animation, compute bounding box for each key-frame and animate between them. My approach currently is to use an arbitrary multiplication factor (2.0 in my case) to enlarge static bounding box. I know this is not a generic solution.

whatever method we use for it, would have to be done twice each frame - once prior to animating to determine what needs to be animated, and once after to determine what needs to be rendered.

This is where the need to maintain a visibility set comes up. From my experience doing a query on a well-built AABB BVH is negligible in terms of performance, as the cost is ~log(n). There are some tricks you can do with frustum queries also.

How would this test work without actually updating the object and recalculating it's bounding sphere, or in the case of octrees, moving it between boxes?

There is surprisingly a lot of literature on this subject. I use re-insertion, when a box (X) changes bounds - i walk up the tree, looking for a parent large enough to contain it, and then walk down from there - looking for smallest descendant (Y) that would contain the volume, then build a new intermediate node as a parent for X and Y, and re-parent it to old Y's parent. It's pretty trivial, explaining it in text is more complicated than actually coding it.

The process of updating bounds of individual boxes is called "refitting", this contrasts with re-building, when you basically build a new hierarchy from scratch. Refitting typically produces sub-optimal hierarchy which has a tendency to degrade over time, so if you want to retain a good structure - you need to run some kind of incremental optimization on the tree every now and then. But this is really far beyond discussion of animation and even culling :)

titansoftime commented 6 years ago

This is how I did it.

Works for my case (mmorpg style game). I actually tried to get this technique merged into the library, but I believe it was considered to be too app specific, which is understandable.

THREE.Camera.prototype.getListObjectsInFrustum = ( function () {

    var frustum;

    return function ( scene ) {

        var objects = [];

        if ( scene === undefined ) return objects;

        this.updateMatrix();
        this.updateMatrixWorld();
        this.matrixWorldInverse.getInverse( this.matrixWorld );

        if ( frustum === undefined ) frustum = new THREE.Frustum();

        frustum.setFromMatrix( new THREE.Matrix4().multiplyMatrices( this.projectionMatrix, this.matrixWorldInverse ) );

        var modelArray = ENGINE.mc.get('modelArray');

        var ok ;

        for( var i=0, len=modelArray.length; i<len; i++ ){

            if ( modelArray[i].mesh.type == 'SkinnedMesh' ) {

                if( modelArray[i].animationAllowed ){                   

                    ok = frustum.intersectsObject( modelArray[i].mesh );

                    if ( ok ) {

                        objects.push( modelArray[i] );

                    }

                }

            }

        }

        return objects;

    }

} )();

And then in the render loop:

var objects = this.camera.getListObjectsInFrustum(this.scene);

for( i=0,len=objects.length;i<len;i++){

    if( objects[i].mixer ){

        objects[i].mixer.update(delta);

    }

}

looeee commented 6 years ago

There is surprisingly a lot of literature on this subject

@Usnul could you link to some of that literature? I spent a bit of time searching but didn't come across anything. Probably looking in the wrong places

Also, I tried to find examples of other engines that have already implemented this and drew a blank there too. Any ideas if a generic version of something like this has already been implemented somewhere?

Usnul commented 6 years ago

@looeee off the top of my head, here's one paper that i found to be really great:

Fast, Effective BVH Updates for Animated Scenes

http://www.cs.utah.edu/~thiago/papers/rotations.pdf

From my experience, if your scene doesn't have too many moving parts - just doing refitting is enough. If you have a highly dynamic scene - I believe that using BVH might be less optimal than a structure with fixed volumes such as R-tree or a kD-tree, since you need to do minimal housekeeping and they do not degrade in quality. BVH is a lot more precise though and takes less memory potentially, as you don't need to create empty volumes.

@titansoftime It looks very similar in terms of broad strokes to my own. The problem with your approach is that it scales linearly with number of objects, it's fine for a couple of thousands of objects perhaps, but in terms of algorithmic complexity it's clearly lacking. Logically it's very similar. There are some small things in terms of performance, like using a visitor to traverse a collection instead of creating an array, or reusing an array to avoid GC, but that's a wholly different set of considerations.

mrdoob / three.js