High GPU Usage for Culling?

najak3d commented 1 year ago

I'm running the "instanceMeshAnimationDemo" with all of the block-people (225 x 225 = 50,625 instances!).

I set "Application.targetFrameRate = 60", so that it doesn't keep pegging my GPU Usage to 100% (and causing fan to turn on to cool it).... So for this demo, running at 60 FPS, it uses 41% of my GPU looking at the whole horde of them (all in frustum).

If I turn away, so that NONE are showing, the GPU is still 15%, even when Camera is still.

I'm not for sure, but this seems to indicate that maybe the "culling" logic could be optimized considerably.

One suggestion comes to mind:

Only do "culling" type logic periodically (2 Hz), or for significant camera Rotation.
Widen Frustum by 5 degrees (each direction) - and then you can trigger culling to re-run if Camera moves by more than 5 degrees Left/Right.

So the culling would run at "{A} Hz" guaranteed (to account for animated movement coming in/out of focus), OR if the camera is turned Left/Right by "{B} Degrees". These should be user configurable.

I tried simply not calling "SelectRenderInstances" as much -- but apparently, in your pipeline, this is needed EVERY FRAME, else nothing shows up (thus my lame/easy attempt to optimize failed).

mkrebser commented 1 year ago

I have made some trade offs in regards to culling. This is due to generalization of this library- as well as the cpu-gpu synchronization for animations.

To keep everything synchronized- animations/paths are always running/moving. So every instance will always have some calculations invoked for it- including all bones and any other entities in a transform hierarchy.

Additionally things are kept always running because due to (animations/paths/fixed velocities) entities may move on and off a static camera on their own! So they can only be 'culled' once their position is calculated by the shader!

mkrebser commented 1 year ago

Trying to lower to a maybe slower tick rate- like 5hz or something could be feasible, but things may pop into the screen- particularly if they are moving fast

mkrebser commented 1 year ago

Most likely, the only way to implement this, and keep all the features this library provides (ie gpu-cpu lazy calculation) would just be supported an optional slow tick rate. You could add this as a flag on the entities- then update all the tick code & calculations to use an optional tick hz

eg, this would mean objects can be constructed with an optional slow tick rate that can't be changed once created.

I don't think I will add this though, as this library mainly emphasizes generalization + quality over sheer number of entities

mkrebser commented 1 year ago

Oh and I don't have any fancy gpu debugging software 😊

I made some utility functions (in instancemesh.cs) that just read buffers on the gpu into an array on the cpu. Then I just use my regular code debugger to examine the array

najak3d commented 1 year ago

To which array do you dump your debug data? I'd like to see it in action. I think that method works pretty good, but I couldn't get it working in the context of your project -- not sure why. If I made a sample compute shader alongside yours (same scene, running at same time), I could get it working, but when I created my own RW buffer inside the meshinstances.compute shader, it just wasn't working. So I'd like to see one of your buffers that ARE working for this purpose.

== IMO, this library has good potential to become the library for Unity3d Instancing. I'd like to help it get there.

Demonstrating near-zero cost when entities are offscreen, is important. The only thing that needs to happen for all entities off-screen is Positional Movement (cheap) -- and bone animations are not needed when off-screen (big time savings?).

I'm not sure where most of the cost is coming from -- maybe it's 90% from bone-animation -- and if so -- then that would be a nice place to optimize.

The forced Culling Update rate, should be adjustable for each Instancing system.... let user set the interval -- where 0 msec makes it update every frame, and 100 makes it update at 10 Hz (at minimum). But turning the camera would also do this.

Note -- users can set up more than one Instancing system in the scene -- one for fast moving objects, and one for slow -- or NON-MOVING. If NON-MOVING -- then Culling Updates would be triggered by "camera movement/rotation thresholds".

So it would be nice to be able to have Culling rates slowed down (optionally) and have Bone-animations (optionally) be omitted when offscreen. I think this may dramatically reduce costs when offscreen.

najak3d commented 1 year ago

FYI, I was the co-author of Visual3D.NET, put out of business in 2010 by Unity3D. I ended up working for clients using Unity3D for a few years, then went off to do C# (but no more 3D stuff) for about 10 years. Now I'm back, and wanting to become a Unity3D content provider, and game creator -- focusing on XR/VR/AR most likely.

Here's a video of mine from 2009, demo'ing our product that Unity3D sunk. This was 1 million lines of pure C# for everything, and was close to cutting edge for it's time. But then Unity3D received $10M in venture capital, and then ate our lunch shortly after.

https://vimeo.com/8491336 (My daughter gave the intro sentence, and the rest is my voice.)

mkrebser commented 1 year ago

in instancemesh.cs, there is a helper function https://github.com/mkrebser/GPUInstance/blob/master/Assets/Resources/GPUInstance/scripts/instancemesh.cs#L1705

GetDataFromGPU. I will normally just add extra code somewhere to fetch a buffer before and after a shader runs!

eg, right here is where the update shaders are running, whos only job is to push data to GPU buffers https://github.com/mkrebser/GPUInstance/blob/master/Assets/Resources/GPUInstance/scripts/instancemesh.cs#L1526

mkrebser commented 1 year ago

And IDK what the cause of your GPU usage is from culled entities. There definitely is still a bunch of calculations being done on them before they get culled due to all of the simulation that is done. Could be this. I would really have to profile it.

Although I don't have enough time to work on bigger features like low hz simulation. If I did have more time to work on this library then I would probably work on something more impactful like animation blending & importing animation state machines into the shaders

najak3d commented 1 year ago

My current path of development is XR/VR on the MetaQuest Pro, with Facial/eye tracking.

I am hoping to maximize quantity of content via Instancing techniques. I'll be comparing your solution to the GPUInstancer on the Asset Store, here, by GurBu:

https://assetstore.unity.com/packages/tools/utilities/gpu-instancer-117566

The right solution might end up being a hybrid solution. GurBu's solution results in every instance being assigned 4 Game Objects (parent + 3 LODs) - viewable inside Unity Editor Hierarcy. Yet it runs fairly efficiently, I think your performance has him beat.

ATM, I'm in process of creating a VR mini-app that combines the following:

VR movement via hand controller joysticks.
Simple Shooter game with telekinesis power.
Eye Tracking - used for various functions TBD: Aiming, grabbing, throwing, UI interactions.
GPU Instancing for adding crowds of creatures or humanoids (targets for throwing or shooting?)

Then just let it grow from there, mostly to serve the purposes of learning of POC's.

For GPU Instancing, I'll need a system that enables me to transition rigged models to/from the GPU animated bone state seamlessly.

I started here, for POC of how much power we can expect from GPU instancing.

ATM, GurBu's stuff supports XR on the Meta Quest Pro -- I'm able to instance about 1000 in-frustum avatars on the headset before it starts to glitch. That's pretty good, IMO.

TCKingCeryn commented 1 year ago

Following, im in a similar trajectory. Right now i use a hacky GPU instancing shader to render huge crowds of enemies, was about to try the GurBu GPU Crowd Animator add on to see if the system is a bit more streamlined, but trying this first.

mkrebser / GPUInstance

High GPU Usage for Culling? #5