I have made some trade-offs regarding culling. This is due to the generalization of this library, as well as the CPU-GPU synchronization for animations.
To keep everything synchronized, animations/paths are always running/moving. So every instance will always have some calculations invoked for it, including all bones and any other entities in a transform hierarchy.
Additionally, things are kept running at all times because, due to animations/paths/fixed velocities, entities may move on and off a static camera's view on their own! So they can only be 'culled' once their position is calculated by the shader!
Lowering to a slower tick rate, like 5 Hz or something, could be feasible, but things may pop onto the screen, particularly if they are moving fast.
Most likely, the only way to implement this and keep all the features this library provides (i.e., GPU-CPU lazy calculation) would be to support an optional slow tick rate. You could add this as a flag on the entities, then update all the tick code & calculations to use an optional tick Hz.
E.g., this would mean objects could be constructed with an optional slow tick rate that can't be changed once created.
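Very roughly, the gating could look like the sketch below. The class and method names are hypothetical and this is not how the library's tick code is actually structured; it's just the accumulate-and-step idea:

```csharp
using UnityEngine;

// Hypothetical sketch only -- none of these names exist in GPUInstance.
public class SlowTickExample : MonoBehaviour
{
    // 0 = tick every frame (current behavior); e.g. 5 = simulate at ~5 Hz. Fixed at creation.
    [SerializeField] private uint tickRateHz = 5;

    private float _accumulated;

    void Update()
    {
        if (tickRateHz == 0)
        {
            Simulate(Time.deltaTime); // unchanged path: tick every frame
            return;
        }

        _accumulated += Time.deltaTime;
        if (_accumulated >= 1f / tickRateHz)
        {
            // Pass the real elapsed time so animations/paths don't drift behind wall-clock time.
            Simulate(_accumulated);
            _accumulated = 0f;
        }
    }

    private void Simulate(float dt)
    {
        // Placeholder for dispatching the update/animation shaders with 'dt'.
    }
}
```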
I don't think I will add this though, as this library mainly emphasizes generalization + quality over sheer number of entities.
Oh, and I don't have any fancy GPU debugging software 😊
I made some utility functions (in instancemesh.cs) that just read buffers on the GPU into an array on the CPU. Then I just use my regular code debugger to examine the array.
To which array do you dump your debug data? I'd like to see it in action. I think that method works pretty well, but I couldn't get it working in the context of your project -- not sure why. If I made a sample compute shader alongside yours (same scene, running at the same time), I could get it working, but when I created my own RW buffer inside the meshinstances.compute shader, it just wasn't working. So I'd like to see one of your buffers that IS working for this purpose.
IMO, this library has good potential to become *the* library for Unity3D instancing. I'd like to help it get there.
Demonstrating near-zero cost when entities are offscreen is important. The only thing that needs to happen for off-screen entities is positional movement (cheap) -- bone animations are not needed when off-screen (big time savings?).
I'm not sure where most of the cost is coming from -- maybe it's 90% from bone animation -- and if so, that would be a nice place to optimize.
The forced Culling Update rate should be adjustable for each Instancing system... let the user set the interval -- where 0 msec makes it update every frame, and 100 msec makes it update at 10 Hz (at minimum). But turning the camera would also trigger an update.
Note -- users can set up more than one Instancing system in the scene: one for fast-moving objects, and one for slow or NON-MOVING objects. If NON-MOVING, then Culling Updates would be triggered by "camera movement/rotation thresholds".
So it would be nice to be able to optionally slow down the culling rate and optionally omit bone animations when offscreen. I think this may dramatically reduce costs when offscreen.
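To illustrate the split I'm suggesting, here's a rough sketch -- the kernel names, buffers, and dispatch flow are hypothetical, not how GPUInstance is actually organized. The idea is just that cheap positional updates keep running for everyone (so culling stays correct), while the expensive bone/skinning pass only runs over the visible set:

```csharp
using UnityEngine;

// Hypothetical split of the per-frame GPU work -- kernel names, buffers, and the
// dispatch flow are stand-ins, not the library's actual shaders.
public class OffscreenSkipExample : MonoBehaviour
{
    [SerializeField] private ComputeShader sim;

    private int _moveKernel;
    private int _skinKernel;

    void Awake()
    {
        _moveKernel = sim.FindKernel("UpdatePositions"); // cheap: paths/velocities, runs for everyone
        _skinKernel = sim.FindKernel("UpdateBones");     // expensive: skinning, visible instances only
    }

    public void Tick(int totalInstances, int visibleInstances)
    {
        const int groupSize = 64;

        // Positions always advance so culling stays correct for entities that move on their own.
        sim.Dispatch(_moveKernel, Mathf.CeilToInt(totalInstances / (float)groupSize), 1, 1);

        // Bone animation only for instances that survived culling this frame.
        if (visibleInstances > 0)
            sim.Dispatch(_skinKernel, Mathf.CeilToInt(visibleInstances / (float)groupSize), 1, 1);
    }
}
```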
FYI, I was the co-author of Visual3D.NET, which was put out of business in 2010 by Unity3D. I ended up working for clients using Unity3D for a few years, then went off to do C# (but no more 3D stuff) for about 10 years. Now I'm back and want to become a Unity3D content provider and game creator -- focusing on XR/VR/AR most likely.
Here's a video of mine from 2009, demoing the product that Unity3D sank. This was 1 million lines of pure C# for everything, and was close to cutting edge for its time. But then Unity3D received $10M in venture capital and ate our lunch shortly after.
https://vimeo.com/8491336 (My daughter gave the intro sentence, and the rest is my voice.)
In instancemesh.cs, there is a helper function, GetDataFromGPU: https://github.com/mkrebser/GPUInstance/blob/master/Assets/Resources/GPUInstance/scripts/instancemesh.cs#L1705
I will normally just add extra code somewhere to fetch a buffer before and after a shader runs!
E.g., right here is where the update shaders are running, whose only job is to push data to GPU buffers: https://github.com/mkrebser/GPUInstance/blob/master/Assets/Resources/GPUInstance/scripts/instancemesh.cs#L1526
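The general pattern is just a synchronous readback with Unity's ComputeBuffer.GetData into a managed array -- the struct and helper below are stand-ins, not the actual types used in instancemesh.cs:

```csharp
using UnityEngine;

// Stand-in debug helper -- 'InstanceData' is NOT the actual layout used by instancemesh.cs.
public static class GPUDebug
{
    public struct InstanceData
    {
        public Vector3 position;
        public int propertyID;
    }

    // Synchronous readback: blocks until the GPU work is done, so debug use only.
    public static InstanceData[] Dump(ComputeBuffer buffer, int count)
    {
        var result = new InstanceData[count];
        buffer.GetData(result, 0, 0, count);
        return result;
    }
}
```

Then just set a breakpoint after the call (before and/or after dispatching the shader) and inspect the returned array in the debugger.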
And I don't know what is causing the GPU usage from culled entities. There are definitely still a bunch of calculations being done on them before they get culled, due to all of the simulation that runs. That could be it; I would really have to profile it.
I don't have enough time to work on bigger features like low-Hz simulation, though. If I did have more time to work on this library, I would probably work on something more impactful, like animation blending & importing animation state machines into the shaders.
My current path of development is XR/VR on the MetaQuest Pro, with Facial/eye tracking.
I am hoping to maximize quantity of content via Instancing techniques. I'll be comparing your solution to the GPUInstancer on the Asset Store, here, by GurBu:
https://assetstore.unity.com/packages/tools/utilities/gpu-instancer-117566
The right solution might end up being a hybrid. GurBu's solution results in every instance being assigned 4 GameObjects (parent + 3 LODs), viewable inside the Unity Editor Hierarchy. Yet it runs fairly efficiently -- though I think your performance has it beat.
ATM, I'm in the process of creating a VR mini-app that combines the following:
Then I'll just let it grow from there, mostly to serve the purposes of learning and POCs.
For GPU Instancing, I'll need a system that enables me to transition rigged models to/from the GPU animated bone state seamlessly.
I started here, as a POC of how much power we can expect from GPU instancing.
ATM, GurBu's stuff supports XR on the Meta Quest Pro -- I'm able to instance about 1000 in-frustum avatars on the headset before it starts to glitch. That's pretty good, IMO.
Following -- I'm on a similar trajectory. Right now I use a hacky GPU instancing shader to render huge crowds of enemies. I was about to try the GurBu GPU Crowd Animator add-on to see if that system is a bit more streamlined, but I'm trying this first.
I'm running the "instanceMeshAnimationDemo" with all of the block-people (225 x 225 = 50,625 instances!).
I set "Application.targetFrameRate = 60", so that it doesn't keep pegging my GPU Usage to 100% (and causing fan to turn on to cool it).... So for this demo, running at 60 FPS, it uses 41% of my GPU looking at the whole horde of them (all in frustum).
If I turn away, so that NONE are showing, GPU usage is still 15%, even when the camera is still.
I'm not sure, but this seems to indicate that the culling logic could be optimized considerably.
One suggestion comes to mind:
So the culling would run at "{A} Hz" guaranteed (to account for animated movement coming into/out of view), OR whenever the camera is turned left/right by "{B}" degrees. These should be user-configurable.
I tried simply not calling "SelectRenderInstances" as often -- but apparently, in your pipeline, it is needed EVERY FRAME, or else nothing shows up (thus my lame/easy attempt to optimize failed).
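To sketch what I mean by that gate (the names here are hypothetical -- the actual call into the library, e.g. SelectRenderInstances, would just be skipped whenever this check returns false):

```csharp
using UnityEngine;

// Hypothetical sketch of the suggested culling gate -- not part of the library.
public class CullThrottle : MonoBehaviour
{
    [SerializeField] private float cullIntervalSeconds = 0.2f; // "{A} Hz" -> 1/A
    [SerializeField] private float cameraAngleThreshold = 2f;  // "{B}" degrees

    private float _lastCullTime = float.NegativeInfinity;
    private Quaternion _lastCullRotation = Quaternion.identity;

    // Run the expensive culling/render-selection pass only when this returns true.
    public bool ShouldRunCulling(Camera cam)
    {
        bool intervalElapsed = Time.time - _lastCullTime >= cullIntervalSeconds;
        bool cameraTurned = Quaternion.Angle(_lastCullRotation, cam.transform.rotation)
                            >= cameraAngleThreshold;

        if (intervalElapsed || cameraTurned)
        {
            _lastCullTime = Time.time;
            _lastCullRotation = cam.transform.rotation;
            return true;
        }
        return false;
    }
}
```

With something like this, a static camera and a 0.2 s interval would cut the culling pass to ~5 Hz, while any noticeable camera turn would still force an immediate update.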