jMonkeyEngine / jmonkeyengine

A complete 3-D game development suite written in Java.
http://jmonkeyengine.org
BSD 3-Clause "New" or "Revised" License

Cleaner interface for instancing (?) #1023

Open riccardobl opened 5 years ago

riccardobl commented 5 years ago

Instancing is implemented with InstancedGeometries and InstancedNodes, but those classes carry a lot of internal code to perform transparent instancing for the developer. I think the engine should also provide a way to perform "raw" instancing in a cleaner manner.

Possible implementation: https://gist.github.com/riccardobl/0a5e87625dd1d1c8bbc9cdd2d9ae13e6

An important use case for this is for high performance particles rendering.

Thoughts?

pspeed42 commented 5 years ago

You can already do raw instancing by manipulating the vertex buffers directly. I do this even in the IsoSurfaceDemo to do instanced trees.

pspeed42 commented 5 years ago

Looking at your code, it provides no real functionality on top of what is already available in Mesh except adding more code to maintain and "yet another path to the renderer".

riccardobl commented 5 years ago

I don't think we are talking about the same thing. Can you point to an example on how you would do this by manipulating the vertex buffers?

pspeed42 commented 5 years ago

For each VertexBuffer you can set how it behaves in the presence of instancing...whether it repeats for each instance (the default), maps 1:1 with the instance, and so on.

""" public void setInstanceSpan(int i)

Sets how this vertex buffer matches with rendered instances where 0 means no instancing at all, ie: all elements are per vertex. If set to 1 then each element goes with one instance. If set to 2 then each element goes with two instances and so on. """
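To make the span rule quoted above concrete, here is a minimal plain-Java sketch of which buffer element a given instance would read for a given span. It only models the rule on the CPU; `InstanceSpanSketch` and `elementFor` are hypothetical names for illustration, not jME API.

```java
final class InstanceSpanSketch {

    /**
     * Returns the buffer element read by the given vertex of the given instance.
     * span == 0: the buffer is per vertex, so the instance index is ignored.
     * span >= 1: one element is shared by 'span' consecutive instances.
     */
    static int elementFor(int span, int vertexIndex, int instanceIndex) {
        if (span < 0) {
            throw new IllegalArgumentException("span must be >= 0");
        }
        return span == 0 ? vertexIndex : instanceIndex / span;
    }

    public static void main(String[] args) {
        for (int instance = 0; instance < 4; instance++) {
            System.out.println("instance " + instance
                    + " -> span 1 element " + elementFor(1, 0, instance)
                    + ", span 2 element " + elementFor(2, 0, instance));
        }
    }
}
```

With a span of 2, instances 0 and 1 share element 0, instances 2 and 3 share element 1, and so on.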

So usually the main position buffers, textures, colors, etc. are set to 0 (the default), and a special model position buffer is set to 1 that maps some number of elements per instance. For JME's default shaders, this is the transform matrix so 16 values map per instance. They can be set up manually like:

    // Create the transform buffer
    FloatBuffer xb = MatrixUtils.createMatrixBuffer(transforms);

    VertexBuffer vb = new VertexBuffer(Type.InstanceData);
    vb.setInstanceSpan(1);
    vb.setupData(Usage.Stream, 16, Format.Float, xb);

(MatrixUtils is a util class in the IsoSurface library but it's basically flattening an array of Matrix4fs into the format JME likes for its InstanceData.) https://github.com/Simsilica/IsoSurface/blob/master/src/main/java/com/simsilica/iso/util/MatrixUtils.java#L57
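As a rough idea of what "flattening" per-instance data into one buffer involves, here is a plain-Java sketch that packs one 16-float block per instance into a direct `FloatBuffer`. It is not the linked `MatrixUtils` code and does not claim to reproduce the layout JME's InstanceData expects; `InstanceBufferSketch`, `pack` and `translation` are hypothetical names, and the column-major translation matrix is an assumption made only for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class InstanceBufferSketch {

    static final int FLOATS_PER_INSTANCE = 16;

    /** Packs one 16-float block per instance into a direct, native-order buffer. */
    static FloatBuffer pack(float[][] transforms) {
        FloatBuffer fb = ByteBuffer
                .allocateDirect(transforms.length * FLOATS_PER_INSTANCE * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        for (float[] m : transforms) {
            if (m.length != FLOATS_PER_INSTANCE) {
                throw new IllegalArgumentException("expected 16 floats per instance");
            }
            fb.put(m);
        }
        fb.flip(); // position 0, limit = number of floats written
        return fb;
    }

    /** Identity matrix with a translation, stored column-major (an assumption). */
    static float[] translation(float x, float y, float z) {
        return new float[] {
            1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            x, y, z, 1
        };
    }
}
```

A buffer built this way would play the role of `xb` in the snippet above; real code should follow whatever layout the shader actually reads.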

There is no requirement to use that format even if you've developed a custom shader. But even with custom shaders, it is easy to reuse JME's matrix stuff because there is a glsl lib for it. Makes most shader code not have to worry about whether or not the data is instanced or not... it just handles it.

For reference, this is the tree instancing from the IsoSurface demo: https://github.com/Simsilica/IsoSurface/blob/master/src/main/java/com/simsilica/iso/plot/InstanceTemplate.java#L74

It takes one tree mesh, clones it, and sets the appropriate per instance matrix information... using a regular Geometry... regular Mesh. I could also have, for example, interleaved other types of per-instance data, or per two instance data, etc..

It's very flexible (because I fixed it to be flexible way back when it was first added).

riccardobl commented 5 years ago

Ok, this is the opposite of that. What would you do to render the same geometry 1000 times without an additional 1000-element vertex buffer?

pspeed42 commented 5 years ago

No matter what, you have to have something to indicate that there are multiple instances. Else what would you be rendering?

Or put another way, why would you be rendering the mesh 1000 times to the exact same location?

riccardobl commented 5 years ago

There are many situations where you can dynamically compute the position of the instance in the vertex shader.

pspeed42 commented 5 years ago

Such as?

riccardobl commented 5 years ago

For example, the particle emitter in jme can be implemented with a single vertex shader using an equation, since particles are not aware of each other and they don't respond to the environment. Or you can go more complex and think about vector fields. Or, for example, you can render a forest on top of a terrain generated with a heightmap, by sampling the heightmap and adding an offset to xz.

pspeed42 commented 5 years ago

re: particles, using an equation based on what? What are its inputs?

Also note: even though only one draw call is made to the driver, internally it will still be doing 1000 draw calls. Instancing is a way to save memory, not time. It would save time over 1000 separate geometries but it would not save time over a single 1000 element batch.

For your forest (a good use-case for instancing), how do you sample the heightmap? Using what coordinate? Where did you get it?

riccardobl commented 5 years ago

Inputs for particles are: Origin, SpawnTime, g_Time, gl_InstanceID. The equation is something like: Origin + force * something * gl_InstanceID * g_Time * direction + gravity * time )^ something - 1

Inputs for forest are: Origin, HeightMap, MinOffset, MaxOffset, gl_InstanceID. The coordinates are: xz = origin + random(MinOffset, MaxOffset) * gl_InstanceID; y = texture(HeightMap, xz / mapsize);
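One way to read the forest idea is sketched below in plain Java, emulating on the CPU what a vertex shader could do per instance: derive a repeatable pseudo-random xz offset from the instance ID, then look up y in the heightmap. This is an interpretation of the loose formula above, not a tested shader; `ForestPlacementSketch`, the integer hash, and the nearest-neighbour lookup are all assumptions made for the example.

```java
final class ForestPlacementSketch {

    /** Cheap integer hash mapped to [0, 1); stands in for a shader-side random(). */
    static float hash01(int n) {
        n = (n ^ 61) ^ (n >>> 16);
        n *= 9;
        n ^= n >>> 4;
        n *= 0x27d4eb2d;
        n ^= n >>> 15;
        return (n >>> 8) / (float) (1 << 24);
    }

    /** Returns {x, y, z} for one instance; heightMap covers [0, mapSize) on both axes. */
    static float[] place(int instanceId, float originX, float originZ,
                         float minOffset, float maxOffset,
                         float[][] heightMap, float mapSize) {
        float range = maxOffset - minOffset;
        float x = originX + minOffset + range * hash01(2 * instanceId);
        float z = originZ + minOffset + range * hash01(2 * instanceId + 1);
        float y = sample(heightMap, x / mapSize, z / mapSize);
        return new float[] { x, y, z };
    }

    /** Nearest-neighbour lookup with clamped normalized coordinates. */
    static float sample(float[][] heightMap, float u, float v) {
        int rows = heightMap.length;
        int cols = heightMap[0].length;
        int col = Math.min(cols - 1, Math.max(0, (int) (u * cols)));
        int row = Math.min(rows - 1, Math.max(0, (int) (v * rows)));
        return heightMap[row][col];
    }
}
```

Because the offset depends only on the instance ID, the same tree lands in the same place every frame without any per-instance buffer.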

pspeed42 commented 5 years ago

So, so far the two use-cases: 1) giant particle meshes following a specific trajectory. 2) batches of trees where the placement doesn't really matter, ie: totally random.

Are these common enough to require a whole new path through the renderer?

Or could these very specific use-cases simply extend Mesh and return whatever instance count they want for what I believe is the same effect?

riccardobl commented 5 years ago

I mean, i provided two random use cases, the limit is the ingenuity of the developer. :man_shrugging: But you chose that.

Consider that extending the mesh may be a bit inconvenient, especially when dealing with meshes loaded from the asset manager, but possible I guess.

noncom commented 5 years ago

If I'm right, Unreal particles are done using this technique: https://docs.unrealengine.com/en-us/Engine/Rendering/ParticleSystems. I would not call this a minor use case. That is a great visual effect and opens up many possibilities. It would be great to be able to do similar things in my JME games too.

pspeed42 commented 5 years ago

You had an entirely new Geometry class... that does exactly everything the existing geometry class does except pass the mesh instance count through different than the actual mesh instance count.

It seems that is no harder to work through a j3o than a custom mesh extension. (Like Quad, Sphere, etc. are already mesh extensions.)

pspeed42 commented 5 years ago

I'll reiterate once more that the performance of small mesh particles done this way would be horrible... so I doubt they are doing this.

The approach definitely makes sense if you are BATCHING particles... because you don't want to constantly rebatch them.

riccardobl commented 5 years ago

The patch proposes to add a middle class between Geometry and InstancedGeometry that provides only two abstract methods used by InstancedGeometry. But ofc this is up for discussion.

"It seems that is no harder to work through a j3o than a custom mesh extension. (Like Quad, Sphere, etc. are already mesh extensions.)"

not sure what you mean here

"I'll reiterate once that the performance of small mesh particles done this way would be horrible... so I doubt they are doing this."

There is no reason for this to be slower than anything else.

pspeed42 commented 5 years ago

1000 instances, slightly faster than 1000 draw calls. Waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay slower than a batch of 1000.

Test it.

pspeed42 commented 5 years ago

re: "The patch proposes to add a middle class between Geometry and InstancedGeometry that provides only two abstract methods used by InstancedGeometry. But ofc this is up for discussion."

...and modifies a technique.

All to avoid changing the result of Mesh.getInstanceCount().

For example, you could do everything you want to do with one patch that lets the user specify the instance count on the mesh to override the calculated one. Without creating additional classes, requiring additional weird Geometry subclasses, etc..

It's a mesh problem, not a geometry problem.
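The "override the calculated instance count" idea can be sketched without touching the engine. The classes below are plain-Java stand-ins, not `com.jme3.scene.Mesh` or `VertexBuffer`: `MeshStandIn`, `BufferStandIn` and `FixedInstanceCountMesh` are hypothetical names, and the rule used to calculate the default count is an assumption made only so the override has something to replace.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a vertex buffer: element count plus instance span. */
final class BufferStandIn {
    final int elements;
    final int instanceSpan;

    BufferStandIn(int elements, int instanceSpan) {
        this.elements = elements;
        this.instanceSpan = instanceSpan;
    }
}

/** Stand-in mesh that derives its instance count from its instanced buffers. */
class MeshStandIn {
    private final List<BufferStandIn> buffers = new ArrayList<>();

    void addBuffer(BufferStandIn b) {
        buffers.add(b);
    }

    /** Largest number of instances any instanced buffer covers; 0 if none is instanced. */
    int getInstanceCount() {
        int count = 0;
        for (BufferStandIn b : buffers) {
            if (b.instanceSpan > 0) {
                count = Math.max(count, b.elements * b.instanceSpan);
            }
        }
        return count;
    }
}

/** Ignores the buffers and reports a caller-supplied instance count instead. */
final class FixedInstanceCountMesh extends MeshStandIn {
    private final int fixedCount;

    FixedInstanceCountMesh(int fixedCount) {
        this.fixedCount = fixedCount;
    }

    @Override
    int getInstanceCount() {
        return fixedCount;
    }
}
```

The point of the sketch is that only the mesh changes: anything that asks the mesh for its instance count gets 1000 back, with no per-instance buffer and no new Geometry subclass.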

riccardobl commented 5 years ago

"1000 instances, slightly faster than 1000 draw calls. Waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay slower than a batch of 1000."

Is this affirmation based on your own measurements?

"For example, you could do everything you want to do with one patch that lets the user specify the instance count on the mesh to override the calculated one. Without creating additional classes, requiring additional weird Geometry subclasses, etc..

It's a mesh problem, not a geometry problem."

Sure, that's why this is an Issue and not a PR. That patch was just an example of how it could be done; it was never intended to end up in the engine.

pspeed42 commented 5 years ago

re: "Is this affirmation based on your own measurements?"

Yes... which confirmed what "everyone on the internet" will tell you about instancing. I tried to use it for a grass shader. It was pretty horrible compared to batching.

riccardobl commented 5 years ago

Ok, that's quite anecdotal. I assume you did this with the instancing currently available in jme, which means your instances were carrying a huge vertex buffer and performing several transformations in the vertex shader. There is also a catch: particles usually move.

That said, you have your reasons, I have mine. You can close the issue if you think there is nothing else that needs to be added.

empirephoenix commented 5 years ago

I do not see it as a mesh problem, as I might want to use the same Mesh in multiple contexts. E.g. a single tree rendered in a store has nothing to do with a terrain renderer using 500 as well. Having all buffers for the tree duplicated to work around this is a bit strange. AFAIK there is currently no good way to load a normal model via the AssetManager and use it both for instancing and for normal rendering. (This does not mean that I have taken any look at the current proposal here.)

noncom commented 5 years ago

But does not Khronos recommend using GPU instancing? https://www.khronos.org/opengl/wiki/Vertex_Rendering#Instancing

It says "It will send the same vertices instancecount number of times, as though you called glDrawArrays/Elements in a loop of instancecount length", so this is a GPU-based optimization?

You mean this is slow?

pspeed42 commented 5 years ago

You know meshes can share buffers, right?

My tree instancing just cloned the loaded tree mesh and set the one buffer to do instancing. I could have just as easily set some magic instance count.

re: "I assume you did this with the instancing currently available in jme, this means your instances were carrying a huge vertex buffer and performing several transformations on the vertex shader."

Yes, I did it with the current instancing. I don't see how that is relevant. There was one 3 element positioning buffer in my test, so for 1000 blades of grass there was a 3000 float buffer. The GPU cares not for buffers of this size. You will have to do some per vertex transformation no matter what... in your examples it is even MORE expensive than anything I was doing.

But it makes sense, for instancing the GPU has to dispatch 1000 separate draw calls internally. It's going to be slower than just one big draw call.

pspeed42 commented 5 years ago

re: "But does not Khronos recommend using GPU instancing? https://www.khronos.org/opengl/wiki/Vertex_Rendering#Instancing

It says It will send the same vertices instancecount number of times, as though you called glDrawArrays/Elements in a loop of instancecount length, so this is a GPU-based optimization?

You mean this is slow?"

SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING. SLOWER THAN BATCHING.

noncom commented 5 years ago

Hmmm, so you recommend to simply allocate a big vertex buffer and manipulate its vertices in the shader as if these were the particles?

empirephoenix commented 5 years ago

That is definitely faster than instancing, if applicable

pspeed42 commented 5 years ago

For small particles, batching will be more efficient and won't take an overly large amount of memory.

It's a trade off. If you have to draw 5000 trees then instancing will save you a bunch of memory over batching. And it will save you time over 5000 tree Geometries.

If you have to draw 5000 triangles then you might as well batch them.
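The memory side of this trade-off is easy to put numbers on. The sketch below is a back-of-the-envelope calculation in plain Java; the 2000-vertex tree, the 32-byte vertex and the 16-float per-instance block are illustrative assumptions, index buffers are ignored, and `InstancingMemorySketch` is a hypothetical name.

```java
final class InstancingMemorySketch {

    /** Bytes for one batched mesh holding 'copies' full copies of the vertex data. */
    static long batchedBytes(long copies, long vertsPerMesh, long bytesPerVertex) {
        return copies * vertsPerMesh * bytesPerVertex;
    }

    /** Bytes for one shared mesh plus one per-instance block per copy. */
    static long instancedBytes(long copies, long vertsPerMesh, long bytesPerVertex,
                               long bytesPerInstance) {
        return vertsPerMesh * bytesPerVertex + copies * bytesPerInstance;
    }

    public static void main(String[] args) {
        long perVertex = 8 * Float.BYTES;     // position + normal + uv (assumed)
        long perInstance = 16 * Float.BYTES;  // 16 floats per instance

        // 5000 trees of 2000 vertices each (hypothetical vertex count)
        System.out.println(batchedBytes(5000, 2000, perVertex));                // prints 320000000
        System.out.println(instancedBytes(5000, 2000, perVertex, perInstance)); // prints 384000

        // 5000 single triangles
        System.out.println(batchedBytes(5000, 3, perVertex));                   // prints 480000
        System.out.println(instancedBytes(5000, 3, perVertex, perInstance));    // prints 320096
    }
}
```

Under these assumptions instancing shrinks the tree case by roughly three orders of magnitude, while for single triangles both approaches land in the same few hundred kilobytes, so there is little memory to win back.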

pspeed42 commented 5 years ago

And if you've batched 5000 triangles, you don't want to rebatch them all the time... so if you can calculate trajectory/position in the shader then it's better.

I did this with my waterfall shader.

pspeed42 commented 5 years ago

For example, this is the waterfall: https://www.youtube.com/watch?v=s0rAjdx2PXI

I think it's two Geometries just because there were two different types of particle but I believe they share the same mesh (different Materials).

Mesh is batched once. Particle positions are calculated in shader based on g_Time.
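The "batch once, animate in the shader" pattern can be sketched on the CPU in plain Java. This is not the waterfall shader: the ballistic motion, the staggering constant and all names (`BatchedParticlesSketch`, `buildParticleIds`, `heightAt`) are assumptions chosen for illustration. The first method stands for the one-time batch; the second mirrors what a vertex shader could evaluate per vertex from a time uniform such as g_Time.

```java
final class BatchedParticlesSketch {

    static final float GRAVITY = -9.81f;

    /** Built once: one triangle per particle, each vertex tagged with its particle id. */
    static float[] buildParticleIds(int particles) {
        float[] ids = new float[particles * 3];
        for (int v = 0; v < ids.length; v++) {
            ids[v] = v / 3; // integer division: three vertices share one id
        }
        return ids;
    }

    /** Height of a particle at 'time'; depends only on static data and the clock. */
    static float heightAt(float particleId, float time, float lifetime,
                          float startHeight, float initialSpeed) {
        float spawnOffset = (particleId * 0.37f) % lifetime; // stagger the particles
        float age = (time + spawnOffset) % lifetime;         // loops forever
        return startHeight + initialSpeed * age + 0.5f * GRAVITY * age * age;
    }

    public static void main(String[] args) {
        float[] ids = buildParticleIds(4);
        for (int v = 0; v < ids.length; v += 3) {
            System.out.println("particle " + (int) ids[v]
                    + " height at t=0.5: " + heightAt(ids[v], 0.5f, 2f, 10f, 0f));
        }
    }
}
```

Nothing in the batch is rewritten after it is built; only the time value changes from frame to frame.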

riccardobl commented 5 years ago

You don't need a truck to carry a crate of oranges; this doesn't make trucks useless or worse than your car.

pspeed42 commented 5 years ago

Correct.

But how is that relevant to this discussion?

I said instancing is usually dumb for small meshes. That's still true. Batching is better for small meshes.

MeFisto94 commented 5 years ago

To not derail this discussion:

I said instancing is usually dumb for small meshes. That's still true. Batching is better for small meshes.

But on the other hand for large meshes it could be better.

I mean the trade-off is bandwidth (which current particles take) vs. GPU time (which instanced particles take). It all depends on the case, and if you're doing CPU-heavy calculations or have many particles, instancing might be favored. That's usual in gamedev, but it means an engine should support both.

Either way the discussion has derailed; we should focus on how accessible instancing already is.

pspeed42 commented 5 years ago

Current particles take bandwidth because they constantly have to update their batch. If the same particles were rewritten to use the 'instance count only' discussions in this thread then they would only take more memory... since the data only has to get sent once. So then it's only a trade off of RAM versus speed. And often not much RAM for particles.

Anyway, you can already test all of this right now just by extending Mesh to provide your own instance count that overrides the default calculation.

riccardobl commented 5 years ago

You are missing the point. Instancing and batching are not simply interchangeable. Small meshes vs. big meshes is nonsense; in a real-world scenario your whole render time is not defined by your particles or some other random single component of your whole scene.

pspeed42 commented 5 years ago

re: " in a real world scenario your whole render time is not defined by your particles or some other random single component of your whole scene." ...unless you try to use instancing for grass. I bet that will have a pretty big impact on your whole scene. ;)