taontech / githublog

A blog system based on GitHub issues: rendered in real time, zero dependencies, zero-code deployment, no bundling and no release step.

Strip is NOT cool! #60

Open taontech opened 1 year ago

To Strip or Not To Strip

In this post I will try to explain why a performance-focused OpenGL application like X-Plane does not use triangle strips. Since triangle strips were the best way to draw meshes a few years back, a new user searching for information might be confronted by a cacophony of tutorials advocating triangle strips and game developers saying "indexed triangles are better" without explaining why. Here's the math.

Please note: this article applies to OpenGL desktop applications, typically targeting NVidia and ATI GPUs. In the mobile/embedded space, it's a very different world, and certain GPUs (cough, cough, PowerVR, cough cough) have some fine print attached to them that might make you reconsider.

Why Triangle Strips Are Good

If you are drawing a bunch of connected triangles, the logic in favor of triangle strips is very simple: a strip of N triangles needs only N+2 vertices, instead of the 3N you'd have if you simply made triangles. That is almost two-thirds fewer vertices, and the longer the strip, the closer to that savings you get. Since geometry throughput is generally limited by total vertex count, this is a big win.
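To put numbers on that claim, here is a minimal sketch in C of the vertex counts involved. The function names are made up for illustration, and the strip is assumed to be a single unbroken one.

```c
#include <stddef.h>

/* Vertices submitted when drawing tri_count triangles as GL_TRIANGLES
   with no sharing: three per triangle. */
size_t free_triangle_vertices(size_t tri_count) {
    return 3 * tri_count;
}

/* Vertices submitted for one unbroken GL_TRIANGLE_STRIP: the first
   triangle costs three vertices, each later one costs one more. */
size_t strip_vertices(size_t tri_count) {
    return tri_count == 0 ? 0 : tri_count + 2;
}

/* Fraction of vertices the strip saves over free triangles.
   It is 0 for a single triangle and approaches 2/3 for long strips. */
double strip_savings(size_t tri_count) {
    if (tri_count == 0) return 0.0;
    return 1.0 - (double)strip_vertices(tri_count)
               / (double)free_triangle_vertices(tri_count);
}
```

A 100-triangle strip comes out at 102 vertices instead of 300, a 66% saving. A 2-triangle strip (4 vertices instead of 6) saves only a third.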

Ten years ago, that's all you needed to know. Of course, making triangle strips is not so easy - some meshes simply won't form strips. The general idea was to make as many strips as you can, and draw the rest of your triangles as "free triangles" (i.e. GL_TRIANGLES, where each triangle is 3 vertices, and no vertices are shared).

(By the way, to see how to use the tri_stripper library to create triangle strips, look at the function DSFOptimizePrimitives in the X-Plane scenery tools code. Why DSFLib does this will have to be explained in another post, but suffice it to say, there is no hypocrisy here: X-Plane disassembles the triangle strips in the DSF into "free triangles" on load.)
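For readers wondering what "disassembling" a strip involves, here is a rough sketch in C. It is not X-Plane's actual loader code. The function name is hypothetical, and it assumes the strip is given as a list of indices, following OpenGL's rule that every other triangle in a strip has its first two vertices swapped.

```c
#include <stddef.h>

/* Expand one triangle strip (given as an index list) into a
   GL_TRIANGLES-style index list. out must have room for
   3 * (strip_len - 2) entries. Returns the number of indices written. */
size_t strip_to_triangles(const unsigned int *strip, size_t strip_len,
                          unsigned int *out) {
    size_t n = 0;
    if (strip_len < 3) return 0;
    for (size_t i = 0; i + 2 < strip_len; ++i) {
        if (i % 2 == 0) {
            out[n++] = strip[i];
            out[n++] = strip[i + 1];
        } else {
            /* Odd triangles swap the first two indices so that every
               triangle keeps the same winding order. */
            out[n++] = strip[i + 1];
            out[n++] = strip[i];
        }
        out[n++] = strip[i + 2];
    }
    return n;
}
```

Feeding it the strip 0 1 2 3 4 yields the triangles (0,1,2), (2,1,3), (2,3,4): five indices become nine.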

Indexing Is Better

In an indexed mesh, each vertex is stored only once, and the triangles are formed from a set of indices. (In OpenGL this is done by moving from glDrawArrays to glDrawElements.) With an index, you pay a little more (2 or 4 bytes) every time a triangle references a vertex, but you don't ever have to repeat the geometry of a vertex.

When is it worth it to index? It depends on the size of your indices and vertices, but it is almost always a win. For example, in an X-Plane object our vertices are 32 bytes (XYZ, normal, one UV map, all floating point) and our indices are 4 bytes (unsigned integer indices). Thus a vertex is 8x more expensive than an index. So if sharing lets us eliminate more than 1/8th of the vertices, we will have a win.
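The break-even point can be checked with a few lines of C. This is a back-of-the-envelope sketch using the 32-byte vertex and 4-byte index sizes quoted above. The function names are invented for the example, and "corner" here means one triangle corner, i.e. one vertex reference.

```c
#include <stddef.h>

enum { VERTEX_BYTES = 32, INDEX_BYTES = 4 };  /* sizes quoted in the text */

/* Non-indexed GL_TRIANGLES: every triangle corner carries a full vertex. */
size_t unindexed_bytes(size_t corner_count) {
    return corner_count * VERTEX_BYTES;
}

/* Indexed: each unique vertex is stored once, plus one index per corner. */
size_t indexed_bytes(size_t unique_vertices, size_t corner_count) {
    return unique_vertices * VERTEX_BYTES + corner_count * INDEX_BYTES;
}
```

With 800 corners, the non-indexed mesh costs 25,600 bytes. If sharing brings that down to 700 unique vertices (exactly 1/8th fewer), the indexed version also costs 25,600 bytes. Every shared vertex beyond that is pure savings.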

Consider a simple 2-d grid: even with triangle strips, every pair of adjacent strips shares a common edge (only the outer edges of the grid are not shared), and the vertices along that edge get sent twice. Thus if we use indexing, our 2-d mesh is going to have a savings that approaches 2x for the geometry! That is way more than enough to pay for the cost of the indices.
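Here is a small C sketch of the grid case. It assumes the grid's vertices are stored row by row and that the non-indexed version uses one strip per row of quads. All function names are hypothetical.

```c
#include <stddef.h>

/* Unique vertices in a grid of cols x rows quads. */
size_t grid_unique_vertices(size_t cols, size_t rows) {
    return (cols + 1) * (rows + 1);
}

/* Vertices sent when each row of quads is its own non-indexed strip:
   every interior row of vertices is sent twice. */
size_t grid_strip_vertices(size_t cols, size_t rows) {
    return rows * 2 * (cols + 1);
}

/* Fill out with GL_TRIANGLES indices for the grid, two triangles per quad.
   out must hold 6 * cols * rows entries. Returns the count written. */
size_t grid_indices(unsigned int cols, unsigned int rows, unsigned int *out) {
    size_t n = 0;
    unsigned int stride = cols + 1;  /* vertices per row */
    for (unsigned int r = 0; r < rows; ++r) {
        for (unsigned int c = 0; c < cols; ++c) {
            unsigned int v00 = r * stride + c;  /* corner at (c, r) */
            unsigned int v10 = v00 + 1;         /* corner at (c + 1, r) */
            unsigned int v01 = v00 + stride;    /* corner at (c, r + 1) */
            unsigned int v11 = v01 + 1;         /* corner at (c + 1, r + 1) */
            out[n++] = v00; out[n++] = v10; out[n++] = v11;
            out[n++] = v00; out[n++] = v11; out[n++] = v01;
        }
    }
    return n;
}
```

For a 100 x 100 grid the strips send 20,200 vertices while the indexed mesh stores 10,201, which is where the "nearly 2x" figure comes from.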

So the moral of the story is: any time your geometry has shared vertices, use indexing. Note that this won't always happen. If you have a mesh of GL_POINTS, you will have no sharing, so indexing is a waste. In X-Plane, our "trees" are all individual quads, no sharing, so we turn off indexing because we know the indexing will do no good.

But for most "meshed" art assets (e.g. anything someone built for you in a 3-d modeler) it is extremely likely that indexing will cut down the total amount of data you have to send to the GPU, and that is a good thing.

Triangle Strips Aren't That Cool When We Index

Now in the old school world, a triangle strip cut the amount of geometry down by almost 3x. Awesome! But in the indexed world, a triangle strip only cuts down the size of the index list by 3x. That is...not nearly as impressive. In fact, in X-Plane's case it is only 1/8th as impressive as it would have been for non-indexed geometry.
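The "1/8th as impressive" figure falls out of a short calculation, sketched here in C. It assumes a perfect single strip and the 32-byte vertices and 4-byte indices mentioned earlier. The helper names are made up.

```c
#include <stddef.h>

/* Elements (vertices or indices) that a perfect strip removes compared
   with free triangles: 3 * T elements shrink to T + 2. */
size_t elements_saved_by_strip(size_t tri_count) {
    return tri_count == 0 ? 0 : 3 * tri_count - (tri_count + 2);
}

/* Bytes saved when each removed element costs element_bytes. */
size_t bytes_saved_by_strip(size_t tri_count, size_t element_bytes) {
    return elements_saved_by_strip(tri_count) * element_bytes;
}
```

For 1,000 triangles, stripping non-indexed geometry saves 1,998 vertices at 32 bytes each (63,936 bytes). Stripping the index list of an indexed mesh saves the same 1,998 elements, but at 4 bytes each (7,992 bytes), exactly one eighth as much.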

The take-away thing to observe: once we start indexing (which really makes geometry storage efficient) triangle strips aren't nearly as important as they used to be.

Restarting a Primitive Hurts

So far we've talked about ideal cases, where your triangle strips are very long, so we really approach a 3x savings. Here's the hitch: in real life triangle strips might be very short.

The problem with triangle strips is that we have to tell the card where the triangle strips begin and end, and that can get expensive. You might have to issue a separate glDrawElements call for each strip.

You don't want to make additional CPU calls into the GL to minimize the size of a buffer (the index buffer) that is already held in VRAM. CPU calls are much slower. And this is why X-Plane doesn't use strips internally: it's faster to be able to make only one draw call per mesh, even if it means a slightly bigger element list.

Now if you are a savvy OpenGL developer you are probably screaming one of two things.

What about glMultiDrawElements? I point you to here, and here. Basically both Apple and NVidia are suggesting that the multi-draw case may not be as ball-bustingly fast as it could be. There is always a risk that the driver decomposes your carefully consolidated strips into individual draw calls, and at that point you lose.

What about primitive restart? Well, it's NVidia only, so if you use it, you need to special-case your basic meshing code to handle its not being there. And even if it is there, you pay with an extra index per restart. If you have really good strips, this might be a win, but when the strips get small, you're starting to eat away at the benefit of shrinking down your indices in the first place. (The worst case is a triangle soup with no sharing, so you get no benefit from tri strips and you have to put a restart index into every 4th slot.)

And this brings me to one more concern: even if you do have some nice triangle strips in your mesh, you may have free triangles too, and in that case you're going to have to make two separate batch calls (GL_TRIANGLE_STRIP, GL_TRIANGLES) for the two "halves" of the mesh. So even if you are getting a triangle strip win, you're probably going to double the number of real draw calls (even with multi-draw) just to shrink an index list down.
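To see how quickly restart indices eat into the savings, here is a small C sketch that compares index counts. It assumes the strips are joined into one draw with a single restart index between consecutive strips. The function names are hypothetical, and strip_lens[i] is the number of indices in strip i.

```c
#include <stddef.h>

/* Index count for several strips joined into one draw call, with one
   restart index between each pair of consecutive strips. */
size_t restart_index_count(const size_t *strip_lens, size_t strip_count) {
    size_t total = 0;
    for (size_t i = 0; i < strip_count; ++i) total += strip_lens[i];
    return strip_count == 0 ? 0 : total + (strip_count - 1);
}

/* Index count for the same triangles drawn as one GL_TRIANGLES list:
   a strip of n indices holds n - 2 triangles, three indices each. */
size_t triangle_list_index_count(const size_t *strip_lens, size_t strip_count) {
    size_t total = 0;
    for (size_t i = 0; i < strip_count; ++i)
        if (strip_lens[i] >= 3) total += 3 * (strip_lens[i] - 2);
    return total;
}
```

One 100-triangle strip needs 102 indices against 300 for plain triangles, a clear win. Four disconnected triangles need 15 indices with restarts against 12 without, so in the triangle-soup case the strips make the index list bigger, not smaller.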

Index Triangles

Thus the X-Plane solution: any time we have a mesh, we use indexed triangles and we go home happy. We always draw every mesh in only one draw call. We share vertices as much as possible. We are in no way dependent on the driver handling multi-draw or having a restart extension. We run at full speed even if the actual mesh doesn't turn to strips very well. The code handles only one case.

As a final note, this post doesn't discuss cache coherency - that is, if you are going to present the driver with a "triangle soup", what is the best order? That will have to be another post, but for now understand that the point of this post is "indexed triangles are better than strips" - I am not saying "order doesn't matter" - cache coherency and vertex order can matter, no matter how you get the vertices into the GPU.