hajimehoshi / ebiten

Ebitengine - A dead simple 2D game engine for Go
https://ebitengine.org
Apache License 2.0
10.41k stars 635 forks source link

ebiten: change `[]uint16` to `[]uint32` for indices at `DrawTriangles` and `DrawShaderTriangles` #2829

Open hajimehoshi opened 8 months ago

hajimehoshi commented 8 months ago

Operating System

What feature would you like to be added?

This is a related change to #2612. Now Ebitengine internal uses []uint32 for indices, we should be able to use []uint32 for indices as public APIs instead of []uint16.

As this is a breaking change, let's do it in v3.

Why is this needed?

No response

quasilyte commented 1 week ago

I'll add some context since it's missing in the description:

Why is this needed?

To avoid a full-slice copy. When using DrawTriangles for batch drawing, the indices slice will be re-used on the caller side and it can be quite large (let's say maxUint16/10 elements or so for a particle system is not a rarity).

When these indices are passed to the DrawTriangles as uint16, Ebitengine internally creates a slice of uint32 of the same length and copies everything in there. We can hypothetically get rid of this step if the input argument type is []uint32. Then there would be no extra data copying/allocation in the DrawTriangles method.

I started to use DrawTriangles recently and it's very useful for several stuff, but my most favorite ones are:

There is a way to enable this optimization without breaking the API and it involves Go build tags. I'm not recommending it, but it's just an idea.

  1. Document some build tag, like ebiten_uint32
  2. Introduce build-tag protected files, we need two (one for uint32 and one for uint16)
  3. Use our new function in DrawTriangles

Here is how it could look like:

//go:build ebiten_uint32
package ebiten

type Indice = uint32

// This conversion is no-operation, it just returns the slice.
// No allocations, no data copying.
func convertIndices(indices []Indice) []Indice {
  return indices
}
//go:build !ebiten_uint32
package ebiten

type Indice = uint16

// This is how it behaves by default.
func convertIndices(indices []Indice) []Indice {
  is := make([]uint32, len(indices))
  for i := range is {
    is[i] = uint32(indices[i])
  }
  return is
}

And then in DrawTriangles:

- is := make([]uint32, len(indices))
- for i := range is {
-   is[i] = uint32(indices[i])
- }
+ is := convertIndices(indices)

The signature will change textually, but it will be 100% compatible with the current API.

- DrawTriangles(vertices []Vertex, indices []uint16, img *Image, options *DrawTrianglesOptions)
+ DrawTriangles(vertices []Vertex, indices []Indice img *Image, options *DrawTrianglesOptions)

To use this new API, the user would need to add a tag to their go command. Here is an example:

# Running with []uint32
# The user must pass []uint32 for the code to compile
$ go run --tags ebiten_uint32 ./cmd/mygame
# Running with []uint16
# The user must pass []uint16 for the code to compile
# No tags are specified (it's a default behavior)
$ go run ./cmd/mygame
# It's possible to use it with go-build as well
$ go build --tags ebiten_uint32 -o ./bin/mygame ./cmd/mygame

There could be other ways, like adding a new transitional function that can be used before the older API is changed; but that would increase the public API surface which is bad (we would need something to do with these new functions later).

@Zyko0 may have something extra to add. I would also like to invite @tinne26 for the discussion.

tinne26 commented 1 week ago

Then there would be no extra data copying/allocation in the DrawTriangles method.

I think Hajime always prefers safety in these cases, so even if we pass a slice, it would still be treated as read-only and copied at least once, no? I'm also unfamiliar with all the checks for triangles merging and so on, so I'm not sure how much of a difference this all ends up amounting to.

I'm obviously in favor of the change from []uint16 to []uint32, but I don't think a build tag temporary solution is warranted without actual benchmarks showing that you can put out >10% more triangles per second or something like that after the improvement. Hajime was considering moving to v3 earlier due to the amount of deprecated code and stuff that's been accumulating with all the improvements, so I'd lean against adding even more things to this pile.

And I've said this multiple times, but the more Ebitengine grows, the more we are starting to have these optimization related issues, but there's still no benchmark suite for Ebitengine. If we want to keep pushing in this direction, we need a better foundation (I'm also saying this due to issues like #3023 that Hajime has been acknowledging himself while working on commercial projects).

hajimehoshi commented 1 week ago

As @tinne26 already pointed out, copying slices inside Ebitengine is necessary for user experience rather than performance. We will fix the API in v3, which will come a few years later hopefully. I don't want to add a build tag for this purpose since the use cases would be very niche compared to the cost of build tag managements.

And I've said this multiple times, but the more Ebitengine grows, the more we are starting to have these optimization related issues, but there's still no benchmark suite for Ebitengine. If we want to keep pushing in this direction, we need a better foundation

While this is true, I don't have a good idea about test cases to test #3023 well.

quasilyte commented 1 week ago

I'm working on a particle system and I have some numbers that we can also compare with Godot particle system in terms of "the number of particles we can render until FPS starts to decrease".

I can measure the benefits that we can get if that copy can be avoided (e.g. 50k particles with a copy vs 55k particles without, or something like that). Although if the copy will happen anyway, I'm not sure if I need to do that extra effort measuring and comparing stuff.

hajimehoshi commented 1 week ago

Wouldn't https://github.com/hajimehoshi/ebiten/pull/3026 mitigate your issue to some extent?

Zyko0 commented 1 week ago

Another reason is that uint32 allows more vertices/indices to be submitted at once than uint16 (in the case of particles, it might matter) Otherwise, we need to split the batch manually