hajimehoshi / ebiten

Ebitengine - A dead simple 2D game engine for Go
https://ebitengine.org
Apache License 2.0

ebiten: add more attributes to `Vertex` #2640

Open SolarLune opened 1 year ago

SolarLune commented 1 year ago

What feature would you like to be added?

It would be nice if ebiten.Vertex had additional customizable values to use, particularly for usage in fragment shaders. Ideally, the fragment shader would be updated to get a linear blending of custom vertex attribute values at the relevant point between two or more vertices (i.e. varying in GLSL-speak), much like how it currently works for UV values.

Googling around, it seems like modern OpenGL generally guarantees at least 16 sets of 4-size vectors for custom attributes.

https://www.khronos.org/opengl/wiki/Vertex_Shader#Inputs

Why is this needed?

It would be useful for a variety of purposes. For Tetra3D, even just one customizable vertex attribute might allow me to implement perspective-correct texture mapping, where I can map textures properly depending on the depth of each vertex, rather than leaving the texture mapping affine (where textures distort heavily the more they stretch away from the game view). If I had enough attributes to work with, then I might even be able to pass the transformation matrices for all vertex transformations to the GPU, offloading some of the render work.

As for other use-cases, I'd imagine this would be useful in any situation where one set of triangles differs dramatically from the rest - for example, if a paper doll / rigged character walked through a liquid and you wanted it to stain the character's limb accordingly (and not just any / all vertices underneath a certain Y value threshold), you could use a customizable vertex attribute for this purpose.

In this Godot proposal issue, the original poster posts a couple of examples of shaders that could make use of vertex attributes. These shaders are primarily focused on antialiased rendering of shapes (lines or markers).

This Godot proposal issue talks about using vertex attributes to control blending between tiles, or coloring specific colors for tiles in a top-down game, which is an interesting application. I could see the utility of this if blending or coloring is done dynamically at run-time.

hajimehoshi commented 1 year ago

I don't fully understand your case, but I would like to know whether it would be enough to add 4 more parameters to the current Vertex:

type Vertex struct {
    SrcX float32
    SrcY float32
    SrcZ float32 // new
    SrcWMinus1 float32 // new
    DstX float32
    DstY float32
    DstZ float32 // new
    DstWMinus1 float32 // new
    ColorR float32
    ColorG float32
    ColorB float32
    ColorA float32
}

Zyko0 commented 1 year ago

I'm allowing myself to join the discussion as I'm also interested in this feature. Use cases vary across my projects, but since we do not have access to vertex shaders, and re-uploading uniforms is not always an option when we want to batch a lot of triangles, attributes act as cheap uniforms that still allow batching.

This is my main use case, and I already use the available ones heavily (e.g. color) to represent extra information, but I have often wished I had more attributes available (still the case for my current project).

I don't fully understand your case, but I would like to know whether it would be enough to add 4 more parameters to the current Vertex

This sounds very nice to me if you can make sense of it in terms of public API, but obviously the more I could access "per-vertex", the better. Is it possible to consider some "optional" / "extra" vertices that won't be confusing to ebitengine users as they don't have to be filled, or do not need a "proper" name?

Curious about @SolarLune inputs on this too

SolarLune commented 1 year ago

I don't fully understand your case,

[Image: Affine_texture_mapping_tri_vs_quad - affine texture mapping (left) vs. perspective-corrected texture mapping (right)]

Essentially, when rendering a triangle for Tetra3D, I transform the positions for the vertices into 3D. This works, but I don't change the UV values, which gives affine texture mapping (at the left in the image above). If I want perspective-corrected texture mapping (at the right), I believe I need to add the depth of each vertex as an attribute. Then in the fragment shader, I can use the depth value to alter the UV sampling to account for perspective. The image above is taken from the Wikipedia article on texture mapping.
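To make the difference concrete, here is a minimal sketch in plain Go (not Kage, and not Tetra3D's actual code) of why a per-vertex depth value is needed. It assumes the usual technique of interpolating U/W and 1/W linearly across the screen and dividing afterwards; the names `vert`, `affineU` and `perspectiveU` are hypothetical and only serve the illustration.

```go
package main

import "fmt"

// vert is a hypothetical projected vertex: a texture coordinate U and the
// W (depth) value the vertex had before the perspective divide.
type vert struct {
	U, W float64
}

// affineU interpolates U linearly in screen space and ignores depth.
// This corresponds to what happens when only UV values reach the shader.
func affineU(a, b vert, t float64) float64 {
	return a.U + (b.U-a.U)*t
}

// perspectiveU interpolates U/W and 1/W linearly in screen space and then
// divides, which recovers the perspective-correct U.
func perspectiveU(a, b vert, t float64) float64 {
	uOverW := a.U/a.W + (b.U/b.W-a.U/a.W)*t
	invW := 1/a.W + (1/b.W-1/a.W)*t
	return uOverW / invW
}

func main() {
	near := vert{U: 0, W: 1} // close to the camera
	far := vert{U: 1, W: 4}  // four times further away

	// Sample halfway between the two vertices on screen.
	fmt.Printf("affine:      %.3f\n", affineU(near, far, 0.5))      // 0.500
	fmt.Printf("perspective: %.3f\n", perspectiveU(near, far, 0.5)) // 0.200
}
```

In a fragment shader the linear interpolation would be done by the GPU, so the extra attribute only has to carry the per-vertex 1/W (and the UVs pre-divided by W); the division is the only work left for the shader.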

it would be enough to add 4 more parameters to the current Vertex:

Yeah, adding 4 would be fine for one use-case of mine with Tetra3D, though it's not enough for the other (transforming a vertex using a 4x4 matrix on the GPU). That's OK, as it's just something I mentioned as a possibility, but it would be nice to have a few more (16, maybe?), if every shader backend supported that many. Otherwise, 4 is fine.

If you're worried about performance, you could make it a fixed array - that way, if it's nil, it hasn't been set by the user, and so you don't have to upload those attributes to the GPU.

Is it possible to consider some "optional" / "extra" vertices that won't be confusing to ebitengine users as they don't have to be filled, or do not need a "proper" name?

If you mean attributes, yeah, I would just name them Custom0, Custom1, etc., or something like that. If they don't do anything in Ebitengine, then the naming convention could be simple and clear to show they don't need to be set.

tinne26 commented 1 year ago

A couple initial comments:

tinne26 commented 1 year ago

Now let's put all this in context: vertex attributes are a generalized form of what we are already doing with the ColorR, ColorG, ColorB fields. So there's an underlying question here which is: do we want to generalize or not?

Hello everyone it's reflection time: the more we explore shaders, the more we realize they can't be simplified as much as we would like to without losing a lot of power in the process. Trying to make them accessible through nice APIs is not super effective, and in the end we find ourselves in the middle of nowhere, serving neither simplicity nor non-toy usages. I'd be happy to see the APIs move towards more general usage, so I like the idea of more exposed vertex attributes, but I think it needs to be done in a more structured way... which I think doesn't fit well with what we have on v2.

So, if we don't want to generalize, adding Attributes []float just doesn't blend well with what we have (but then nothing really blends well, honestly). If we want to generalize, then in the long run it seems like:

hajimehoshi commented 1 year ago

ColorR, ColorG, ColorB, ColorA would be superfluous. And yeah, they are already ignored with DrawRectShader.

They are not ignored and are passed to Fragment, right?

tinne26 commented 1 year ago

Ok, sorry, I wasn't aware of the addition of ColorScale on v2.5, my knowledge was stuck on v2.4.

tinne26 commented 1 year ago

I was talking about DrawRectShader, not DrawTrianglesShader.

hajimehoshi commented 1 year ago

In v2.4 and older, if a ColorM is a diagonal matrix, the scale part is reflected in the third argument of Fragment. This is a little tricky (and that's why I have separated the color matrix part into another package), but they are not ignored anyway.

EDIT: Forget this. ColorM is not available at DrawRectShader. Never mind!

SolarLune commented 1 year ago

Unless Ebitengine officially wants to support 3D, I think names like Z and W are better avoided. I also don't think Custom is particularly nice either. I think just calling it AttribN or Attributes []type is more suitable. They are attributes, so let's just call them that?

Yeah, Attrib or Attributes or something similar is perfectly fine. I don't think it's necessary to add Z / W for no real reason.

There's no proposal on how to deal with this from the kage side. With many attributes, the current model of vertexAttribN (mimicking existing built-in functions) wouldn't scale nicely. Maybe it can be vertexAttrib(N) or something, but we probably need a more concrete idea of how this would work. They could even be like uniforms, who knows.

I just kinda glossed over it, as I wasn't really making a full proposal so much as just starting an issue to discuss and track this concept, but you're right, I didn't consider it fully.

I defaulted to floats because that's what currently available attributes are, and I defaulted to smooth because that's what I'd imagine using for my use case. However, if we were going to add, say, flat attributes (where a fragment's reading of a vertex attribute basically uses the nearest vertex's value, rather than linearly interpolating across the nearby vertices, if I'm correct) in addition to smooth attributes, then it would need to be accessed in a Kage program as a function. Otherwise, it would have to be basically tacked onto the variable declaration, which doesn't feel right:


var UniformTest
var VertexAttribute0 flat // Odd...?

func Fragment(...) {
    ....
}

Instead, something like this feels better:

attrA := VertexAttribute(0, VertexAttributeFlat) // Returns custom attribute 0 in flat mode
attrB := VertexAttribute(1, VertexAttributeSmooth) // Returns custom attribute 1 in smooth mode

I think vertex attributes in GLSL basically are all floats (either vectors of 4 floats, or individual float values), so for simplicity it seems like it would be easier to just have floats for any and every attribute...?
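For reference, the two modes can be sketched in plain Go as below. This is only an illustration under assumptions, not Kage or Ebitengine code: `smooth` and `flat` are hypothetical names, the barycentric weights are assumed to be given, and `flat` follows OpenGL's default convention, where (as far as I know) the value comes from a single "provoking" vertex of the primitive (the last one by default) rather than from the nearest vertex.

```go
package main

import "fmt"

// smooth blends the three per-vertex values using the fragment's barycentric
// weights (w[0]+w[1]+w[2] == 1). Perspective correction is ignored here.
func smooth(v [3]float32, w [3]float32) float32 {
	return v[0]*w[0] + v[1]*w[1] + v[2]*w[2]
}

// flat returns the same value for every fragment of the triangle.
// Using the last vertex is an assumption that mirrors OpenGL's default
// provoking-vertex convention.
func flat(v [3]float32) float32 {
	return v[2]
}

func main() {
	values := [3]float32{0, 10, 20}        // one attribute value per vertex
	weights := [3]float32{0.5, 0.25, 0.25} // a fragment closest to vertex 0

	fmt.Println(smooth(values, weights)) // 7.5
	fmt.Println(flat(values))            // 20
}
```

For a quad where all four vertices carry the same value, both modes give the same result, so smooth-only attributes can already emulate flat ones in that case.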

So, if we don't want to generalize, adding Attributes []float just doesn't blend well with what we have (but then nothing really blends well, honestly). If we want to generalize, then in the long run it seems like:

Memory layouts will become a bigger deal, and maybe passing the attributes as a separate argument that can be copied more directly makes more sense. Also, having slices of Attributes in Vertex with potentially different lengths sounds like a nightmare.

Would it be better to have an array of attributes instead of a slice? That way all vertices either have the current vertex attribute layout ordering, or this theoretical "full" one with the custom vertex attribute set being allocated in memory, regardless of how many of the available attributes you're using (the others would just be 0).

Alternatively, should ebiten.Vertex{} essentially just be a map of attribute names to values? So there is no ebiten.Vertex.ColorR, there's just ebiten.Vertex.Attributes["ColorR"], where Attributes is a map of strings / identifiers to floats (something like the current uniform system)? The names could point to what the attribute represents (so ColorR would be expected in a fragment shader as being the red channel of the vertex's color)? This could work - for optimization, it would be better if it was just an array of, say, 12+ floats, in order - X, Y, U, V, R, G, B, A, Attribute 0, 1, 2, 3, etc.?

But I don't really see how this is any different or better than the current idea of just tacking on custom attributes to the ebiten.Vertex{} struct. Is this even a bad idea?

tinne26 commented 1 year ago

Yeah, as I said, I think defaulting to smooth and floats is the most sensible choice, just wanted to mention it in case we realized there was some important use-case we missed that could be useful enough to make us reconsider.

On the topic of memory layout and performance, I was mentioning it based on my (limited) knowledge of openGL, where you basically send the data to GPU like this:

(float32)[v1.X, v1.Y, v1.Z, v1.W, v2.X, v2.Y, v2.Z, v2.W, ..., v1-attrib1, v1-attrib2, v2-attrib1, v2-attrib2, ...]

So, if attributes are passed directly to a function like DrawTrianglesWithAttribs(vertices, indices, attribs, numAttribsPerVertex), this could be much more efficient than copying the data from each vertex to the buffer sent to GPU. Of course, the positions still have to be copied and I haven't checked if Metal and DirectX also work like this, but this is mostly what I was talking about. Indeed, these considerations are probably premature.
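A minimal Go sketch of that idea, assuming the non-interleaved layout shown above (all positions first, then all attributes). `vertex` and `packBuffer` are hypothetical names, and this is not how Ebitengine currently builds its vertex buffers; it only shows that an attribute block passed as a flat slice can be appended with one bulk copy.

```go
package main

import "fmt"

// vertex is a hypothetical stand-in for ebiten.Vertex that keeps only a
// position, to keep the example short.
type vertex struct {
	X, Y float32
}

// packBuffer builds one float32 buffer: positions first, then the caller's
// attribute block. attribs must contain numAttribsPerVertex values per
// vertex, in vertex order.
func packBuffer(vs []vertex, attribs []float32, numAttribsPerVertex int) ([]float32, error) {
	if want := len(vs) * numAttribsPerVertex; len(attribs) != want {
		return nil, fmt.Errorf("got %d attribute values, want %d", len(attribs), want)
	}
	buf := make([]float32, 0, len(vs)*2+len(attribs))
	for _, v := range vs {
		buf = append(buf, v.X, v.Y) // positions are still copied field by field
	}
	return append(buf, attribs...), nil // attributes are copied in one go
}

func main() {
	vs := []vertex{{0, 0}, {1, 0}, {0, 1}}
	attribs := []float32{10, 11, 20, 21, 30, 31} // 2 attributes per vertex

	buf, err := packBuffer(vs, attribs, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(buf) // [0 0 1 0 0 1 10 11 20 21 30 31]
}
```

Whether this is actually faster would depend on how each graphics driver expects the data; an interleaved layout would need a per-vertex copy again.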

In any case, I think the main question remains the direction to take, whether we are interested in generalization or not, as this is what would determine the API. If we don't want generalization, a few extra static fields in Vertex may be fine. If we want generalization, a whole other approach is required, Color* fields are superseded, etc.

tinne26 commented 1 year ago

Since learning more about kage internals from the previous issue, I'm starting to think that vertex attributes are actually a big deal. I wasn't super invested in it earlier because I thought most use-cases came up in 3D, but now that I understand better how uniforms break batching, I see that vertex attributes are extremely useful even in (and maybe especially in) shaders that operate with simple geometry. Basically, for shaders that operate on a single quad, passing information x4 is not a big deal, and in fact, it would be extremely useful to have flat attributes. The key idea for Ebitengine is to operate with quads and rectangles. Those benefit most from the combination of flat and vertex attributes to avoid breaking batching.

Later we may see that if we change the number of vertex attributes we will most likely break batching anyway, but... there's much more freedom to operate, I guess. Worth highlighting this point for uses outside 3D.
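The batching argument can be illustrated with a deliberately simplified model in Go. It assumes that consecutive draws can be merged only while their uniform values are identical, which is a rough approximation and not Ebitengine's real merging logic; `draw` and `countBatches` are hypothetical names.

```go
package main

import "fmt"

// draw is a hypothetical draw command for one quad.
type draw struct {
	Uniform float32 // value passed as a shader uniform
	Attrib  float32 // value stored on each of the quad's four vertices
}

// countBatches models a simplified batching rule: consecutive draws are
// merged into one GPU draw call only while their uniforms are identical.
// Per-vertex data never prevents merging in this model.
func countBatches(draws []draw) int {
	batches := 0
	for i, d := range draws {
		if i == 0 || d.Uniform != draws[i-1].Uniform {
			batches++
		}
	}
	return batches
}

func main() {
	const n = 100
	viaUniform := make([]draw, n)
	viaAttrib := make([]draw, n)
	for i := 0; i < n; i++ {
		viaUniform[i] = draw{Uniform: float32(i)} // per-sprite value as a uniform
		viaAttrib[i] = draw{Attrib: float32(i)}   // same value as a vertex attribute
	}
	fmt.Println(countBatches(viaUniform), countBatches(viaAttrib)) // 100 1
}
```

The cost of the attribute route is that the value is duplicated on every vertex of the quad, which is the "passing information x4" trade-off mentioned above.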

hajimehoshi commented 1 year ago

I think I'd like to go with https://github.com/hajimehoshi/ebiten/issues/2640#issuecomment-1506522188, and revisit more flexible vertices later.

hajimehoshi commented 1 year ago

Yeah, adding 4 would be fine for one use-case of mine with Tetra3D, though it's not enough for the other (transforming a vertex using a 4x4 matrix on the GPU). That's OK, as it's just something I mentioned as a possibility, but it would be nice to have a few more (16, maybe?), if every shader backend supported that many. Otherwise, 4 is fine.

Probably

type Vertex struct {
    SrcX float32
    SrcY float32
    SrcZ float32 // new
    SrcWMinus1 float32 // new
    DstX float32
    DstY float32
    DstZ float32 // new
    DstWMinus1 float32 // new
    ColorR float32
    ColorG float32
    ColorB float32
    ColorA float32
    Custom0 float32 // new
    Custom1 float32 // new
    Custom2 float32 // new
    Custom3 float32 // new
}

should cover most use cases. This might affect performance.

EDIT:

it would be nice to have a few more (16, maybe?)

Oh, if this means 16 'more', the above struct is not enough...

SolarLune commented 1 year ago

Oh, I meant 16 maximum, overall, not adding 16 more. Your proposed Vertex set would be fine.

If I could nitpick, I don't think you need to name two of them SrcZ / SrcWMinus1 and DstZ / DstWMinus1, as Ebitengine doesn't do anything with those when rendering (right?), so you could just name them after Custom as well.

tinne26 commented 1 year ago

Would it be possible to use this instead?

type Vertex struct {
    SrcX float32
    SrcY float32
    DstX float32
    DstY float32

    ColorR float32
    ColorG float32
    ColorB float32
    ColorA float32

    Attributes [8]float32
}

hajimehoshi commented 1 year ago

Hm, this sounds better

SolarLune commented 9 months ago

Hello - commenting to post some mini-findings.


In my pursuit of perspective corrected texture mapping for Tetra3D (see this comment in this issue), I tried encoding a depth-related vertex value (specifically, the 1/W component of 3D vertices post-transformation) in the alpha component of vertices' vertex colors.

To pack this value, I multiply the original alpha value by 256 (as it's the maximum number for visually-representable color values, so no information should be lost) on CPU when packing the data and creating the vertex color values, and then divide that value in Kage by 256 when unpacking to restore the original alpha values. This leaves me with the fractional part of the alpha channel to encode the W component. I multiply the W component by a small number, like 0.05, to fit in the fractional portion of the alpha color channel, regardless of the original range of the W component.

[Image: Screenshot from 2023-11-06 20-21-54]

While this does work pretty well, it unfortunately adds "fuzzy" edges to the textures, most likely because of floating point imprecision when unpacking the W component from the alpha channel. You can see this fuzziness along the vertical lines on the wooden texture in the screenshot above.

[Image: Screenshot from 2023-11-06 20-21-43]

It also doesn't work at all if the scaled depth value exceeds the 0-1 range (i.e. if the camera is very close to an object). This creates an obvious UV issue, as seen in the picture above.

The main point of this post is just to post my experience in working on a workaround to having access to additional vertex attributes, and the hope that this shows why custom vertex attributes may be useful for shaders.

Are custom vertex attributes still feasible for Ebitengine v2.7.0, @hajimehoshi?

hajimehoshi commented 9 months ago

I was wondering how custom vertex attributes would resolve the fuzzy-edge issue.

SolarLune commented 9 months ago

I was wondering how custom vertex attributes would resolve the fuzzy-edge issue.

Sorry, I don't think I explained everything properly, but if I had custom vertex attributes, then I wouldn't need to encode the W component into the alpha channel of vertices and could just send it straight; I would have a full float32's range to work with. That would reduce the fuzziness because there wouldn't be any (or at least, the normal negligible amount of) floating point error.

It's possible my attempt at this workaround of encoding the information I need in the shader isn't correct, and that I could get a better result with a different approach, but I feel like this is where custom vertex attributes would help.

hajimehoshi commented 9 months ago

If I had custom vertex attributes, then I wouldn't need to encode the W component into the alpha channel and could just send it straight; I would have a full float32's range to work with

You can use full float32's range for color components of Vertex, right? There is nothing to prevent this.

SolarLune commented 9 months ago

You can use full float32's range for color components of Vertex, right? There is nothing to prevent this.

I'm already using the alpha channel as the actual alpha channel of vertices, haha. That's why I have to encode the data - I still need per-vertex alpha transparency, but also need another per-vertex value.

hajimehoshi commented 9 months ago

I'm confused. Which alpha values or W values did you use ColorA for?

SolarLune commented 9 months ago

I'm confused. Which alpha values or W values did you use ColorA for?

Currently, I use the ColorA component of vertices to handle alpha transparency, but I also need to send an additional float32 value, the W component of 3D transformed vertices, to resolve perspective corrected texture mapping. I can encode both values (the vertices' alpha values and the W component) into a single float32, which is stored in the vertices' ColorA component, as a workaround. That's what my comment here was about.

hajimehoshi commented 9 months ago

I see, so you packed an alpha value and a W value into one ColorA with some hacks, and this degrades some precision, right?

SolarLune commented 9 months ago

I see, so you packed an alpha value and a W value into one ColorA with some hack, and this degrades some precision, right?

Basically, yeah, haha.

hajimehoshi commented 9 months ago

OK I understood.

Are custom vertex attributes still feasible for Ebitengine v2.7.0, @hajimehoshi?

Yes so far, but I'm afraid I cannot 100% guarantee this.

SolarLune commented 9 months ago

Yes so far, but I'm afraid I cannot 100% guarantee this.

OK, got it - is there an estimate on when 2.7 might come? Like, is by the end of the year possible?

hajimehoshi commented 9 months ago

New minor versions of Ebitengine are usually released one month after new minor versions of Go are released. So probably next March.

SolarLune commented 9 months ago

New minor versions of Ebitengine are usually released one month after new minor versions of Go are released. So probably next March.

Understood - I'll probably encode the data in a different way until then. Thanks!

Zyko0 commented 9 months ago

So probably next March.

Okay good to know on my end too, I'm having quite a big need for this (not to say critical 👀)

hajimehoshi commented 1 month ago

I'm revisiting this issue. I think I'll go with adding 8 new attributes like this:

type Vertex struct {
    SrcX float32
    SrcY float32
    DstX float32
    DstY float32

    ColorR float32
    ColorG float32
    ColorB float32
    ColorA float32

    Custom0 float32 // Do we have a better name?
    Custom1 float32
    Custom2 float32
    Custom3 float32
    Custom4 float32
    Custom5 float32
    Custom6 float32
    Custom7 float32
}

Also, this requires adding a new version of the Fragment function signature:

func Fragment(dstPos vec4, srcPos vec2, color vec4, customs [N]float) // N is from 0 to 8

// Should these also be accepted?
func Fragment(dstPos vec4, srcPos vec2, color vec4, custom0 float, custom1 float)
func Fragment(dstPos vec4, srcPos vec2, color vec4, custom vec4)

// Of course, the current Fragment should also work.
func Fragment(dstPos vec4, srcPos vec2, color vec4)

I don't plan to introduce a vertex shader, so the custom values are passed without modification except for the usual linear interpolation. @SolarLune Would this be fine with you?

My current question is, do we really need 8 custom attributes, or would 4 be enough?

tinne26 commented 1 month ago

Regarding 4 or 8, SolarLune already said this:

Yeah, adding 4 would be fine for one use-case of mine with Tetra3D, though it's not enough for the other (transforming a vertex using a 4x4 matrix on the GPU). That's OK, as it's just something I mentioned as a possibility, but it would be nice to have a few more (16, maybe?), if every shader backend supported that many. Otherwise, 4 is fine.

In my opinion, the main factor is whether this affects all rendering pipelines, or only those that actually use the attributes (I'm expecting the first). If now all draws will have the attribute overhead, maybe we want to be more conservative and have only 4. I guess Zyko and other people doing more advanced shader stuff would be able to say how many attributes would be useful in more specific cases. I'd lean towards 8 myself, but we would have to check how much overhead there is. 16 for the matrix transformation would probably be too much if this affects all operations.

Also, any specific reason to prefer CustomN instead of the suggested [N]Attributes (https://github.com/hajimehoshi/ebiten/issues/2640#issuecomment-1635604181)?

hajimehoshi commented 1 month ago

I'm not sure how I should measure the performance difference. Maybe running examples/sprites with 4k sprites on various platforms would be a good 'torture' test.

Also, any specific reason to prefer CustomN instead of the suggested [N]Attributes (https://github.com/hajimehoshi/ebiten/issues/2640#issuecomment-1635604181)?

No specific reason. I thought using float32 members seemed more consistent with the other existing members.

Zyko0 commented 1 month ago

My current question is, do we really need 8 custom attributes, or would 4 be enough?

4 would obviously help a lot already (and remove the need for some of my current tricks), but 8 would allow even more, so I'd prefer 8 personally. Doing it in multiple steps (4, then 6, then 8), tracking each step in a new issue and checking whether there's demand for it and that the previous iteration didn't degrade performance after some time, could be a good way to handle this imo!

Also, this requires adding a new version of the Fragment function signature:

func Fragment(dstPos vec4, srcPos vec2, color vec4, customs [N]float) // N is from 0 to 8

I prefer this one! It might be possible to expose a new public function as well in order not to bloat the function's arguments maybe? (by setting a varying from the vertex shader or something?)

SolarLune commented 1 month ago

Also, this requires adding a new version of the Fragment function signature:

func Fragment(dstPos vec4, srcPos vec2, color vec4, customs [N]float) // N is from 0 to 8

// Should these also be accepted?
func Fragment(dstPos vec4, srcPos vec2, color vec4, custom0 float, custom1 float)
func Fragment(dstPos vec4, srcPos vec2, color vec4, custom vec4)

// Of course, the current Fragment should also work.
func Fragment(dstPos vec4, srcPos vec2, color vec4)

I don't plan to introduce a vertex shader, so the custom values are passed without modification except for the usual linear interpolation. @SolarLune Would this be fine with you?

My current question is, do we really need 8 custom attributes, or would 4 be enough?

Exposing it as an argument to Fragment is fine, though I wonder how complicated this might become if we keep adding arguments to Fragment(). I feel like a function to retrieve vertex attributes within the shader would be a bit simpler and more future-proof. I guess putting it in the fragment arguments is consistent with the other vertex-interpolated attributes, so I don't mind either way.

[crazy]

Maybe the true answer is to remove everything but the destination position from Fragment()'s arguments and make all vertex attributes retrievable using functions:


func Fragment(dstPos vec2) {

    color := VertexAttribute(VERTEX_ATTRIBUTE_COLOR) // vec4
    srcXY := VertexAttribute(VERTEX_ATTRIBUTE_UV) // vec2
    customAttributes := VertexAttribute(VERTEX_ATTRIBUTE_CUSTOM) // [4/8/whatever]float

}

This would be consistent and future-proof, but I understand if it's outside the scale of what we're looking at here.

[/crazy]


Any additional number of attributes would be fine, but the more the better, in my opinion.

According to the vanilla OpenGL 4 reference, the value of GL_MAX_VERTEX_ATTRIBS (the maximum number of vertex attributes available) must be at least 16. So, we can add 8 more at most (since we're already using 8), maybe? See this page (you can Ctrl+F to search for the GL constants):

https://registry.khronos.org/OpenGL-Refpages/gl4/html/glGet.xhtml

It might be good to not expose all possible attributes for the user and reserve some for Ebitengine, in case it needs more vertex attributes in the future, so that's something to keep in mind. So I'm with Zyko0 - the more the better, 8 ideally, but 4 is fine.
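When counting against that limit, it may help to keep in mind that (as noted at the top of this issue) each OpenGL attribute is a set of up to 4 components, so the limit applies to attribute slots rather than to individual floats. The Go sketch below counts both. The layout is an assumption: the names A0 to A3 follow the OpenGL layout shown in the diff later in this thread, and the sizes of A0 and A1 (2 floats each) are guessed from the existing 8 floats per vertex. `attribute`, `slots` and `floats` are hypothetical names.

```go
package main

import "fmt"

// minGuaranteedSlots is the minimum value of GL_MAX_VERTEX_ATTRIBS.
const minGuaranteedSlots = 16

// attribute describes one vertex attribute and its number of float components.
type attribute struct {
	Name string
	Num  int
}

// slots counts attribute slots, assuming every attribute of up to
// 4 components occupies exactly one slot.
func slots(layout []attribute) int {
	n := 0
	for _, a := range layout {
		n += (a.Num + 3) / 4
	}
	return n
}

// floats counts the float32 values stored per vertex.
func floats(layout []attribute) int {
	n := 0
	for _, a := range layout {
		n += a.Num
	}
	return n
}

func main() {
	// Assumed current layout: position, source position, color.
	layout := []attribute{{"A0", 2}, {"A1", 2}, {"A2", 4}}
	fmt.Println(floats(layout), slots(layout)) // 8 3

	// Adding one block of 4 custom floats.
	layout = append(layout, attribute{"A3", 4})
	fmt.Println(floats(layout), slots(layout)) // 12 4
	fmt.Println(minGuaranteedSlots - slots(layout))
}
```

Under these assumptions the slot limit is less of a constraint than the per-vertex memory and upload cost discussed elsewhere in this thread, but other backends (Metal, DirectX) have their own limits that this sketch does not cover.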

In my opinion, the main factor is whether this affects all rendering pipelines, or only those that actually use the attributes (I'm expecting the first). If now all draws will have the attribute overhead, maybe we want to be more conservative and have only 4. I guess Zyko and other people doing more advanced shader stuff would be able to say how many attributes would be useful in more specific cases. I'd lean towards 8 myself, but we would have to check how much overhead there is. 16 for the matrix transformation would probably be too much if this affects all operations.

I'd imagine the custom attributes wouldn't be uploaded if they aren't set, so there should be little performance overhead unless the user actually uses the attributes... Hopefully.

No specific reason. I thought using float32 members seemed more consistent with the other existing members.

I feel like the array version would be a bit better, since that simplifies binding and interpolating the attributes. It also would allow you to internally check if the attributes are being used simply by just checking the length of the array.

hajimehoshi commented 1 month ago

IIUC, Metal should take arguments for attribute values, so unfortunately we cannot create global functions for attributes. I'll stick to the current idea: adding optional arguments to Fragment.

I'd imagine the custom attributes wouldn't be uploaded if they aren't set, so there should be little performance overhead unless the user actually uses the attributes... Hopefully.

There is no way to specify whether the additional attributes are used in the current proposal. In order to simplify the implementation, I don't think we need it as long as the performance doesn't change so much.

I feel like the array version would be a bit better, since that simplifies binding and interpolating the attributes. It also would allow you to internally check if the attributes are being used simply by just checking the length of the array.

Perhaps you mean a slice, not an array? Slices would use additional heap allocation so I don't think we could use slices for this purpose.

hajimehoshi commented 1 week ago

I applied this change onto b6ab7a10c1f57218499ced930a881ef1738af946:

diff --git a/internal/graphics/shader.go b/internal/graphics/shader.go
index 8553cb313..89dd6c3ac 100644
--- a/internal/graphics/shader.go
+++ b/internal/graphics/shader.go
@@ -154,7 +154,7 @@ func imageSrc%[1]dAt(pos vec2) vec4 {
    shaderSuffix += `
 var __projectionMatrix mat4

-func __vertex(dstPos vec2, srcPos vec2, color vec4) (vec4, vec2, vec4) {
+func __vertex(dstPos vec2, srcPos vec2, color vec4, unused vec4) (vec4, vec2, vec4) {
    return __projectionMatrix * vec4(dstPos, 0, 1), srcPos, color
 }
 `
diff --git a/internal/graphics/vertex.go b/internal/graphics/vertex.go
index 95d7768b1..6d59f31c8 100644
--- a/internal/graphics/vertex.go
+++ b/internal/graphics/vertex.go
@@ -39,7 +39,7 @@ const (
 )

 const (
-   VertexFloatCount = 8
+   VertexFloatCount = 12
 )

 var (
diff --git a/internal/graphicsdriver/opengl/program.go b/internal/graphicsdriver/opengl/program.go
index 1a6aa05b0..d21d13f23 100644
--- a/internal/graphicsdriver/opengl/program.go
+++ b/internal/graphicsdriver/opengl/program.go
@@ -103,6 +103,10 @@ var theArrayBufferLayout = arrayBufferLayout{
            name: "A2",
            num:  4,
        },
+       {
+           name: "A3",
+           num:  4,
+       },
    },
 }

and checked the performance difference:

goos: darwin
goarch: arm64
pkg: github.com/hajimehoshi/ebiten/v2
                 │    old.txt    │             new.txt             │
                 │    sec/op     │    sec/op      vs base          │
DrawTriangles-12   582.5n ± 295%   955.4n ± 135%  ~ (p=0.165 n=10)

old.txt

goos: darwin
goarch: arm64
pkg: github.com/hajimehoshi/ebiten/v2
BenchmarkDrawTriangles-12        3768846           301.1 ns/op
BenchmarkDrawTriangles-12        2488868           419.8 ns/op
BenchmarkDrawTriangles-12        4212445           595.3 ns/op
BenchmarkDrawTriangles-12        4043070           569.6 ns/op
BenchmarkDrawTriangles-12        2561703           728.4 ns/op
BenchmarkDrawTriangles-12        4143831           698.5 ns/op
BenchmarkDrawTriangles-12        3922063           337.7 ns/op
BenchmarkDrawTriangles-12        1000000          2301 ns/op
BenchmarkDrawTriangles-12        4032066           262.3 ns/op
BenchmarkDrawTriangles-12        4596901          2334 ns/op
PASS

new.txt

goos: darwin
goarch: arm64
pkg: github.com/hajimehoshi/ebiten/v2
BenchmarkDrawTriangles-12        4167512           437.1 ns/op
BenchmarkDrawTriangles-12        3781263           565.6 ns/op
BenchmarkDrawTriangles-12        1000000          1174 ns/op
BenchmarkDrawTriangles-12        1746486           928.0 ns/op
BenchmarkDrawTriangles-12        3892663           699.6 ns/op
BenchmarkDrawTriangles-12        1000000          2246 ns/op
BenchmarkDrawTriangles-12        3699375           511.4 ns/op
BenchmarkDrawTriangles-12        1000000          2595 ns/op
BenchmarkDrawTriangles-12        1000000          1203 ns/op
BenchmarkDrawTriangles-12        4047705           982.8 ns/op
PASS

So there is not a significant performance difference so far just by adding 4 floats for each vertex.
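As a cross-check of the summary table, the Go sketch below recomputes the median and the range of the ten samples in old.txt and new.txt. The sample values are copied from the runs above; `median` and `minMax` are hypothetical helper names, and this is not how benchstat itself is implemented. The medians come out at about 582.5 ns and 955.4 ns, matching the table, while the range of each run (roughly 262 to 2334 ns and 437 to 2595 ns) shows how much the samples overlap.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle value of xs (the mean of the two middle values
// for an even count). It does not modify xs.
func median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	n := len(s)
	if n%2 == 1 {
		return s[n/2]
	}
	return (s[n/2-1] + s[n/2]) / 2
}

// minMax returns the smallest and largest value of a non-empty slice.
func minMax(xs []float64) (lo, hi float64) {
	lo, hi = xs[0], xs[0]
	for _, x := range xs[1:] {
		if x < lo {
			lo = x
		}
		if x > hi {
			hi = x
		}
	}
	return lo, hi
}

func main() {
	// ns/op values copied from old.txt and new.txt above.
	oldNs := []float64{301.1, 419.8, 595.3, 569.6, 728.4, 698.5, 337.7, 2301, 262.3, 2334}
	newNs := []float64{437.1, 565.6, 1174, 928.0, 699.6, 2246, 511.4, 2595, 1203, 982.8}

	for name, xs := range map[string][]float64{"old": oldNs, "new": newNs} {
		lo, hi := minMax(xs)
		fmt.Printf("%s: median %.2f ns/op, range %.1f to %.1f ns/op\n", name, median(xs), lo, hi)
	}
}
```

With a spread this wide within each run, the "~" in the table is expected; more samples or a quieter machine would be needed before a smaller difference could be ruled in or out.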