Hi, I would like to give my feedback as well. Why the separation of depth stencil state? This would work on Metal, but D3D12/Vulkan want all states at graphics pipeline creation; otherwise you'll need to cache and create pipelines on demand, which can result in GPU stutter. So SDL_GpuPipelineDescription probably needs all the members necessary to create a Vulkan/D3D12 pipeline.
Another topic I would like to bring attention to: memory allocation. What strategy would be used? Or an existing allocator like D3D12MA/VMA?
why separation of depth stencil state? This would work on metal
Metal was my starting point for designing this, specifically because I think it's the most app-friendly (at least in the API concepts, if we're willing to avoid debates about Metal Shader Language and Objective-C). But as @darksylinc's detailed notes also point out, there are several small things in Metal that are going to cause headaches elsewhere, and this is probably also a good example of that.
Some of these might be worth changing, some might be worth faking on this platform or that, and some are worth putting in the trash. Definitely point these things out and we'll make changes to the API as we go. We are in extremely early times for this project.
Memory allocation, what strategy would be used? Or existing D3D12MA/VMA?
Probably something like VMA behind the scenes, but my sincere hope is to hide this from the app if I can get away with it.
Unsupported vertex formats
Removed! The remaining list honestly still feels like it could lose a bunch more formats, and no one would miss them.
The trend is to unify these two; since supported GpuVertexFormat is just a subset of GpuPixelFormat
But these aren't the same thing at all...? Maybe I'm misunderstanding, but if/when we add compressed texture formats, we're definitely going to need these to be separate, as they won't just be packed arrays of raw pixels anymore.
(I'll go through the rest of your comments in the morning, just didn't want you to think I was ignoring you!)
But these aren't the same thing at all...? Maybe I'm misunderstanding, but if/when we add compressed texture formats, we're definitely going to need these to be separate, as they won't just be packed arrays of raw pixels anymore.
I've given it more thought and I agree it makes sense to keep them separate. That makes it less error prone; otherwise you could feed a GpuPixelFormat that is not a valid GpuVertexFormat.
From a low level perspective it makes sense since, like I said, GpuVertexFormat is a subset of GpuPixelFormat; thus it simplifies code (e.g. the same conversion routines that work for RGBA8_UNORM work for UCHAR4_UNORM) and allows for potential expansion in the future (though this has never actually happened, e.g. supporting RGB565 or ASTC as a vertex format).
(I'll go through the rest of your comments in the morning, just didn't want you to think I was ignoring you!)
No prob and thanks for being considerate! :)
To revisit:
why separation of depth stencil state? This would work on metal
It looks like primitive topology needs to be specified in the PSO too in D3D12...my misreading of ID3D12GraphicsCommandList::IASetPrimitiveTopology made me think you could set whatever in the command buffer, but the PSO still wants to know if it's generally a point/line/triangle. :/
So we'll probably move SDL_GpuPrimitive into SDL_GpuPipelineDescription along with the depth stencil state, too.
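For illustration, a rough sketch of what that merged description could look like (member names here are hypothetical, not the actual proposal):

typedef struct SDL_GpuPipelineDescription
{
    SDL_GpuPrimitive primitive;   /* point/line/triangle; D3D12 PSOs need this up front */
    /* depth stencil state folded in (hypothetical member names): */
    bool depth_test_enabled;
    bool depth_write_enabled;
    int depth_compare_function;
    /* ...plus shaders, vertex layout, blend state, attachment descriptions, etc. */
} SDL_GpuPipelineDescription;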
- Generate tmp texture
- Load data into tmp texture
- Generate mipmaps into tmp
- Copy results into final texture
- Destroy tmp (or keep it cached for reuse)
I'm thinking we can do this in SDL_GpuGenerateMipmaps(), with the backend deciding whether a texture is ideal (it doesn't have SHADERWRITE on D3D12, etc., so it does the temporary texture proxy; otherwise it just does it directly on the existing texture)...it won't need a new SDL_GPUTEXUSAGE_MIPMAPGEN flag at all in this case, as SDL will handle this transparently.
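To make that control flow concrete, a minimal sketch of the backend decision (every helper name here is hypothetical):

void SDL_GpuGenerateMipmaps(SDL_GpuTexture *texture)
{
    if (BackendCanGenerateInPlace(texture)) {
        /* e.g. D3D11 texture with RENDERTARGET: generate directly on the texture */
        GenerateMipmapsInPlace(texture);
    } else {
        /* e.g. D3D12 texture without SHADERWRITE: route through a temporary texture */
        SDL_GpuTexture *tmp = AcquireTmpTexture(texture);   /* possibly cached for reuse */
        CopyTexture(texture, tmp);
        GenerateMipmapsInPlace(tmp);
        CopyTexture(tmp, texture);
        ReleaseTmpTexture(tmp);
    }
}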
If doing this transparently takes a performance hit, the primary use case would be loading textures without mipmaps from disk, which is generally considered a slow operation (and for large enough games, is probably tossed into a background thread).
But I might be oversimplifying things here...?
If doing this transparently takes a performance hit, the primary use case would be loading textures without mipmaps from disk, which is generally considered a slow operation (and for large enough games, is probably tossed into a background thread).
But I might be oversimplifying things here...?
No, that's a bad idea.
If I have a RENDERTARGET texture, and want to call SDL_GpuGenerateMipmaps every frame, on D3D11 it will work as expected. But on D3D12 it will use the tmp path every frame because it's missing SHADERWRITE.
And even if you keep the MIPMAPGEN flag, you have to consider the issue of silent user error: users who call SDL_GpuGenerateMipmaps every frame without the MIPMAPGEN flag would end up using the slow tmp copy path.
And users who actually want to use the tmp copy path may end up adding the MIPMAPGEN flag because they enabled it on purpose, thinking it is necessary when it's not.
To fix that silent error the API needs to be more explicit, like SDL_GpuGenerateMipmaps( bool bUseTmpCopyPath ). The API would give a perf warning if bUseTmpCopyPath == true and MIPMAPGEN is present, and give an error if bUseTmpCopyPath == false and MIPMAPGEN is absent.
Or, instead of adding bool bUseTmpCopyPath, a new flag MIPMAPGEN_ON_LOAD (or something like that) is added. The flag indicates SDL_GpuGenerateMipmaps can be called on the texture, but the copy path should be used.
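Spelling the two proposed variants out (signatures hypothetical):

/* Variant A: explicit bool, validated against the texture's usage flags.  */
void SDL_GpuGenerateMipmaps( SDL_GpuTexture *texture, bool bUseTmpCopyPath );
/* perf warning if bUseTmpCopyPath == true  and MIPMAPGEN is present       */
/* error        if bUseTmpCopyPath == false and MIPMAPGEN is absent        */

/* Variant B: a creation flag instead of the bool. MIPMAPGEN_ON_LOAD says  */
/* SDL_GpuGenerateMipmaps may be called on this texture, but via the tmp   */
/* copy path, so the texture itself never carries RENDERTARGET/SHADERWRITE. */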
Update: This is the problem OpenGL has. GL lets you call glGenerateMipmap on any texture at any time.
Very convenient.
But doing so means the driver must internally retransform the texture to add the RENDERTARGET flag (this has been the cause of lots of driver bugs in the past), and once it's added it stays forever, which means you pay a hidden price just for calling the function; and only experts will tell you about this pitfall.
If I have a RENDERTARGET texture, and want to call SDL_GpuGenerateMipmaps every frame, on D3D11 it will work as expected. But on D3D12 it will use the tmp path every frame because it's missing SHADERWRITE.
oh, I see. Okay, let me think about this more.
OpenGL
These are all good points about OpenGL, btw, and it will likely work much like you described...but also I'm trying to decide what the surface area looks like for "people that have a modern version of OpenGL and don't have any other options," and I suspect it's pretty small, outside of WebGL in current times and some smaller devices that expect you to use GLES3...which is not to minimize it, but focusing on Vulkan, Direct3D and Metal backends first and then making GL4/GLES3/WebGL2 limp along later seems like a reasonable strategy, at least on paper.
SDL_GpuPipelineDescription contains SDL_GpuColorAttachmentDescription, which contains a SDL_GpuTexture.
It doesn't, I have two unrelated structs with the same name, which is obviously a different problem. :)
Latest revision moves depth stencil details into SDL_GpuPipelineDescription and fixes SDL_GpuColorAttachmentDescription naming.
what the surface area looks like for "people that have a modern version of OpenGL and don't have any other options,"
Do you mean as in "My HW or drivers are too old to run anything better" ?
If that's the case then:
As for market share in general, I wouldn't dismiss it. These machines won't (and can't) run the latest AAA games, and they don't appear in the Steam survey because that's not their market.
The people who use these machines are usually kids who play the least demanding games they can run (Minecraft, Roblox, Dota, LoL, WoW, CS:GO, Team Fortress, Overwatch, Fortnite) on their parents' "office" computer.
Fortunately the landscape has improved with i3 CPUs becoming more common in offices, as they come bundled with a reasonable Intel iGPU (in terms of API/driver support).
@darksylinc what about d3d12? For my game engine I went fully bindless with vulkan and d3d12, would vulkan 1.2 as min spec work?
what about d3d12? For my game engine I went fully bindless with vulkan and d3d12, would vulkan 1.2 as min spec work?
What about it? I don't understand the question, and feels a little off topic too.
Is D3D12 viable as a better choice over D3D11? And which Vulkan min spec is planned to be supported here: Vulkan 1.1, 1.2 or 1.3?
Is D3D12 viable as a better choice over D3D11? And which Vulkan min spec is planned to be supported here: Vulkan 1.1, 1.2 or 1.3?
I'm starting with D3D12, but my expectation is that I will put a D3D11 backend in as well. I almost certainly won't go lower than that; D3D11 was available starting with Windows 7 (and apparently backported to Windows Vista!), and that feels like more than enough legacy support.
I don't think we'll need anything beyond Vulkan 1.0, at least at first, as far as I can tell, but there's a lot of small details I expect to stumble over as I go, so we'll see what pops up that needs 1.1 or whatnot.
Anyhow, the initial goals are D3D12, Metal, and Vulkan, and then things like GL4/GLES3/WebGPU can be considered (and possibly abandoned).
Pushed changes that resolve most of the discussed things; I think the only thing still unresolved is what to do about generating mipmaps, which I'm still considering.
OpenGL does not support samplers being separate from textures at the shader level
Sorting through this feedback again, and I'm just making sure I'm clear on this part: Is this problem not solved by GL_ARB_sampler_objects?
It's not. That partially solves the problem on the C++ side, but not on the shader side.
Ah, I see...HLSL lets you do myTexture.Sample(mySampler, texcoords), but GLSL only offers texture(myTextureUnit, texcoords). :(
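For anyone following along: GL_ARB_sampler_objects (core since GL 3.3) does decouple sampler state from texture objects on the C side, which is why the remaining problem is purely in GLSL. A quick illustration; 'texture' is assumed to be created elsewhere:

GLuint sampler;
glGenSamplers(1, &sampler);
glSamplerParameteri(sampler, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glSamplerParameteri(sampler, GL_TEXTURE_WRAP_S, GL_REPEAT);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, texture);
glBindSampler(0, sampler);   /* sampler object overrides the texture's own state on unit 0 */
/* ...but the GLSL shader still sees one combined sampler2D uniform for unit 0. */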
Opinions welcome on the current plans for vertex shader syntax: icculus/SDL_shader_tools#3
Re: depth buffer formats
Just FYI: Depth24_Stencil8 isn't available in Metal on iOS (it is on macOS, though). Both platforms do support Depth16, Depth32Float, Stencil8, and the convenience format Depth32_Stencil8.
Maybe SDL GPU should add Depth32Float and Depth32Float_Stencil8, assuming those are also supported on DX12 and Vulkan...
(Just bumping this comment over here to have discussion stay in one place. --ryan.)
I'm not a huge fan of manually exposing Fences as a client-side resource. Since their intended purpose is to query command buffer status and wait on command buffer completion, I think these should be command buffer APIs (SDL_QueryCommandBufferStatus / SDL_WaitForCommandBuffer), with the various backends handling the fences themselves.
Making fence submission optional on the client side also has the side effect of making a Vulkan implementation significantly more complicated, since in most Vulkan renderers, command buffers always need a fence to manage and clean up internal state. This means that the backend would need to manage fences internally anyway, and since Vulkan only allows you to submit one fence per command buffer submission, internal cleanup and user-side notifications would have to be conflated. This seems unnecessarily complicated.
Originally posted by @TheSpydog in https://github.com/libsdl-org/SDL/issues/7067#issuecomment-1382251359
I had to dig through my source code to see how it's handled, and realized TheSpydog is right, but with some remarks to add.
You only get one VkFence per command buffer. And Metal works in a similar way too (they just call it a semaphore, which has nothing to do with VkSemaphore).
But I would suggest a slight separation, with fences still exposed to the user:
SDL_Fence fence = SDL_GetCommandBufferFence( cmdBuffer );
SDL_WaitForFence( fence );
That is, still expose fences to users, but make it explicit that fences come from cmd buffers (instead of the user being able to create them on a whim).
It also makes explicit that waiting for a fence requires flushing a cmd buffer, which in certain cases could have nasty side effects. If you fully separate fences from cmd buffers, hiding this fact from users will leave them in a gigantic world of pain.
It's a matter of separation of concepts. A 'fence' is something we are waiting for. A 'command buffer' is a huge chunk of work, potentially consuming a lot of memory, we may not even care about.
In OgreNext we have the following pattern for data we must download GPU -> CPU when highly accurate fence tracking is needed:
vkCmdCopyImageToBuffer( mQueue->mCurrentCmdBuffer, srcTextureVk->getFinalTextureName(),
                        VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, dstBuffer, 1u, &region );
if( accurateTracking )
{
    mAccurateFence = mQueue->acquireCurrentFence();
    // Flush now for accuracy with downloads.
    mQueue->commitAndNextCommandBuffer();
}
We don't really care about cmd buffers. We care that the vkCmdCopyImageToBuffer is done.
In fact, the cmd buffer may already start being recycled while the VkFence is still not recycled (once we're done with the fence, we recycle it too).
Detaching Cmd buffers from fences allows recycling to be separate.
In OgreNext, when you need to download data from GPU to CPU, you could do 3 things:
Accurate fences are for dealing with scenario 2: the download is quite urgent, but we can do whatever work we have left rather than stalling.
The schemes TheSpydog or I propose assume there is one VkFence per Cmd Buffer, while Vulkan has no such requirement (you can submit a cmd buffer without any fence).
Well, you could either assume all cmd buffers need a VkFence (in OgreNext we do that, because rendering engines are complex and for various reasons sooner or later you always end up needing that fence), or just add an API, e.g. SDL_SetHasFence( cmdBuffer, true ).
EDIT: I suspect disabling a cmd buffer's fence would be a bad idea. In order to know if a cmd buffer is over (and thus recycle or destroy it), you need a fence. If you submit two cmd buffers in a row, CMD A is not guaranteed to be finished by the time CMD B is finished, unless you've used a VkSemaphore to explicitly state that CMD B depends on A.
Thus if you submit cmd buffer A without fences and without semaphores, and then B with a fence, you have no way to query when A is over (unless you submit a new cmd C that issues a full barrier, or a vkDeviceWaitIdle).
But I would suggest a slight separation, having fences still exposed to user:
SDL_Fence fence = SDL_GetCommandBufferFence( cmdBuffer ); SDL_WaitForFence( fence );
The problem with this approach is that you are still exposing the user to fence management - now every time they submit a command buffer they will have to wait for the command buffer to finish and then free the fence that the submit command returns, or they will have a memory leak, and the Vulkan validation layer will complain loudly. It would be a major headache to do this without blocking the main thread.
Since waiting for a fence is essentially the same thing as waiting for the command buffer to finish working, I don't see any good reason to force the user to think about fences. I've designed two systems this way and it's never ever been a problem. Some console APIs don't even have fences!
While I'm here... I'm critical of referring to getting the presentation texture as GetBackbuffer. This is maybe pedantic, but I think the term "backbuffer" carries an implication that the memory will be stable across frames, which is not true in basically any modern API.
What I've done in my own API is have a call AcquireSwapchainTexture which takes a command buffer and returns the swapchain texture - if acquiring the swapchain texture is successful, the command buffer automatically sets up the presentation structure when the command buffer is submitted. If acquiring the swapchain texture was unsuccessful, the call returns NULL to let the user know. Doing it this way has some nice structural benefits for the Vulkan renderer at least, because we can automatically set up a semaphore to wait on the submitted command buffer being finished before presenting the swapchain image. If there's a problem with this approach, I haven't seen it yet.
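In usage terms, the pattern described would look something like this sketch (AcquireSwapchainTexture as described above; the surrounding function names are hypothetical):

SDL_GpuCommandBuffer *cmd = SDL_GpuAcquireCommandBuffer(device);        /* hypothetical */
SDL_GpuTexture *swapchainTexture = AcquireSwapchainTexture(cmd, window);
if (swapchainTexture != NULL) {
    /* ...render into swapchainTexture... */
}
/* If the acquire succeeded, submitting the command buffer also sets up
   presentation (and the wait semaphore); if it returned NULL, this frame
   simply doesn't present. */
SDL_GpuSubmitCommandBuffer(cmd);                                        /* hypothetical */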
Since waiting for a fence is essentially the same thing as waiting for the command buffer to finish working, I don't see any good reason to force the user to think about fences
No, it's not the same thing.
A command buffer can block literally hundreds of megabytes from being released. A fence does not.
I've run into this problem when building cubemap probes for reflections. A lot of temporary textures & buffers are created, and having a small array of 3x3 cubemaps with 3 splits of PSSM means 6 faces x 9 cubemaps x 4 passes per face = 216 passes.
That's a lot of drawcall information stored in the cmd buffer(s). But it's not just that: you can't release tmp textures & buffers either until the cmd buffers have been reset/destroyed. Everything depends on those cmd buffers.
Carefully & properly flushing & releasing command buffers is the difference between running out of memory and generating these probes successfully. And you can't do that if waiting on a fence also blocks the cmd buffer from being released.
You are right that my example code was flawed though, as the correct code would be:
SDL_Fence fence = SDL_GetCommandBufferFenceAcquire( cmdBuffer ); // increases ref count on fence
SDL_WaitForFence( fence );
SDL_FenceRelease( fence ); // We are done with the fence, decrease ref count
Unfortunately, you can't hide the fences unless the use cases are trivial.
What you can do, though, is design the API so that fences are acquired from a cmd buffer, instead of being created arbitrarily.
I am extremely confused. You can't safely release data related to commands in a command buffer or the command buffer itself unless the command buffer has completed, and the typical way to know if a Vulkan command buffer has completed execution is to check if its associated submission fence is signaled. Am I missing something here? I know there are ways to do fine-grained command checks using VkEvent, but that seems unrelated to what we are discussing here.
I am extremely confused. You can't safely release data related to commands in a command buffer or the command buffer itself unless the command buffer has completed, and the typical way to know if a Vulkan command buffer has completed execution is to check if its associated submission fence is signaled.
That is correct. But the reasons to wait on a fence are varied.
Yes, the primary reason to wait on a fence would be to know if its associated cmd buffer is done and can therefore be recycled. But a secondary reason to wait on a fence is because we want to know if a particular command (like a vkCmdCopy*) inside that command buffer has finished.
If that command has finished, it means it's safe to access the mapped region from the CPU. Hence the user may want to hold on to a VkFence for longer than they want to hold on to a VkCommandBuffer.
Perhaps it may be possible to not expose an SDL_Fence if an SDL_CommandBuffer internally recycles its VkCommandBuffer (which means the same SDL_CommandBuffer could be internally associated with many VkCommandBuffer across the lifetime of an app).
When SDL determines a SDL_CommandBuffer is over, it recycles and unsets its internal VkCommandBuffer, but still keeps its VkFence alive for wait/querying purposes.
That would make debugging harder though, when things go wrong (e.g. a seemingly-impossible crash says the SDL_CommandBuffer ptr is the same, but the user fails to realize the internal VkCommandBuffer ptr went stale, because this implementation detail is hidden from them).
If that command has finished, it means it's safe to access the mapped region from the CPU. Hence the user may want to hold on to a VkFence for longer than they want to hold on to a VkCommandBuffer.
Why? What extra information does the fence contain?
Why? What extra information does the fence contain?
None.
It's simple derivation/deduction: if vkCmdCopyImageToBuffer ABC is inside VkCommandBuffer DEF, and VkFence X says DEF is finished, then by logical deduction ABC is also finished.
Which implies that whatever VkBuffer vkCmdCopyImageToBuffer ABC writes to can now be safely accessed.
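Concretely, in Vulkan terms (an abridged sketch reusing the names above; submitInfo setup and error handling omitted):

vkCmdCopyImageToBuffer(DEF, image, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                       dstBuffer, 1u, &region);        /* command ABC */
vkEndCommandBuffer(DEF);
vkQueueSubmit(queue, 1, &submitInfo, X);               /* fence X covers DEF */
/* DEF's handle can be recycled once X signals; only X must stay alive. */
vkWaitForFences(device, 1, &X, VK_TRUE, UINT64_MAX);
/* ABC finished because DEF finished: dstBuffer's mapped memory is now safe to read. */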
So why do we need a client-side fence handle when what we actually want to know is if the command buffer we submitted is done executing?
Because we want to recycle the command buffer handle ASAP while the fence still lives.
I see... this also brings up the point that if we want to be able to query command buffer status using a command buffer handle, then the handles themselves have to remain valid beyond submission.
I think there are two directions we could go here: 1) Expose SDL_GpuFence as a resource and allow the user to optionally obtain the fence during command buffer submission. If the fence is obtained, the user is responsible for freeing it later, otherwise it can be automatically recycled internally when appropriate.
2) Do not expose fences and allow the user to query SDL_GpuCommandBuffer status. The user is responsible for releasing the command buffer from client-side whenever they want.
Either way, command buffers will have to be marked for recycling rather than recycled immediately, because the implementation can only clean up resources and reset once it determines that the command buffer is done executing. Resource Destroy calls should not happen immediately; the implementation should do ref counting on resources to make sure they aren't referenced by any currently-executing command buffers before they are actually destroyed.
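Sketching the second direction, since it needs the fewest new concepts (all signatures hypothetical):

SDL_GpuSubmitCommandBuffer(cmdBuffer);   /* handle stays valid after submission */
if (SDL_GpuQueryCommandBufferStatus(cmdBuffer) == SDL_GPU_CMDBUF_COMPLETED) {
    /* work is done; mapped downloads are safe to read */
}
SDL_GpuWaitForCommandBuffer(cmdBuffer);      /* blocking variant */
SDL_GpuReleaseCommandBuffer(cmdBuffer);      /* user says "I'm done"; the backend
                                                recycles internally once execution
                                                actually completes */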
Some formats may not be supported by some APIs/drivers, depending on the SDL_GpuTextureUsage (including the SDL_GpuVertexFormats' use in vertex buffers).
Vulkan's required format support:
- SDL_GPUPIXELFMT_B5G6R5 => VK_FORMAT_B5G6R5_UNORM_PACK16: VK_FORMAT_R5G6B5_UNORM_PACK16 is supported, but not for storage image (shader read/write)
- SDL_GPUPIXELFMT_BGR5A1 => VK_FORMAT_B5G5R5A1_UNORM_PACK16: VK_FORMAT_A1R5G5B5_UNORM_PACK16 is supported, but not for storage image (shader read/write)
- SDL_GPUPIXELFMT_RGBA8 => VK_FORMAT_R8G8B8A8_UNORM
- SDL_GPUPIXELFMT_RGBA8_sRGB => VK_FORMAT_R8G8B8A8_SRGB
- SDL_GPUPIXELFMT_BGRA8 => VK_FORMAT_B8G8R8A8_UNORM
- SDL_GPUPIXELFMT_BGRA8_sRGB => VK_FORMAT_B8G8R8A8_SRGB
- SDL_GPUPIXELFMT_Depth24_Stencil8 => VK_FORMAT_D24_UNORM_S8_UINT
*depth formats are strange:
Similarly with vertex input formats (I will not write them here, everything is written here)
Also, textures from a swapchain may have different capabilities than those allocated normally.
Some of those issues may be emulated, albeit with major performance degradation, by using storage buffers, emulating different formats AND samplers AND blending AND multisampling AND blits and probably even more...
Because of this, I would suggest a query for whether (and with which usages) a given pixel format is supported, and whether a given vertex format is supported. It would also ease the way into adding compressed/YCbCr formats.
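Such a query might look like this sketch (names hypothetical; on Vulkan it would boil down to vkGetPhysicalDeviceFormatProperties):

SDL_GpuTextureUsage supported = SDL_GpuGetPixelFormatSupport(device, SDL_GPUPIXELFMT_B5G6R5);
if (!(supported & SDL_GPUTEXUSAGE_SHADERWRITE)) {
    /* pick another format, or fall back to storage-buffer emulation */
}
bool ok = SDL_GpuIsVertexFormatSupported(device, vertexFormat);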
the initial goals are D3D12, Metal, and Vulkan, and then things like GL4/GLES3/WebGPU can be considered (and possibly abandoned).
I think you are mixing WebGL and WebGPU.
Yes, I think I meant WebGL there; the current intention is to write a WebGPU backend, but that's probably coming after DX12, Metal, and Vulkan.
@icculus Speaking of which, how does the SDL GPU API compare against the seemingly very similar WebGPU? Both aim to provide a cross-platform interface over current-gen APIs (Metal, Vulkan and DirectX 12), but with a simpler-to-use interface.
Both will run on native platforms and on the web (eventually, for SDL GPU). And both introduce a custom shader language.
As far as I know, the main difference is that WebGPU applications running on the web will need to go through one fewer translation layer than SDL GPU ones.
What happened to the GPU API? I thought it was supposed to be shipped with SDL3, but as far as I can tell it's not in the SDL3 prerelease?
Is it dead (the branch hasn't had a commit in >5 months)?
@DanielGibson The code may have been merged to the main branch.
Here's the PR for reference: https://github.com/libsdl-org/SDL/pull/7067
Also the other PR has this comment: https://github.com/libsdl-org/SDL/pull/9312#issuecomment-2008049746
there is no scenario where either this PR or my branch is landing in the 3.2.0 release in the name of expediency.
I've been crunching on all the other parts of SDL3; it's still coming, it's just not there yet.
Ok, thanks for the clarification :)
So we're moving forward with the alternate implementation in https://github.com/libsdl-org/SDL/pull/9312 ...
It's pretty close to what I envisioned for the API (and most of the differences are things I would have eventually changed anyway when I ran into implementation problems later), and a whole team of talented graphics programmers has been focused on building it for several months now.
I would encourage you to look it over and give feedback on that PR before the API locks down.
Right now that PR does not have an SDL-specific shader bytecode (but is built so it can be added later, and I will be doing that work when SDL3 is solidified).
I intend to keep on with a custom shading language, because I think that in the right circumstances it will be amazingly useful, but the GPU API will not require it. When we get to the custom bytecode, we will make an effort to let other shader languages target it, either in patches to various compilers or a bytecode transpiler thing. That work is also separate from the new PR.
Hi!
I saw the GPU API push and have a bit of feedback to provide:
Unsupported vertex formats
Get rid of them (including the NORMALIZED variants). GPU support is flaky for them (or non-existent) and requires expensive emulation, which is hard to get right due to alignment issues.
Merging format enums?
The trend is to unify these two, since supported GpuVertexFormat is just a subset of GpuPixelFormat.
But personally I'm not decided on either option. They have pros and cons. From a low-level API perspective, unification makes sense. At a higher level? I don't know.
Format enum
Vulkan, DXGI and MTLPixelFormat formats are awfully similar.
In my experience this has been the best way to encode formats and intention (compared to alternatives like storing bits and shifts, like DDS did back in the D3D7/DirectDraw days).
Personally I prefer the short versions, e.g. RGBA8_UNORM is just better than RGBA8888_UNORM when all channels have the same bit count. Likewise RGB10A2 over RGBA1010102.
DXGI uses TYPELESS to indicate the format can be reinterpreted. Vulkan & Metal have no such format. In OgreNext we approach that by having a getFamily function:
SDL_GpuPixelFormat getFamily( SDL_GpuPixelFormat f );
This function returns the same family format for all the variations, in order to simplify many operations (e.g. RGBA8_UNORM and RGBA8 are exactly the same in all ways except in how the data is returned when viewed by a shader).
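A sketch of how such a family mapping could be implemented, using formats from this thread (not actual SDL or OgreNext code):

SDL_GpuPixelFormat getFamily( SDL_GpuPixelFormat f )
{
    switch( f )
    {
    case SDL_GPUPIXELFMT_RGBA8:
    case SDL_GPUPIXELFMT_RGBA8_sRGB:
        return SDL_GPUPIXELFMT_RGBA8;    /* same memory layout, different shader view */
    case SDL_GPUPIXELFMT_BGRA8:
    case SDL_GPUPIXELFMT_BGRA8_sRGB:
        return SDL_GPUPIXELFMT_BGRA8;
    default:
        return f;                        /* formats without variants are their own family */
    }
}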
More flags
You'll want to include a few more flags, such as (see the example after this list):
- SDL_GPUTEXUSAGE_MIPMAPGEN: when the user wants SDL to generate mipmaps. On D3D11 this means the RENDERTARGET flag must be added so D3D11's GenerateMips can be used. But on D3D12, it means SHADERWRITE must be added so a compute shader can generate the mipmaps.
- SDL_GPUTEXUSAGE_NOT_SAMPLED: almost all textures want to be sampled, but there are a few where the user intends to use only read and write operations, without sampling. Hence a negated flag to indicate sampling is not necessary. Alternatively, add SDL_GPUTEXUSAGE_SAMPLED and have the user almost always add this flag. Although flags should generally express positive terms instead of negations, in terms of user friendliness I think this is an exception, i.e. SDL_GPUTEXUSAGE_NOT_SAMPLED makes more sense than SDL_GPUTEXUSAGE_SAMPLED.
- SDL_GPUTEXUSAGE_CUBE_ARRAY_2D: indicates the cubemap or cubemap array also wants to be used as ARRAY_2D.
- SDL_GPUTEXUSAGE_REINTERPRETABLE: indicates the texture will later be reinterpreted as another format.
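For example, texture creation could then combine these (hypothetical creation code):

SDL_GpuTextureDescription desc;
/* ...dimensions, format, etc... */
desc.usage = SDL_GPUTEXUSAGE_RENDERTARGET     /* render into it              */
           | SDL_GPUTEXUSAGE_CUBE_ARRAY_2D    /* also view it as a 2D array  */
           | SDL_GPUTEXUSAGE_REINTERPRETABLE; /* later sRGB/UNORM reinterpret */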
More Texture types
There are a few types missing, like CUBE_ARRAY, 2D_ARRAY, etc.
Being very clear about 'depth'
Documentation needs to be very clear about how depth is interpreted:
- For SDL_GPUTEXTYPE_CUBE, must depth be 6, or 1?
- For SDL_GPUTEXTYPE_CUBE_ARRAY, must depth be a multiple of 6? Or should depth be 1, 2, 3... and be internally multiplied by 6?
- If a SDL_GPUTEXTYPE_CUBE_ARRAY texture is reinterpreted to SDL_GPUTEXTYPE_2D_ARRAY and depth was 1 but internally multiplied by 6, is depth now 6x on the reinterpreted texture?
- If depth is a multiple of 6 for SDL_GPUTEXTYPE_CUBE_*, then when reinterpreted to SDL_GPUTEXTYPE_2D_ARRAY, all code is consistent and there's no surprise. There's no need for special handling.
I also recommend renaming depth to depthOrSlices; its interpretation depends on whether the texture is of type SDL_GPUTEXTYPE_3D or not.
What's the difference between depth and slice?
OpenGL
OpenGL will by far give you the biggest headaches and constraints in your API:
One-time mipmaps
There's mainly 2 cases for generating mipmaps on the fly:
1. Mipmaps that are regenerated periodically (e.g. every frame, on a render target).
2. Mipmaps that are generated once, when loading the texture.
The first case is not an issue. The second case is. The problem is that the regular code (create the texture with SDL_GPUTEXUSAGE_MIPMAPGEN, then call SDL_GpuGenerateMipmaps) is inefficient. Why? Because the SDL_GPUTEXUSAGE_MIPMAPGEN flag will stay forever, which means the texture is internally either RENDERTARGET or SHADERWRITE when it doesn't have to be, disabling lots of internal GPU optimizations.
The easiest way to achieve this (cross platform):
- Generate tmp texture
- Load data into tmp texture
- Generate mipmaps into tmp
- Copy results into final texture
- Destroy tmp (or keep it cached for reuse)
An API that is easy to use could provide this convenience, by handling this tmp texture internally when loading data from disk.
SDL_GpuPipelineDescription's attachments
SDL_GpuPipelineDescription contains SDL_GpuColorAttachmentDescription, which contains a SDL_GpuTexture.
This is wrong. What's needed in the pipeline desc is the texture's metadata (format, type, MSAA settings, etc.; except resolution). A structure should contain this data, and could be obtained via API:
SDL_GpuTextureMetadata *metadata = SDL_GetMetadata( SDL_GpuTexture *texture );
Actually SDL_GpuTextureMetadata = SDL_GpuTextureDescription but without the char *name.
Or alternatively (perhaps this is even better):
SDL_GpuColorAttachmentDescriptionMeta *metaDescriptor = SDL_GetMeta( SDL_GpuColorAttachmentDescription *descriptor );