libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License
9.98k stars 1.84k forks source link

SDL_RenderGeometryRaw() or other way to render directly using packed uin32 colors #9009

Closed ocornut closed 4 months ago

ocornut commented 9 months ago

Followup my comment in https://github.com/libsdl-org/SDL/pull/8951#issuecomment-1925672363.

" Would it be possible for SDL to maintain a variant of SDL_RenderGeometryRaw() that took packed colors and didn’t incur an extra transformation?

It seems like this change would have meaningful effect on low-end machines (e.g. I noticed a PS2 renderer being mentioned) where some users now may be inclined to convert their packed colors to float4 and then the SDL backend is going to be doing the exact opposite.

Does SDL3 provide an alternative way (akin to vertex layout configuration) to allow rendering with packed colors?

(ref this is breaking dear imgui backend, i am not thrilled about converting 30k vertices every frame when GPU can do it supposedly? https://github.com/ocornut/imgui/issues/7293#issuecomment-1925656970) "

Please note that I am not familiar with how SDL3 is expected to extend the graphics API. All I am looking essentially is a way to render simple 2D vertex buffer with texturing, projection, without extraneous CPU side convertions, and it is possible that ongoing work on a more general "graphics/shaders" API for SDL3 may allow something like that through the configuration of vertex layout.

Thanks!

slouken commented 9 months ago

Yes, we'll support the packed color variant of SDL_RenderGeometryRaw().

slouken commented 9 months ago

I re-added the API, but either float colors or rgba colors are going to cause a bunch of conversion and copying depending on how the backend renderer works. There's already a bunch of processing involved for either path, so I'm going to reopen this so we can capture the discussion of how to improve performance.

Do you already have real-world use cases with lots of geometry that we can use to evaluate performance on a bunch of different renderers?

ocornut commented 9 months ago

I re-added the API, but either float colors or rgba colors are going to cause a bunch of conversion and copying depending on how the backend renderer works. There's already a bunch of processing involved for either path, so I'm going to reopen this so we can capture the discussion of how to improve performance.

I suppose the crux of the problem is to know whether SDL3 "shader support" will allow configuration of vertex layout. If it becomes possible then those convenience functions can supposedly be implemented using that.

Do you already have real-world use cases with lots of geometry that we can use to evaluate performance on a bunch of different renderers?

A crowed Dear ImGui based app can easily submit ~40k vertices per frame. One shortcut to get to that easily is to use ImPlot and run the Demo and open some complex plot. If you are not familiar with Dear ImGui I can help you setup such case.

Modified imgui_impl_sdlrenderer3.cpp with

if (ImGui::GetIO().KeyShift)
{
    // In SDL3, vertex colors are now represented as SDL_FColor (float[4]) instead of SDL_Color (char[4]).
    // We need to convert these colors.
    static SDL_FColor* convColorBuffer = NULL;
    static int convColorCapacity = 0;

    int verticesCount = cmd_list->VtxBuffer.Size - pcmd->VtxOffset;
    if (verticesCount > convColorCapacity)
    {
        void* newBuffer = realloc(convColorBuffer, verticesCount * sizeof(SDL_FColor));
        IM_ASSERT(newBuffer != NULL);
        convColorBuffer = (SDL_FColor*)newBuffer;
        convColorCapacity = verticesCount;
    }

    SDL_Color* p_in = (SDL_Color*)&(cmd_list->VtxBuffer.Data + pcmd->VtxOffset)->col;
    SDL_FColor* p_out = convColorBuffer;
    SDL_FColor* p_out_end = convColorBuffer + verticesCount;
    while (p_out < p_out_end)
    {
        SDL_Color color = *p_in;
        p_out->r = color.r / 255.f;
        p_out->g = color.g / 255.f;
        p_out->b = color.b / 255.f;
        p_out->a = color.a / 255.f;
        p_out++;
        p_in = (SDL_Color*)((char*)p_in + sizeof(ImDrawVert));
    }

    // Bind texture, Draw
    SDL_Texture* tex = (SDL_Texture*)pcmd->GetTexID();
    SDL_RenderGeometryRawFloat(bd->SDLRenderer, tex,
        xy, (int)sizeof(ImDrawVert),
        convColorBuffer, (int)sizeof(SDL_FColor),
        uv, (int)sizeof(ImDrawVert),
        cmd_list->VtxBuffer.Size - pcmd->VtxOffset,
        idx_buffer + pcmd->IdxOffset, pcmd->ElemCount, sizeof(ImDrawIdx));
}
else
{
    const SDL_Color* color = (const SDL_Color*)(const void*)((const char*)(vtx_buffer + pcmd->VtxOffset) + offsetof(ImDrawVert, col)); // SDL 2.0.19+

    // Bind texture, Draw
    SDL_Texture* tex = (SDL_Texture*)pcmd->GetTexID();
    SDL_RenderGeometryRaw(bd->SDLRenderer, tex,
        xy, (int)sizeof(ImDrawVert),
        color, (int)sizeof(ImDrawVert),
        uv, (int)sizeof(ImDrawVert),
        cmd_list->VtxBuffer.Size - pcmd->VtxOffset,
        idx_buffer + pcmd->IdxOffset, pcmd->ElemCount, sizeof(ImDrawIdx));
}

But it's not very meaningful to take measurement as SDL_RenderGeometryRaw() does its own conversion.

One positive side-effect of the loop being in SDL3 code is that it is more likely to be compiled in optimized form, wheras imgui or backend is more likely to be compiled unoptimized.

slouken commented 9 months ago

Sure, can you help me set up a test case? Mostly I'm looking to be able to test the impact of changes inside SDL.

Really, in the long term we probably want to change the API such that it takes a vertex layout and a set of vertices and returns an opaque handle that represents an optimized form of the data for rendering, possibly including handles to hardware vertex buffers. That way, for static data that doesn't change each frame, it's already on the GPU and doesn't need to be uploaded continuously.

slouken commented 9 months ago

Pinging @icculus because this is relevant for the GPU API.

icculus commented 6 months ago

Adding to the ABI milestone in case we need to make changes to the 2D renderer (the GPU API isn't landing in the first release, but would have this feature already).

ocornut commented 6 months ago

FYI as you seems to have kept SDL_RenderGeometryRaw() for me, and its current implementation does a CPU side conversion anyway, I have no issue doing it on my end if it allows you to remove it and better decide which entry point you are going to keep and how to name them.