libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License
8.73k stars 1.64k forks

Add back 8-bit palette based rendering mode #6899

Open ihhub opened 1 year ago

ihhub commented 1 year ago

SDL 1 has 8-bit palette-based video mode support. This mode is extremely helpful for video games which handle 8-bit graphics. We are working on the fheroes2 project, which uses 8-bit graphics by default. This is a game engine recreation of a 25-year-old game, so we cannot change the graphics.

SDL 2 somehow does not support this video mode, which leads to a rendering performance drop of about 20% (depending on frame size), as we have to convert the 8-bit image into a 32-bit image before rendering a frame. It would be awesome if SDL 2 started supporting this mode, at least for specific platforms. The SDL_PIXELFORMAT_INDEX8 format is not detected in the list of formats.

bradallred commented 1 year ago

PR #6192 might interest you

slime73 commented 1 year ago

I think @icculus' comment https://github.com/libsdl-org/SDL/pull/6192#issuecomment-1325503480 is reasonable - using a palette in a shader is basically instant, probably much faster than any other solution while also being forward-compatible with new platforms and backends.

ihhub commented 1 year ago

Hi @slime73 , are you suggesting that we need to use DirectX / X11 / OpenGL code in combination with SDL to have palette based images support?

slime73 commented 1 year ago

I'm saying using the GPU is probably going to be by far the most efficient method since you're concerned about performance. There's a new graphics API abstraction planned for SDL3 so you wouldn't have to write platform-specific DirectX/OpenGL/etc. code.

ihhub commented 1 year ago

Hi @slime73 , all the frame generation is done by our game engine; we use SDL only for rendering once a frame is fully prepared (if we talk about rendering itself). SDL 3 might take time to develop and to test properly before it can be marked as a stable release, so for now our team is stuck with SDL 2. Hence we need to find a solution to improve the performance with existing libraries :) To be more specific, our game engine generates roughly 60 FPS on some low-end machines while the real number should be about 125 FPS. With a 20% boost this number will be roughly 70 FPS, which is a much more reasonable performance. Translate these numbers to a Raspberry Pi, which usually has 30 FPS.

Alternatively, if you have time and willingness, you can take a look at the screen.cpp file, class RenderEngine, starting from line 738. Maybe we use SDL in a wrong way.
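As a rough check of the frame-rate figures above, the two helpers below (hypothetical names, not part of fheroes2 or SDL) show how a "20%" figure translates into FPS under two different readings: a frame rate that is 20% higher, or the removal of a step that takes 20% of each frame's time. Which reading applies is an assumption; the thread does not say.

```cpp
#include <cassert>

// Frame rate after a relative speed-up, e.g. boost = 0.20 for "20% more frames per second".
double fpsWithBoost( const double fps, const double boost )
{
    assert( fps > 0.0 && boost >= 0.0 );
    return fps * ( 1.0 + boost );
}

// Frame rate after removing a step that takes `fraction` of each frame's time,
// e.g. fraction = 0.20 if the 8-bit to 32-bit conversion is 20% of the frame.
double fpsWithoutStep( const double fps, const double fraction )
{
    assert( fps > 0.0 && fraction >= 0.0 && fraction < 1.0 );
    return fps / ( 1.0 - fraction );
}
```

Starting from 60 FPS, the first reading gives about 72 FPS and the second about 75 FPS, so both land in the same range as the "roughly 70 FPS" estimate.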

ccawley2011 commented 1 year ago

There's still a use for having it in the render API since it can be used on platforms that don't support shaders, but do have hardware support for CLUT8 palettes (such as the PSP or PS2), as well as allowing it to transparently handle falling back on software conversion for platforms that don't support either of those things. It'll also be simpler for the end user to not have to create custom shaders for palette rendering.

slime73 commented 1 year ago

> if you have time and willingness

I don't, but if you haven't already I definitely recommend using profiling tools to identify and optimize bottlenecks (rather than guessing for example). The most expensive parts of a frame are often surprising compared to what you might expect, and even once you know what's taking time at a higher level, there are usually plenty of ways to optimize a thing without changing a library dependency.

> having it in the render API

Whether it fits in SDL_Render depends on the intended scope of the render API, I suppose. In the past it's been kept pretty minimal, which has also allowed it to be ported to a lot of different platforms and maintained without a huge amount of effort. Ryan's comment seems to suggest keeping it that way.

ihhub commented 1 year ago

Hi @slime73 , 8-bit to 32-bit conversion is the biggest bottleneck right now as our code is extremely optimized on all levels :)

slime73 commented 1 year ago

Right, I'm suggesting there may be ways to optimize that in your code. And sometimes there are ways to optimize by reducing the number of times it happens (sort of a bottom-up versus top-down approach to optimizing). In many cases those two approaches can be combined, too.

Another way to optimize is to reframe the problem completely - for example by relying on more hardware acceleration for frame generation instead of just the generated result.
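One possible reading of "reducing the number of times it happens" is to convert only the part of the frame that changed since the previous one. The sketch below assumes the engine can report such a region; `DirtyRect` and `convertRegion` are hypothetical names for illustration, not SDL or fheroes2 APIs, and both buffers are assumed to be tightly packed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical rectangle type for this sketch; not an SDL or fheroes2 type.
struct DirtyRect
{
    int x;
    int y;
    int width;
    int height;
};

// Converts only the pixels inside `rect`; everything outside keeps its previous 32-bit value.
// Both buffers hold `frameWidth` pixels per row with no padding between rows.
void convertRegion( const std::vector<std::uint8_t> & indexed, std::vector<std::uint32_t> & rgba, const int frameWidth,
                    const std::vector<std::uint32_t> & palette, const DirtyRect & rect )
{
    assert( palette.size() == 256 );
    assert( frameWidth > 0 && indexed.size() == rgba.size() );
    assert( rect.x >= 0 && rect.y >= 0 && rect.width >= 0 && rect.height >= 0 );
    assert( rect.x + rect.width <= frameWidth );
    assert( static_cast<std::size_t>( rect.y + rect.height ) * frameWidth <= indexed.size() );

    for ( int row = rect.y; row < rect.y + rect.height; ++row ) {
        // Index of the first pixel of this row that lies inside the rectangle.
        const std::size_t rowStart = static_cast<std::size_t>( row ) * frameWidth + rect.x;

        for ( int col = 0; col < rect.width; ++col ) {
            rgba[rowStart + col] = palette[indexed[rowStart + col]];
        }
    }
}
```

The cost of the conversion then scales with the size of the changed region rather than with the full frame, which only helps when most frames change a small area.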

1bsyl commented 1 year ago

@ihhub quickly looking at your code... there is the CopyImageToSurface, then there is a SDL_UpdateTexture with the surface->pixels.

I don't really get what the "transform" path is (palette to 32 bits?). But there is also a memcpy path that is definitely a bottleneck: whenever you use this memcpy path, you should instead try to use SDL_UpdateTexture with the imageIn.image() pointer, with the appropriate pitch if possible. You would avoid a memcpy per frame.
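The remark above is about skipping an intermediate copy. The same idea can be sketched for the conversion path: write the converted pixels straight into the final destination, honoring its pitch, instead of converting into a staging buffer and copying afterwards. This is an illustration under assumptions, not the fheroes2 code: `expandIntoPitchedBuffer` is a hypothetical helper, and the destination is assumed to be a 32-bit buffer whose rows are `dstPitch` bytes apart, such as the pixels and pitch that SDL_LockTexture reports for a streaming texture.

```cpp
#include <cassert>
#include <cstdint>

// Writes converted pixels directly into `dst`, whose rows start `dstPitch` BYTES apart.
// `src` is a tightly packed 8-bit image; `palette` points to 256 32-bit colours.
// Assumes `dst` is suitably aligned for 32-bit writes and that the pitch is a multiple of 4.
void expandIntoPitchedBuffer( const std::uint8_t * src, const int width, const int height, const std::uint32_t * palette,
                              void * dst, const int dstPitch )
{
    assert( src != nullptr && palette != nullptr && dst != nullptr );
    assert( width > 0 && height > 0 );
    assert( dstPitch >= width * static_cast<int>( sizeof( std::uint32_t ) ) );
    assert( dstPitch % static_cast<int>( sizeof( std::uint32_t ) ) == 0 );

    std::uint8_t * dstRow = static_cast<std::uint8_t *>( dst );

    for ( int y = 0; y < height; ++y ) {
        std::uint32_t * out = reinterpret_cast<std::uint32_t *>( dstRow );

        for ( int x = 0; x < width; ++x ) {
            out[x] = palette[src[x]];
        }

        // The source advances by its width, the destination by its pitch; any padding bytes are left untouched.
        src += width;
        dstRow += dstPitch;
    }
}
```

Whether this is faster than a separate upload call depends on the backend and driver, so it is worth profiling both variants.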

ihhub commented 1 year ago

Hi @1bsyl , transform is a pointer to a 32-bit array, so the loop

for ( ; out != outEnd; ++out, ++in ) *out = *( transform + *in );

converts an 8-bit image into a 32-bit image.
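The lookup loop quoted above can be written out as a self-contained sketch. The function name `expandIndexed` and the use of `std::vector` are choices made for illustration only; `palette` plays the role of `transform`, with one 32-bit colour per 8-bit index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands 8-bit palette indices into 32-bit pixels through a 256-entry lookup table.
std::vector<std::uint32_t> expandIndexed( const std::vector<std::uint8_t> & indexed, const std::vector<std::uint32_t> & palette )
{
    assert( palette.size() == 256 );

    std::vector<std::uint32_t> rgba( indexed.size() );

    const std::uint8_t * in = indexed.data();
    std::uint32_t * out = rgba.data();
    const std::uint32_t * outEnd = out + rgba.size();

    // Same shape as the loop in the comment: one table lookup and one 32-bit store per pixel.
    for ( ; out != outEnd; ++out, ++in ) {
        *out = palette[*in];
    }

    return rgba;
}
```

Each output pixel costs one byte read, one table read, and a four-byte write, so the loop is mostly bound by memory bandwidth; this fits the observation in the thread that the cost depends on frame size.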

memcpy is used only for 8-bit surfaces, which are supported only by SDL 1. Yes, I agree with using pixels from a texture. I ran some small tests today and didn't notice a difference in performance unless I remove the conversion loop.

1bsyl commented 1 year ago

To remove the conversion loop (and have something equivalent), you should use the palette texture PR, and you should definitely see something faster? So I don't get it...

What you can also try is to use SDL2's internal functions to perform the 8-bit to 32-bit conversion, because they may be faster (using assembly, Duff's loop, aligned pointers, not sure...). Either copy-paste the relevant SDL blit code, or use SDL_CreateRGBSurfaceWithFormatFrom(imageIn.image(), ..., SDL_PIXELFORMAT_INDEX8), set your palette, and convert the surface to 32-bit using SDL_ConvertSurfaceFormat().

ihhub commented 1 year ago

Hi @1bsyl , if was a typo: if --> unless :)

I will give a try with textures and test on machines with much slower RAM as my machine is quite new so no surprise that memcpy doesn't make much difference.

slouken commented 8 months ago

This would be pretty reasonable to add, possibly with the new GPU API? I'm going to add this to the SDL 3.0 milestone for review.

rofl0r commented 4 months ago

wow bummer. i was just working on modifying my 32bit RGB code to use 8bit palette for drawing due to abysmal performance (16 fps on PSP @222 mhz). now it seems this will never be implemented for SDL2 and only eventually in SDL3 so i can use it maybe in 5 years when all distros provide SDL3, since i want to run my code on multiple platforms. guess i was lucky running into this issue while googling for "PSP fastest videomode" before i actually wrote all the code and would run into "Palettized textures are not supported".

slouken commented 1 week ago

Reading over the referenced PR, it looks like adding this will only add one additional entry point to the API, and is something we can add later, if bilinear filtering is solved at some point. For now I'm going to bump this to the 3.x milestone.

ihhub commented 1 week ago

It would be great if SDL3 would support 8-bit palette based images. It should give ~30% speed boost to fheroes2 project. We are waiting till it happens to migrate the project from SDL2 🙂