Closed itzpr3d4t0r closed 3 months ago
How would switching from a macro to a static inline function have any impact on performance? They both compile into the caller.
How would switching from a macro to a static inline function have any impact on performance? They both compile into the caller.
I'm not sure but it should be a combination of factors, especially because the macros repeated ternary ops and other operations whereas the functions do not.
Anyways I tested for this and no macros = 30-35% performance improvement:
Take a READINT24 macro for example, it expands to this code:
((srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[0] << 16 |
(srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[1] << 8 |
(srcpix + ((((0) > (looph - 1)) ? (0) : (looph - 1)) * srcpitch) + (3 * loopw))[2])
So it does the same calculation thrice. With a function we pass in that pointer with the calculations in once and just index it so we save 2 calculations. So since we have 5 calls, we get 20 calculations with macros and 5 without.
This still applies for the WRITEINT24 macro.
This PR optimizes all bpp cases for the
transform.scale2x()
function. It does so through better use of pointers / switch to static inline functions instead of macros that repeat some calculations and one optimization suggestion in https://www.scale2x.it/algorithm.Performance improvements vary on a case by case basis depending on suface size and bpp, so I'll just leave my results here (top one is 32bpp):
What are these so-called “macros”? And by the way, the dash-board is really cool 😁
It is interesting to see that pre-this-pr the performance of scale2x
is actually worse than the generic scale
for your usage.
What are these so-called “macros”? And by the way, the dash-board is really cool 😁
@Gabryel-lima macros are a concept in C and C++. https://www.geeksforgeeks.org/macros-and-its-types-in-c-cpp/
@itzpr3d4t0r I don't think we can use restrict here, because technically there could be other instances of things accessing the surface pixels at the same time right? Like with surfarray? I'd be interested to see if there's actually a significant difference between using restrict and not using restrict.
@itzpr3d4t0r I don't think we can use restrict here, because technically there could be other instances of things accessing the surface pixels at the same time right? Like with surfarray? I'd be interested to see if there's actually a significant difference between using restrict and not using restrict.
How exactly does this happen? Shouldn't locking/unlocking by surfarray and this function prevent multiple threads accessing the memory at the same time. Is there another way that this memory could be accessed simultaneously, outside of a multithreaded context, that I'm blanking on? My computer science educational background is sometimes spotty.
I have to admit restrict
is sort of a weird optimization path, I observed some minor-medium benefits but only in specific size ranges so I guess it's not worth the concern.
What are these so-called “macros”? And by the way, the dash-board is really cool 😁
@Gabryel-lima macros are a concept in C and C++. https://www.geeksforgeeks.org/macros-and-its-types-in-c-cpp/
@itzpr3d4t0r I don't think we can use restrict here, because technically there could be other instances of things accessing the surface pixels at the same time right? Like with surfarray? I'd be interested to see if there's actually a significant difference between using restrict and not using restrict.
@Starbuck5 Wow, very interesting! Thanks!
@itzpr3d4t0r I don't think we can use restrict here, because technically there could be other instances of things accessing the surface pixels at the same time right? Like with surfarray? I'd be interested to see if there's actually a significant difference between using restrict and not using restrict.
How exactly does this happen? Shouldn't locking/unlocking by surfarray and this function prevent multiple threads accessing the memory at the same time. Is there another way that this memory could be accessed simultaneously, outside of a multithreaded context, that I'm blanking on? My computer science educational background is sometimes spotty.
It should, I'm just being cautious... Maybe if you constructed a Surface with shared memory from another object you could edit it while running something else?
I don't have a solid scenario in mind.
This PR optimizes all bpp cases for the
transform.scale2x()
function. It does so through better use of pointers / switch to static inline functions instead of macros that repeat some calculations and one optimization suggestion in https://www.scale2x.it/algorithm.Performance improvements vary on a case by case basis depending on suface size and bpp, so I'll just leave my results here (top one is 32bpp):