Closed wind0204 closed 7 months ago
I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit
I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit
Aha! That was why the diff read so crazy on Github.
Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.
Should this be s + w * h
?
unsigned int * last_pixel = s + w + h * w;
I'll move the algorithm for finding the focused pixel out of the ImageSearch (because there can be many ways to find a focused pixel), and will be sure to credit you once I figure out what yours does.
Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.
The algorithm came from no good reasoning, just thought it would be nice to locate a pixel a little bit far from the top-left corner that is not transparent, and use it as a quick reference.
Should this be
s + w * h
?unsigned int * last_pixel = s + w + h * w;
That looks simple and readable, but then the loop will stop right before the bottommost line.
Okay, I'll merge but expect major changes to your code.
I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.
Reasoning behind some of the changes:
x | y << 16
or doing y = offset / width
and x = offset % width
Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup. Benchmarks!
Line 1 - Original ImageSearch (No variation) Line 2 - Your code (No variation) https://godbolt.org/z/qGexdGqMn Line 3 - Your code (variation = 1)
I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.
It ran on my wife's Windows system, with the game macro I made...
* The code to find a focused pixel was moved because it is a different problem than ImageSearch
I guess that's better, it adds the maintainability and organized look to the code base.
* I'm not sure why the focused pixel was being stored in the result array
I did it to prevent another search for the focused pixel in a repeatedly used sprite, just in case the sprite is big and bizarre especially with a lot of transparent part in the middle.
* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width` Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup.
Awesome! Have you managed to vectorize it without using inline assembly? which SIMD instruction did it use?
Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me. Thanks!
Awesome! Have you managed to vectorize it without using inline assembly?
Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.
which SIMD instruction did it use?
Ah not yet! I want to get the specific optimizations in place without using SIMD. Vectorized code is pretty brittle, meaning it's hard to modify after it's written.
Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me.
Is the dedicated ImageSearch coming from the code I wrote? The number was horribly low. 14.27 / 137.23 = 10.4%
Awesome! Have you managed to vectorize it without using inline assembly?
Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.
According to godbolt.org, it seems that GCC 11.4 with at minimum -O1 uses CMOV with my code and your code uses DIV, and I read on the other day that DIV is not good at wide parallel execution on stackoverflow.com. I actually don't have any idea if CMP + CMOV can outperform DIV or not. I have to admit that I just believed without knowing.
* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width`
You must be talking about the line with "x << 16 | y" to give the result of search for focus pixel back to the function caller. the code is basically compressing the coordinates into 4 bytes, I could've used 8 bytes (which is the size of a pointer in a modern system) though.
Feel free to continue the conversation at: https://github.com/iseahound/ImagePut/issues/35
I didn't clean the comments, nor build 32-bit binaries, Please take a look into this pull request.
Best regards.