Experiment with ImageSearchAll with tolerance to color variation

wind0204 commented 7 months ago

I didn't clean the comments, nor build 32-bit binaries, Please take a look into this pull request.

Best regards.

iseahound commented 7 months ago

I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit

wind0204 commented 7 months ago

I've only read through the code and it seems like it should work. I'll have to do some testing later. Could you change the line endings for ImagePut back? They got switched in your latest commit

Aha! That was why the diff read so crazy on Github.

iseahound commented 7 months ago

Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.

Should this be s + w * h? unsigned int * last_pixel = s + w + h * w;

I'll move the algorithm for finding the focused pixel out of the ImageSearch (because there can be many ways to find a focused pixel), and will be sure to credit you once I figure out what yours does.

wind0204 commented 7 months ago

Can you explain the algorithm for finding the focused pixel? I'm glad you figured out why this is important, for a sprite the first few pixels are invisible, so the imagesearch would run really slowly.

The algorithm came from no good reasoning, just thought it would be nice to locate a pixel a little bit far from the top-left corner that is not transparent, and use it as a quick reference.

Should this be s + w * h? unsigned int * last_pixel = s + w + h * w;

That looks simple and readable, but then the loop will stop right before the bottommost line.

iseahound commented 7 months ago

Okay, I'll merge but expect major changes to your code.

iseahound commented 7 months ago

I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.

Reasoning behind some of the changes:

The code to find a focused pixel was moved because it is a different problem than ImageSearch
I'm not sure why the focused pixel was being stored in the result array
Not sure if there's a difference between storing x | y << 16 or doing y = offset / width and x = offset % width Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup.

iseahound commented 7 months ago

Benchmarks!

Line 1 - Original ImageSearch (No variation) Line 2 - Your code (No variation) https://godbolt.org/z/qGexdGqMn Line 3 - Your code (variation = 1)

wind0204 commented 7 months ago

I'm not sure if your code even runs, but the logic looks like it's there. ImageSearch doesn't seem to work, nor does ImageSearchAll.

It ran on my wife's Windows system, with the game macro I made...

* The code to find a focused pixel was moved because it is a different problem than ImageSearch
I guess that's better, it adds the maintainability and organized look to the code base.
* I'm not sure why the focused pixel was being stored in the result array
I did it to prevent another search for the focused pixel in a repeatedly used sprite, just in case the sprite is big and bizarre especially with a lot of transparent part in the middle.
* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width`
  Finally, it's faster to have separate C code for the 0 variation case. Each case has to be individually vectorized anyways for a 2x or more speedup.
Awesome! Have you managed to vectorize it without using inline assembly? which SIMD instruction did it use?

iseahound commented 7 months ago

Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me. Thanks!

iseahound commented 7 months ago

Awesome! Have you managed to vectorize it without using inline assembly?

Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.

which SIMD instruction did it use?

Ah not yet! I want to get the specific optimizations in place without using SIMD. Vectorized code is pretty brittle, meaning it's hard to modify after it's written.

wind0204 commented 7 months ago

Oh I forgot to give context for the benchmarks. Line 2 doesn't matter, it will always default to the dedicated ImageSearch for no variation. It's running at 30% of the "max speed" which seems good to me.

Is the dedicated ImageSearch coming from the code I wrote? The number was horribly low. 14.27 / 137.23 = 10.4%

Awesome! Have you managed to vectorize it without using inline assembly?

Could you expand on your cmov comment? I saw it pop up a few times with or without div, I'm not too familiar with assembly here.

According to godbolt.org, it seems that GCC 11.4 with at minimum -O1 uses CMOV with my code and your code uses DIV, and I read on the other day that DIV is not good at wide parallel execution on stackoverflow.com. I actually don't have any idea if CMP + CMOV can outperform DIV or not. I have to admit that I just believed without knowing.

wind0204 commented 7 months ago

* Not sure if there's a difference between storing `x | y << 16` or doing `y = offset / width` and `x = offset % width`
You must be talking about the line with "x << 16 | y" to give the result of search for focus pixel back to the function caller. the code is basically compressing the coordinates into 4 bytes, I could've used 8 bytes (which is the size of a pointer in a modern system) though.

iseahound commented 7 months ago

Feel free to continue the conversation at: https://github.com/iseahound/ImagePut/issues/35

iseahound / ImagePut

Experiment with ImageSearchAll with tolerance to color variation #36