iseahound / ImagePut

A core library for images in AutoHotkey. Supports AutoHotkey v1 and v2.
https://www.autohotkey.com/boards/viewtopic.php?f=83&t=76633
MIT License

Improve ImageSearch #33

Closed (iseahound closed 9 months ago)

iseahound commented 9 months ago

ImageSearch has two components:

Prioritize:

Here is some code to help you:

#include *i ImagePut%A_TrayMenu%.ahk
#include *i ImagePut (for v%true%).ahk
#singleinstance force

pic := ImagePutBuffer("[images]\h.png")                ; Load image into a buffer
pic.show() ; or ImageShow(pic)                         ; Show image
if xy := pic.ImageSearch("[images]\g.png") {           ; Search image
    MouseMove xy[1], xy[2]                             ; Move cursor
    Send "{MButton}"                                   ; MsgBox pic[xy*]
} else Tooltip "no"

https://github.com/iseahound/ImagePut/assets/9779668/3c2587da-d6ed-449d-af78-89633e2a7442

https://github.com/iseahound/ImagePut/assets/9779668/25a63f1b-8fb2-4c2a-84c4-34867b1e5ece

Useful ideas:

  - Boyer-Moore (and string search algorithms in general)
  - Multi-dimensional string search
  - Has ImageSearch been a priority in the public domain?

Some research papers would be immensely valuable.

Current performance is about 19 fps, compared to PixelSearch's 4000 fps. The aim is 800 fps or higher.

iseahound commented 9 months ago

Benchmarks (on a slow laptop):

  - Python's template matching (from OpenCV): 3.43 fps
  - First attempt at ImageSearch: 13.43 fps

iseahound commented 9 months ago

This version runs at about 200 fps on the test images by avoiding the full subimage match wherever possible.

https://godbolt.org/z/vP679rhYP

The question is: is matching 2/3/4 best? For this script, matching 1,4 is good. But only 4 can be optimized. 2 and 3 may be too correlated.
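
As a rough illustration of what "avoiding the subimage match" can mean (this is not the code behind the godbolt link, just a minimal sketch): compare a single cheap pixel first, and only run the row-by-row comparison when that pixel agrees. The function name, the 32-bit pixel format, and the tightly packed rows are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of ImagePut.
   Assumes 32-bit pixels and tightly packed rows (stride == width).
   Returns 1 and stores the top-left match position in *out_x / *out_y,
   or returns 0 if the needle does not occur in the haystack. */
int find_with_first_pixel_gate(const uint32_t *hay, unsigned width, unsigned height,
                               const uint32_t *needle, unsigned w, unsigned h,
                               unsigned *out_x, unsigned *out_y)
{
    if (w == 0 || h == 0 || w > width || h > height)
        return 0;

    for (unsigned y = 0; y + h <= height; y++) {
        for (unsigned x = 0; x + w <= width; x++) {
            const uint32_t *p = hay + (size_t)y * width + x;

            /* Cheap gate: most positions are rejected by one comparison. */
            if (*p != needle[0])
                continue;

            /* Expensive part: compare the needle row by row. */
            unsigned row = 0;
            while (row < h &&
                   memcmp(p + (size_t)row * width,
                          needle + (size_t)row * w,
                          (size_t)w * sizeof(uint32_t)) == 0)
                row++;

            if (row == h) {
                *out_x = x;
                *out_y = y;
                return 1;
            }
        }
    }
    return 0;
}
```

How much this helps depends on the images: if the needle's first pixel is a common color in the haystack, the gate rejects very little.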

iseahound commented 9 months ago

This one is the fastest: https://godbolt.org/z/rGKczzYqs

Rank:

  1. focus
  2. start pixel
  3. range

iseahound commented 9 months ago

#include *i ImagePut%A_TrayMenu%.ahk
#include *i ImagePut (for v%true%).ahk
#singleinstance force

; a := imageputwindow("g.png")

pic := ImagePutBuffer("h.png")                ; Load image into a buffer
hwnd := pic.show() ; or ImageShow(pic)        ; Show image
if xy := pic.ImageSearch3("g.png") {          ; Search image
    x := xy[1], y := xy[2]
    MouseMove x, y
    WinGetPos &wx, &wy,,, hwnd
    ImagePutWindow("g.png"
       , x ", " y
       , [wx + x, wy + y])
} else Tooltip "no"

iseahound commented 9 months ago

or

pic := ImagePutBuffer("h.png")                ; Load image into a buffer
hwnd := pic.show() ; or ImageShow(pic)        ; Show image
if xy := pic.ImageSearch("g.png") {           ; Search image
    x := xy[1], y := xy[2]
    MouseMove x, y
    ImagePutWindow("g.png"
       , x ", " y
       , [x,y,,, hwnd])
} else Tooltip "no"

with

      try dpi := DllCall("SetThreadDpiAwarenessContext", "ptr", -3, "ptr")
      if IsObject(pos) && pos.HasKey(5) {
         pos[5] := (hwnd := WinExist(pos[5])) ? hwnd : pos[5]
         VarSetCapacity(rect, 16, 0)
         DllCall("GetClientRect", "ptr", pos[5], "ptr", &rect)
         DllCall("ClientToScreen", "ptr", pos[5], "ptr", &rect)
         x += NumGet(rect, 0, "int")
         y += NumGet(rect, 4, "int")
      }
      try DllCall("SetThreadDpiAwarenessContext", "ptr", dpi, "ptr")

?

iseahound commented 9 months ago

Yeah, the [x, y, w, h, r] notation seems to be ideal, as it would allow screenshots relative to a window as well.

iseahound commented 9 months ago

Another note on the ranking:

  1. focus
  2. start pixel - if removed together with 3, there is a very large change in speed.
  3. range - if removed, there is not much change in speed.

Because the first pixel kind of acts like another check!
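
To make the ordering concrete, here is a hedged sketch of a search loop that applies the three checks in rank order before falling back to the full comparison. The meaning given to each term is an interpretation, not something confirmed by the library source: "focus" is taken to be one chosen needle pixel (for example the center), "start pixel" is the needle's top-left pixel, and "range" is the test that the needle does not run past the right edge of the haystack. The function name, the 32-bit pixels, and the packed rows are assumptions as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the library's actual routine.
   Scans the haystack linearly and returns a pointer to the top-left pixel
   of the first match, or NULL. (fx, fy) is the focus pixel inside the needle. */
const uint32_t *image_search_sketch(const uint32_t *hay, unsigned width, unsigned height,
                                    const uint32_t *needle, unsigned w, unsigned h,
                                    unsigned fx, unsigned fy)
{
    if (w == 0 || h == 0 || w > width || h > height || fx >= w || fy >= h)
        return NULL;

    const size_t focus_offset = (size_t)fy * width + fx;    /* focus, relative to a candidate */
    const uint32_t focus_px = needle[(size_t)fy * w + fx];
    const uint32_t start_px = needle[0];
    /* Last linear index at which the needle can still fit. */
    const size_t last = (size_t)(height - h) * width + (width - w);

    for (size_t i = 0; i <= last; i++) {
        if (hay[i + focus_offset] != focus_px) continue;    /* 1. focus */
        if (hay[i] != start_px) continue;                   /* 2. start pixel */
        if (i % width > width - w) continue;                /* 3. range: needle would wrap */

        /* Only now pay for the full subimage comparison. */
        unsigned row = 0;
        while (row < h &&
               memcmp(hay + i + (size_t)row * width,
                      needle + (size_t)row * w,
                      (size_t)w * sizeof(uint32_t)) == 0)
            row++;
        if (row == h)
            return hay + i;
    }
    return NULL;
}
```

In this sketch the range check is still needed for correctness, since a linear scan can otherwise match a needle that wraps from one row onto the next. It sits last because almost every position has already been rejected by the two pixel checks before it runs, which is consistent with the observation that removing it changes the speed very little.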

iseahound commented 9 months ago

         ; Search for the address of the first matching image.
         address := DllCall(code, "ptr", this.ptr, "uint", this.width, "uint", this.height
            , "ptr", image.ptr, "uint", image.width, "uint", image.height, "uint", image.width//2, "uint", image.height//2
            , "cdecl ptr")

iseahound commented 9 months ago

All of the above has been completed. The only useful data is:

Rank:

  1. focus
  2. start pixel
  3. range