wolfmanstout / screen-ocr

Easily perform OCR on portions of the screen, choosing from a selection of backends.
Apache License 2.0

Multi-monitor support #4

Open LexiconCode opened 3 years ago

LexiconCode commented 3 years ago

Unless I'm missing something, I do not believe the package supports multiple monitors. This could be handled outside the package, but it seems to make sense to have it integrated.

    def _screenshot_nearby(self, screen_coordinates):
        # TODO Consider cropping within grab() for performance. Requires knowledge
        # of screen bounds.
        screenshot = ImageGrab.grab()
        bounding_box = (max(0, screen_coordinates[0] - self.radius),
                        max(0, screen_coordinates[1] - self.radius),
                        min(screenshot.width, screen_coordinates[0] + self.radius),
                        min(screenshot.height, screen_coordinates[1] + self.radius))
        screenshot = screenshot.crop(bounding_box)
        return screenshot, bounding_box

wolfmanstout commented 3 years ago

Looks like with the Pillow library, there isn't any platform-independent way to do this, but I could pass in all_screens=True so that on Windows all monitors are accessible from within the coordinate system: https://pillow.readthedocs.io/en/stable/reference/ImageGrab.html
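
A minimal sketch of how the crop in `_screenshot_nearby` might be adapted once the screenshot covers several monitors. It assumes the virtual-screen bounds are already known as a `(left, top, right, bottom)` tuple, where `left` and `top` may be negative when a monitor sits to the left of or above the primary one. `clamp_box` and `grab_nearby` are hypothetical names, not part of screen-ocr, and `grab_nearby` assumes Pillow interprets `bbox` in virtual-screen coordinates when `all_screens=True`, which is worth verifying on real hardware.

```python
def clamp_box(center, radius, bounds):
    """Return a (left, top, right, bottom) box of `radius` around `center`,
    clipped to `bounds`, which is also (left, top, right, bottom)."""
    x, y = center
    left, top, right, bottom = bounds
    return (max(left, x - radius),
            max(top, y - radius),
            min(right, x + radius),
            min(bottom, y + radius))


def grab_nearby(center, radius, bounds):
    """Grab the clamped region. Not tested on multi-monitor hardware."""
    # Imported lazily so clamp_box stays usable without Pillow installed.
    from PIL import ImageGrab
    box = clamp_box(center, radius, bounds)
    # Pillow documents all_screens as Windows-only.
    return ImageGrab.grab(bbox=box, all_screens=True), box
```

Keeping the clamping separate from the grab means the coordinate logic can be checked without a display attached.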

I don't have multiple monitors so I'm not able to test this myself; can you please see if that works for you?

I will note that it's another matter entirely to support multiple monitors in the gaze-ocr package, because you would need eye tracking support across all screens, which Tobii does not provide with the APIs I am using.

LexiconCode commented 3 years ago

FYI: Doing a little research based on https://stackoverflow.com/questions/44140586/imagegrab-grab-method-is-too-slow/51130619

The idea here is to see how the cost of taking a screenshot scales with the number of pixels captured, and how the underlying libraries compare. The fastest library, mss, requires Python 3.5 or above. My primary monitor (2560, 1440) has a max radius of 2600.

# !pip install pillow
# !pip install mss
# !pip install opencv-python
# !pip install pyscreenshot

import numpy as np
from time import time

resolutions = [
    (0, 0, 100,100),(0, 0, 200,200),
    (0, 0, 300,300),(0, 0, 400,400),
    (0, 0, 500,500),(0, 0, 600,600),
    (0, 0, 700,700),(0, 0, 800,800),
    (0, 0, 900,900),(0, 0, 1000,1000),
    (0, 0, 1000,1000),(0, 0, 1100,1100),
    (0, 0, 1200,1200),(0, 0, 1300,1300),
    (0, 0, 1400,1400),(0, 0, 1500,1500),
    (0, 0, 1600,1600),(0, 0, 1700,1700),
    (0, 0, 1800,1800),(0, 0, 1900,1900),
    (0, 0, 2000,2000),(0, 0, 2100,2100),
    (0, 0, 2200,2200),(0, 0, 2300,2300),
    (0, 0, 2400,2400),(0, 0, 2500,2500),
    (0, 0, 2600,2600),(0, 0, 2700,2700),
    (0, 0, 2800,2800),(0, 0, 2900,2900)
]


def show(nparray):
    import cv2
    cv2.imshow('window',cv2.cvtColor(nparray, cv2.COLOR_BGR2RGB))
    # key controls in a displayed window
    # if cv2.waitKey(25) & 0xFF == ord('q'):
        # cv2.destroyAllWindows()

def mss_test(shape):
    average = time()
    import mss
    sct = mss.mss()
    # shape is a (left, top, right, bottom) box; mss expects top/left/width/height.
    mon = {"top": shape[1], "left": shape[0], "width": shape[2]-shape[0], "height": shape[3]-shape[1]}
    for _ in range(5):
        printscreen = np.asarray(sct.grab(mon))
    average_ms = int(1000*(time()-average)/5.)
    return average_ms, printscreen.shape

def pil_test(shape) :
    average = time()
    from PIL import ImageGrab
    for _ in range(5):
        printscreen =  np.array(ImageGrab.grab(bbox=shape))
    average_ms = int(1000*(time()-average)/5.)
    return average_ms, printscreen.shape

def pyscreenshot_test(shape):
    average = time()
    import pyscreenshot as ImageGrab
    for _ in range(5):
        printscreen = np.asarray( ImageGrab.grab(bbox=shape) )
    average_ms = int(1000*(time()-average)/5.)
    return average_ms, printscreen.shape

named_function_pair = zip("mss_test,pil_test,pyscreenshot_test".split(","),
    [mss_test,pil_test,pyscreenshot_test])

for name,function in named_function_pair:
    results = [ function(res) for res in resolutions ]
    print("Speed results for using",name)
    for res,result in zip(resolutions,results) :
        speed,shape = result
        print(res,"took",speed,"ms, produced shaped",shape)

Speed results for using mss_test
(0, 0, 100, 100) took 9 ms, produced shaped (100, 100, 4)
(0, 0, 200, 200) took 6 ms, produced shaped (200, 200, 4)
(0, 0, 300, 300) took 7 ms, produced shaped (300, 300, 4)
(0, 0, 400, 400) took 6 ms, produced shaped (400, 400, 4)
(0, 0, 500, 500) took 6 ms, produced shaped (500, 500, 4)
(0, 0, 600, 600) took 7 ms, produced shaped (600, 600, 4)
(0, 0, 700, 700) took 7 ms, produced shaped (700, 700, 4)
(0, 0, 800, 800) took 7 ms, produced shaped (800, 800, 4)
(0, 0, 900, 900) took 14 ms, produced shaped (900, 900, 4)
(0, 0, 1000, 1000) took 14 ms, produced shaped (1000, 1000, 4)
(0, 0, 1000, 1000) took 13 ms, produced shaped (1000, 1000, 4)
(0, 0, 1100, 1100) took 14 ms, produced shaped (1100, 1100, 4)
(0, 0, 1200, 1200) took 14 ms, produced shaped (1200, 1200, 4)
(0, 0, 1300, 1300) took 18 ms, produced shaped (1300, 1300, 4)
(0, 0, 1400, 1400) took 21 ms, produced shaped (1400, 1400, 4)
(0, 0, 1500, 1500) took 23 ms, produced shaped (1500, 1500, 4)
(0, 0, 1600, 1600) took 28 ms, produced shaped (1600, 1600, 4)
(0, 0, 1700, 1700) took 29 ms, produced shaped (1700, 1700, 4)
(0, 0, 1800, 1800) took 29 ms, produced shaped (1800, 1800, 4)
(0, 0, 1900, 1900) took 32 ms, produced shaped (1900, 1900, 4)
(0, 0, 2000, 2000) took 36 ms, produced shaped (2000, 2000, 4)
(0, 0, 2100, 2100) took 36 ms, produced shaped (2100, 2100, 4)
(0, 0, 2200, 2200) took 37 ms, produced shaped (2200, 2200, 4)
(0, 0, 2300, 2300) took 41 ms, produced shaped (2300, 2300, 4)
(0, 0, 2400, 2400) took 43 ms, produced shaped (2400, 2400, 4)
(0, 0, 2500, 2500) took 44 ms, produced shaped (2500, 2500, 4)
(0, 0, 2600, 2600) took 44 ms, produced shaped (2600, 2600, 4)
(0, 0, 2700, 2700) took 45 ms, produced shaped (2700, 2700, 4)
(0, 0, 2800, 2800) took 45 ms, produced shaped (2800, 2800, 4)
(0, 0, 2900, 2900) took 48 ms, produced shaped (2900, 2900, 4)

Speed results for using pil_test
(0, 0, 100, 100) took 41 ms, produced shaped (100, 100, 3)
(0, 0, 200, 200) took 41 ms, produced shaped (200, 200, 3)
(0, 0, 300, 300) took 41 ms, produced shaped (300, 300, 3)
(0, 0, 400, 400) took 41 ms, produced shaped (400, 400, 3)
(0, 0, 500, 500) took 46 ms, produced shaped (500, 500, 3)
(0, 0, 600, 600) took 48 ms, produced shaped (600, 600, 3)
(0, 0, 700, 700) took 48 ms, produced shaped (700, 700, 3)
(0, 0, 800, 800) took 43 ms, produced shaped (800, 800, 3)
(0, 0, 900, 900) took 40 ms, produced shaped (900, 900, 3)
(0, 0, 1000, 1000) took 42 ms, produced shaped (1000, 1000, 3)
(0, 0, 1000, 1000) took 41 ms, produced shaped (1000, 1000, 3)
(0, 0, 1100, 1100) took 41 ms, produced shaped (1100, 1100, 3)
(0, 0, 1200, 1200) took 42 ms, produced shaped (1200, 1200, 3)
(0, 0, 1300, 1300) took 40 ms, produced shaped (1300, 1300, 3)
(0, 0, 1400, 1400) took 43 ms, produced shaped (1400, 1400, 3)
(0, 0, 1500, 1500) took 47 ms, produced shaped (1500, 1500, 3)
(0, 0, 1600, 1600) took 48 ms, produced shaped (1600, 1600, 3)
(0, 0, 1700, 1700) took 47 ms, produced shaped (1700, 1700, 3)
(0, 0, 1800, 1800) took 49 ms, produced shaped (1800, 1800, 3)
(0, 0, 1900, 1900) took 54 ms, produced shaped (1900, 1900, 3)
(0, 0, 2000, 2000) took 55 ms, produced shaped (2000, 2000, 3)
(0, 0, 2100, 2100) took 53 ms, produced shaped (2100, 2100, 3)
(0, 0, 2200, 2200) took 57 ms, produced shaped (2200, 2200, 3)
(0, 0, 2300, 2300) took 62 ms, produced shaped (2300, 2300, 3)
(0, 0, 2400, 2400) took 64 ms, produced shaped (2400, 2400, 3)
(0, 0, 2500, 2500) took 69 ms, produced shaped (2500, 2500, 3)
(0, 0, 2600, 2600) took 75 ms, produced shaped (2600, 2600, 3)
(0, 0, 2700, 2700) took 73 ms, produced shaped (2700, 2700, 3)
(0, 0, 2800, 2800) took 79 ms, produced shaped (2800, 2800, 3)
(0, 0, 2900, 2900) took 86 ms, produced shaped (2900, 2900, 3)

Speed results for using pyscreenshot_test
(0, 0, 100, 100) took 221 ms, produced shaped (100, 100, 3)
(0, 0, 200, 200) took 207 ms, produced shaped (200, 200, 3)
(0, 0, 300, 300) took 210 ms, produced shaped (300, 300, 3)
(0, 0, 400, 400) took 215 ms, produced shaped (400, 400, 3)
(0, 0, 500, 500) took 236 ms, produced shaped (500, 500, 3)
(0, 0, 600, 600) took 228 ms, produced shaped (600, 600, 3)
(0, 0, 700, 700) took 238 ms, produced shaped (700, 700, 3)
(0, 0, 800, 800) took 247 ms, produced shaped (800, 800, 3)
(0, 0, 900, 900) took 261 ms, produced shaped (900, 900, 3)
(0, 0, 1000, 1000) took 262 ms, produced shaped (1000, 1000, 3)
(0, 0, 1000, 1000) took 256 ms, produced shaped (1000, 1000, 3)
(0, 0, 1100, 1100) took 263 ms, produced shaped (1100, 1100, 3)
(0, 0, 1200, 1200) took 273 ms, produced shaped (1200, 1200, 3)
(0, 0, 1300, 1300) took 286 ms, produced shaped (1300, 1300, 3)
(0, 0, 1400, 1400) took 296 ms, produced shaped (1400, 1400, 3)
(0, 0, 1500, 1500) took 316 ms, produced shaped (1500, 1500, 3)
(0, 0, 1600, 1600) took 327 ms, produced shaped (1600, 1600, 3)
(0, 0, 1700, 1700) took 341 ms, produced shaped (1700, 1700, 3)
(0, 0, 1800, 1800) took 351 ms, produced shaped (1800, 1800, 3)
(0, 0, 1900, 1900) took 363 ms, produced shaped (1900, 1900, 3)
(0, 0, 2000, 2000) took 382 ms, produced shaped (2000, 2000, 3)
(0, 0, 2100, 2100) took 393 ms, produced shaped (2100, 2100, 3)
(0, 0, 2200, 2200) took 408 ms, produced shaped (2200, 2200, 3)
(0, 0, 2300, 2300) took 427 ms, produced shaped (2300, 2300, 3)
(0, 0, 2400, 2400) took 444 ms, produced shaped (2400, 2400, 3)
(0, 0, 2500, 2500) took 463 ms, produced shaped (2500, 2500, 3)
(0, 0, 2600, 2600) took 478 ms, produced shaped (2600, 2600, 3)
(0, 0, 2700, 2700) took 498 ms, produced shaped (2700, 2700, 3)
(0, 0, 2800, 2800) took 520 ms, produced shaped (2800, 2800, 3)
(0, 0, 2900, 2900) took 536 ms, produced shaped (2900, 2900, 3)
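
Each test function above starts its clock before the `import` and setup lines, so the figures likely include some one-time cost on top of the grab itself. A minimal sketch of a timing helper that runs a warm-up call first and uses `time.perf_counter`; `average_ms` is a hypothetical name, and the lambda is a stand-in workload rather than a real screen grab.

```python
from time import perf_counter


def average_ms(fn, repeats=5, warmup=1):
    """Average wall-clock milliseconds per call of fn(), after warm-up calls."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as imports or lazy initialisation
    start = perf_counter()
    for _ in range(repeats):
        fn()
    return 1000.0 * (perf_counter() - start) / repeats


# Stand-in workload; swap in a function wrapping e.g. ImageGrab.grab(bbox=...).
elapsed = average_ms(lambda: sum(range(10_000)))
print(f"{elapsed:.3f} ms per call")
```

Creating the capture object (for example `mss.mss()`) once outside `fn` would keep its construction out of the measured loop as well.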

LexiconCode commented 3 years ago

For ImageGrab with all_screens=True on two monitors, (3840, 2160) and (2560, 1440):

def pil_test() :
    average = time()
    from PIL import ImageGrab
    for _ in range(5):
        printscreen =  np.array(ImageGrab.grab(all_screens=True))
    average_ms = int(1000*(time()-average)/5.)
    print(average_ms)

170ms, 187ms, 168ms, 167ms
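
Since capturing every screen costs roughly 170 ms in the run above, one option might be to capture only the monitor under the cursor. A minimal sketch, assuming monitors are described by dicts with `left`, `top`, `width`, and `height` keys (the layout `mss` uses for its `monitors` list). `monitor_containing` is a hypothetical helper, and the monitor positions below are made up for illustration; only the two resolutions come from the test above.

```python
def monitor_containing(point, monitors):
    """Return the first monitor dict whose rectangle contains point, else None."""
    x, y = point
    for mon in monitors:
        if (mon["left"] <= x < mon["left"] + mon["width"]
                and mon["top"] <= y < mon["top"] + mon["height"]):
            return mon
    return None


# Illustrative layout: a 3840x2160 primary with a 2560x1440 monitor to its right.
monitors = [
    {"left": 0, "top": 0, "width": 3840, "height": 2160},
    {"left": 3840, "top": 0, "width": 2560, "height": 1440},
]
target = monitor_containing((4000, 300), monitors)
```

With `mss`, the real list would come from `mss.mss().monitors[1:]` (index 0 is the combined virtual screen), and the returned dict can be passed straight to `grab`.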

LexiconCode commented 3 years ago

> I will note that it's another matter entirely to support multiple monitors in the gaze-ocr package, because you would need eye tracking support across all screens, which Tobii does not provide with the APIs I am using.

Yes, that's unfortunate, and I don't think Tobii's most recent APIs support multi-monitor. Individuals' disabilities exist on a range, as do the means by which they interact with the computer. Some people compensate by using limited mouse movement, a stylus, an Xbox 360 controller, and various other devices. What most of them have in common is that they move the mouse cursor. My goal here is to provide an alternative to eye tracking that leverages OCR capabilities focused on the cursor, regardless of how the cursor is being moved.

wolfmanstout commented 3 years ago

> I will note that it's another matter entirely to support multiple monitors in the gaze-ocr package, because you would need eye tracking support across all screens, which Tobii does not provide with the APIs I am using.

> Yes, that's unfortunate, and I don't think Tobii's most recent APIs support multi-monitor. Individuals' disabilities exist on a range, as do the means by which they interact with the computer. Some people compensate by using limited mouse movement, a stylus, an Xbox 360 controller, and various other devices. What most of them have in common is that they move the mouse cursor. My goal here is to provide an alternative to eye tracking that leverages OCR capabilities focused on the cursor, regardless of how the cursor is being moved.

Makes sense! If there is functionality that you want to use in gaze-ocr without tying it to eye tracking, please let me know the details and we can think about if there's a good way to support that. On the other hand, if you just want to use screen-ocr that should be no problem as it has no dependencies on eye tracking.

I am likely to be very busy over the next ... well I'm having my first kid so I don't know how long. But I'll try to support what I can :-)

LexiconCode commented 3 years ago

> I am likely to be very busy over the next ... well I'm having my first kid so I don't know how long. But I'll try to support what I can :-)

Congratulations! As someone who recently had a first, be prepared for not having a lot of time :)