For coordinate selection, the tool could target the current program's window rather than the entire desktop. Here's an idea for selecting coordinates within the current program: take a screenshot of the program window, then use OpenCV to detect and outline the elements in the image. Because it works on pixels rather than on a particular UI toolkit, this should in principle work for any interface, though how closely the boxes match the actual elements will depend on the edge-detection thresholds and the dilation kernel size. Below are sample images and code:
import cv2
import numpy as np

# Load the screenshot of the program window
image_path = 's.jpg'
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(f'Could not read image: {image_path}')

# Detect edges on the grayscale image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# Dilate so that nearby edges merge into one region per element
kernel = np.ones((5, 5), np.uint8)
dilated_edges = cv2.dilate(edges, kernel, iterations=1)

# Find the outer contours and draw a bounding box around each one
contours, _ = cv2.findContours(dilated_edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for contour in contours:
    hull = cv2.convexHull(contour)
    x, y, w, h = cv2.boundingRect(hull)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('Image with Text Line Boxes', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
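
The boxes found this way are relative to the screenshot, not to the desktop, so they still need to be mapped back before they can be used as click targets. Below is a minimal sketch of that step, not a definitive implementation. The function name `boxes_to_screen_points`, the `window_origin` argument (the top-left corner of the program window on the desktop, obtained however the tool already locates the window) and the `min_area` filter are all assumptions made for illustration.

```python
from typing import List, Tuple

# (x, y, w, h) relative to the screenshot, same layout as cv2.boundingRect
Box = Tuple[int, int, int, int]


def boxes_to_screen_points(
    boxes: List[Box],
    window_origin: Tuple[int, int],
    min_area: int = 100,
) -> List[Tuple[int, int]]:
    """Map window-relative boxes to desktop coordinates of their centers.

    Hypothetical helper: window_origin and min_area are assumed inputs.
    """
    origin_x, origin_y = window_origin
    points = []
    for x, y, w, h in boxes:
        if w * h < min_area:
            continue  # skip tiny regions that are unlikely to be UI elements
        # Center of the box, shifted by the window's position on the desktop
        points.append((origin_x + x + w // 2, origin_y + y + h // 2))
    return points


# Example: a window whose top-left corner is at (200, 150) on the desktop
boxes = [(10, 20, 100, 40), (0, 0, 3, 3)]
print(boxes_to_screen_points(boxes, (200, 150)))  # [(260, 190)]
```

The second box is dropped because its area (9 pixels) is below the assumed `min_area` threshold; the right value would need tuning per application.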