Open DominicMukilan opened 3 months ago
Which model / language did you use?
Python 3.12, IDE: PyCharm
reprex:

import json
import cv2
import pytesseract
from PIL import Image
import pandas as pd

json_path = "new_pred.json"
with open(json_path, "r") as file:
    annotations = json.load(file)

coordinates = [annotation["box"] for annotation in annotations]

image_path = "new_pred.jpg"
image = cv2.imread(image_path)

image_pil = Image.open(image_path)
image_width, image_height = image_pil.size

def crop_and_ocr_with_boundary_check(image, coordinates, image_width, image_height):
    ocr_results = []
    skipped_coordinates = []
    for i, (x1, y1, x2, y2) in enumerate(coordinates):
        original_coords = (x1, y1, x2, y2)
        x1 = max(0, min(x1, image_width - 1))
        y1 = max(0, min(y1, image_height - 1))
        x2 = max(0, min(x2, image_width))
        y2 = max(0, min(y2, image_height))
        # Check if the box is too small
        if x2 - x1 < 5 or y2 - y1 < 5:
            skipped_coordinates.append((i, original_coords, "Too small"))
            continue
        # Crop the region from the image
        cropped_img = image[y1:y2, x1:x2]
        # Perform OCR on the cropped image
        text = pytesseract.image_to_string(cropped_img)
        # Append the OCR result
        ocr_results.append({
            "coordinates": (x1, y1, x2, y2),
            "text": text.strip()  # Remove leading/trailing whitespace
        })
    return ocr_results, skipped_coordinates

ocr_results, skipped_coordinates = crop_and_ocr_with_boundary_check(image, coordinates, image_width, image_height)

ocr_df = pd.DataFrame(ocr_results)

print(f"Total annotations in JSON: {len(annotations)}")
print(f"Total OCR results: {len(ocr_results)}")
print(f"Skipped coordinates: {len(skipped_coordinates)}")
for skip in skipped_coordinates:
    print(f" Index: {skip[0]}, Coordinates: {skip[1]}, Reason: {skip[2]}")

print(ocr_df)

ocr_df.to_csv("ocr_results.csv", index=False)
print("Results saved to ocr_results.csv")

print(f"Image dimensions: {image_width}x{image_height}")
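The boundary check in `crop_and_ocr_with_boundary_check` can be exercised without an image or Tesseract. The sketch below pulls the clamping and the 5-pixel minimum-size test into a standalone helper. `clamp_box` and `min_size` are hypothetical names introduced here for illustration and are not part of the reprex.

```python
def clamp_box(box, image_width, image_height, min_size=5):
    """Clamp (x1, y1, x2, y2) to the image; return None if the result is too small.

    Mirrors the clamping in the reprex. `clamp_box` and `min_size` are
    illustrative names, not part of the original script.
    """
    x1, y1, x2, y2 = box
    x1 = max(0, min(x1, image_width - 1))
    y1 = max(0, min(y1, image_height - 1))
    x2 = max(0, min(x2, image_width))
    y2 = max(0, min(y2, image_height))
    if x2 - x1 < min_size or y2 - y1 < min_size:
        return None
    return (x1, y1, x2, y2)


# Image size and first box are taken from the output in this report.
print(clamp_box((1763, 5732, 2293, 5861), 10200, 6600))
# A box that runs past the right and bottom edges is trimmed to the image.
print(clamp_box((10000, 6500, 10400, 6700), 10200, 6600))
# A box lying entirely outside the image collapses and is rejected.
print(clamp_box((10300, 100, 10400, 200), 10200, 6600))
```

In this report none of the 50 boxes is rejected, which is consistent with the `Skipped coordinates: 0` line in the Python output.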
Please add also your image (or its URL if it is online) to this issue report.
Python output:
Total annotations in JSON: 50
Total OCR results: 50
Skipped coordinates: 0
coordinates text
0 (1763, 5732, 2293, 5861)
1 (1785, 5974, 2332, 6064) | 1.314.01 Le
2 (1848, 6119, 2648, 6215)
3 (2901, 4062, 3223, 4164) 03 X 45°
4 (1029, 577, 1510, 665)
5 (8511, 2174, 8895, 2267) 188
6 (6735, 306, 7311, 411) —e| [a—( 188 )
7 (1732, 3857, 2147, 3941) — w=! 64 ba
8 (3571, 508, 4259, 604) | |e ——_ 188+.003
9 (1069, 1827, 1666, 1940) @D .615+.002\n\nLa
10 (2349, 5867, 2629, 5952)
11 (2409, 3895, 3120, 3987) —e| -— .382+.003
12 (4672, 2200, 5422, 2320) a 2.487+.002 ——=
13 (7402, 3622, 7733, 3817) 30°
14 (8679, 2312, 9175, 2417)
15 (9044, 4051, 9409, 4597)
16 (786, 771, 1275, 853)
17 (1721, 1328, 1869, 1528)
18 (3437, 2321, 3790, 2432) -| 860
19 (2097, 4084, 2295, 4270)
20 (1159, 4032, 1699, 4153) 3 3/4-10 UNS-2A
21 (3918, 4779, 4131, 4931) 2.973
22 (8506, 531, 8895, 626) 1.595+.002
23 (8901, 997, 9284, 1168)
24 (5060, 1791, 5344, 1954)
25 (7401, 2650, 7823, 2751) =| 420
26 (1850, 2369, 2072, 2481) R.125
27 (3355, 651, 3604, 761) ‘a
28 (8916, 3820, 9281, 4032)
29 (1778, 4305, 1924, 4589)
30 (5060, 1715, 5384, 1950) ngle: 0.46
31 (984, 1257, 1145, 1370) a\nA\
32 (7791, 4801, 8101, 4951)
33 (8217, 2315, 9202, 2415)
34 (2343, 5511, 2888, 5609)
35 (8267, 1997, 8656, 2096)
36 (1462, 1665, 1715, 1764)
37 (433, 1303, 546, 1757)
38 (8384, 1476, 8512, 1830)
39 (1517, 5565, 2035, 5665) 332.01-— |
40 (6247, 3077, 6399, 4105)
41 (4327, 2035, 4901, 2181) (.078 = —
42 (9207, 1376, 9383, 1751)
43 (4671, 4768, 4947, 4940) |\n03.875
44 (4986, 896, 5622, 1195) ay\n>
45 (4886, 1063, 5328, 1186)
46 (4751, 1802, 4996, 1954)
47 (3044, 4667, 3238, 4771) -R.03
48 (8680, 3010, 8963, 3225)
49 (5117, 890, 5620, 1258)
Results saved to ocr_results.csv
Image dimensions: 10200x6600
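Many rows in the output above have no text at all. To tally them, something like the following could be run on `ocr_results`. This is a minimal sketch: `summarise` is a hypothetical helper, and the three sample rows are copied from the output above (indices 22, 23 and 26) rather than read from `ocr_results.csv`.

```python
def summarise(results):
    """Split OCR results into rows with text and rows without any text."""
    with_text = [r for r in results if r["text"].strip()]
    empty = [r for r in results if not r["text"].strip()]
    return with_text, empty


# Three rows copied from the output above (indices 22, 23 and 26).
sample = [
    {"coordinates": (8506, 531, 8895, 626), "text": "1.595+.002"},
    {"coordinates": (8901, 997, 9284, 1168), "text": ""},
    {"coordinates": (1850, 2369, 2072, 2481), "text": "R.125"},
]

with_text, empty = summarise(sample)
print(f"Rows with text: {len(with_text)}, empty rows: {len(empty)}")
for row in empty:
    print(f"  No text for box {row['coordinates']}")
```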
new_pred.json
requirements.txt
Current Behavior
No response
Expected Behavior
No response
Suggested Fix
No response
tesseract -v
tesseract v5.4.0.20240606
leptonica-1.84.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.1) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.3 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found libarchive 3.7.4 zlib/1.3.1 liblzma/5.6.1 bz2lib/1.0.8 liblz4/1.9.4 libzstd/1.5.6
Operating System
Windows 11
Other Operating System
No response
uname -a
No response
Compiler
No response
CPU
No response
Virtualization / Containers
No response
Other Information
Tesseract recognises '±' as '+' (for example, '1.595+.002' in row 22 of the output above). In some places it does not recognise the symbol at all.
Python 3.12
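Until the recognition itself is sorted out, affected rows can at least be flagged for review. The sketch below is a heuristic workaround and not a fix for Tesseract: it assumes that a '+' sitting directly between a digit and a tolerance such as `.002` was a '±' on the drawing, which may not hold for every drawing. `SUSPECT_PLUS` and `flag_suspect_plus` are hypothetical names.

```python
import re

# A '+' directly between a digit and a tolerance like ".003" or "0.01".
# Heuristic assumption: on these drawings that '+' was originally '±'.
SUSPECT_PLUS = re.compile(r"(?<=\d)\+(?=\d*\.\d)")


def flag_suspect_plus(text):
    """Return (rewritten_text, changed), replacing each suspect '+' with '±'."""
    rewritten, count = SUSPECT_PLUS.subn("±", text)
    return rewritten, count > 0


# Strings copied from rows 22, 26 and 20 of the OCR output in this report.
for raw in ["1.595+.002", "R.125", "3 3/4-10 UNS-2A"]:
    print(flag_suspect_plus(raw))
```

This only helps where a '+' was emitted. Rows where the symbol was dropped entirely cannot be recovered this way.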