Added a new, better image_to_string_v2 method using CLIP zero shot classification on top of Tesseract to better distinguish between images of different languages before they are parsed by OCR.
A demonstration of CLIP's ability to recognize words from different languages is shown below. It can be seen to assign the language for every bounding box on the below image correctly:
Added a new, better image_to_string_v2 method using CLIP zero shot classification on top of Tesseract to better distinguish between images of different languages before they are parsed by OCR.
A demonstration of CLIP's ability to recognize words from different languages is shown below. It can be seen to assign the language for every bounding box on the below image correctly: